[libvirt PATCH v3 00/21] Add support for persistent mediated devices

Jonathon Jongsma posted 21 patches 3 years, 4 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20201224141445.163819-1-jjongsma@redhat.com
There is a newer version of this series
create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
[libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 4 months ago
This patch series follows the previously-merged series which added support for
transient mediated devices. This series expands mdev support to include
persistent device definitions. Again, it relies on mdevctl as the backend.

It follows the common libvirt pattern of APIs by adding the following new APIs
for node devices:
    - virNodeDeviceDefineXML() - defines a persistent device
    - virNodeDeviceUndefine() - undefines a persistent device
    - virNodeDeviceCreate() - starts a previously-defined device

It also adds virsh commands mapping to these new APIs: nodedev-define,
nodedev-undefine, and nodedev-start.

Since we rely on mdevctl for the definition of ediated devices, we need a way
to stay up-to-date with devices that are defined by mdevctl (outside of
libvirt).  The method for staying up-to-date is currently a little bit crude
due to the fact that mdevctl does not emit any events when new devices are
added or removed. As a workaround, we create a file monitor for the mdevctl
config directory and re-query mdevctl when we detect changes within that
directory. In the future, mdevctl may introduce a more elegant solution.

changes in v3:
 - streamlined tests -- removed some unnecessary duplication
 - split out patch to factor out node device name generation function
 - split nodeDeviceParseMdevctlChildDevice() into a separate function
 - added follow-up patch to remove space-padded alignment in header
 - refactored the mdevctl update handling significantly:
   - no longer a separate persistent thread that gets signaled by a timer
   - now piggybacks onto the existing udev thread and signals the thread in t=
he
     same way that the udev event does.
   - Daniel suggested spawning a throw-away thread to handle mdevctl updates,
     but that introduces the complexity of possibly serializing multiple
     throw-away threads (e.g. if we get an 'created' event followed immediate=
ly
     by a 'deleted' event, two threads may be spawned and we'd need to ensure
     they are properly ordered)
 - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked()
   to simplify removing devices that are removed from mdevctl.
 - coding style fixes
 - NOTE: per Erik's request, I experimented with changing the way that mdevctl
   commands were generated and tested (e.g. introducing something like
   virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was
   too invasive and awkward and didn't seem worthwhile

Changes in v2:
 - rebase to latest git master

Jonathon Jongsma (21):
  tests: remove extra trailing semicolon
  nodedev: introduce concept of 'active' node devices
  nodedev: Add ability to filter by active state
  nodedev: expose internal helper for naming devices
  nodedev: add ability to list and parse defined mdevs
  nodedev: add STOPPED/STARTED lifecycle events
  nodedev: add mdevctl devices to node device list
  nodedev: add helper functions to remove node devices
  nodedev: handle mdevs that disappear from mdevctl
  nodedev: rename dataReady to udevReady
  nodedev: Refresh mdev devices when changes are detected
  api: add virNodeDeviceDefineXML()
  virsh: Add --active, --inactive, --all to nodedev-list
  virsh: add nodedev-define command
  nodedev: refactor tests to support mdev undefine
  api: add virNodeDeviceUndefine()
  virsh: Factor out function to find node device
  virsh: add nodedev-undefine command
  api: add virNodeDeviceCreate()
  virsh: add "nodedev-start" command
  libvirt-nodedev.h: remove space-padded alignment

 examples/c/misc/event-test.c                  |   4 +
 include/libvirt/libvirt-nodedev.h             |  98 ++--
 src/access/viraccessperm.c                    |   2 +-
 src/access/viraccessperm.h                    |   6 +
 src/conf/node_device_conf.h                   |   9 +
 src/conf/virnodedeviceobj.c                   | 132 ++++-
 src/conf/virnodedeviceobj.h                   |  19 +
 src/driver-nodedev.h                          |  14 +
 src/libvirt-nodedev.c                         | 115 ++++
 src/libvirt_private.syms                      |   4 +
 src/libvirt_public.syms                       |   3 +
 src/node_device/node_device_driver.c          | 538 +++++++++++++++++-
 src/node_device/node_device_driver.h          |  38 ++
 src/node_device/node_device_udev.c            | 308 ++++++++--
 src/remote/remote_driver.c                    |   3 +
 src/remote/remote_protocol.x                  |  40 +-
 src/remote_protocol-structs                   |  16 +
 src/rpc/gendispatch.pl                        |   1 +
 ...19_36ea_4111_8f0a_8c9a70e21366-define.argv |   1 +
 ...19_36ea_4111_8f0a_8c9a70e21366-define.json |   1 +
 ...39_495e_4243_ad9f_beb3f14c23d9-define.argv |   1 +
 ...39_495e_4243_ad9f_beb3f14c23d9-define.json |   1 +
 ...16_1ca8_49ac_b176_871d16c13076-define.argv |   1 +
 ...16_1ca8_49ac_b176_871d16c13076-define.json |   1 +
 tests/nodedevmdevctldata/mdevctl-create.argv  |   1 +
 .../mdevctl-list-defined.argv                 |   1 +
 .../mdevctl-list-multiple.json                |  59 ++
 .../mdevctl-list-multiple.out.xml             |  39 ++
 tests/nodedevmdevctltest.c                    | 222 +++++++-
 tools/virsh-nodedev.c                         | 276 +++++++--
 30 files changed, 1765 insertions(+), 189 deletions(-)
 create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
a70e21366-define.argv
 create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
a70e21366-define.json
 create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
3f14c23d9-define.argv
 create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
3f14c23d9-define.json
 create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
d16c13076-define.argv
 create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
d16c13076-define.json
 create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
 create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
 create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
 create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml

--=20
2.26.2


Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Han Han 3 years, 4 months ago
On Thu, Dec 24, 2020 at 10:15 PM Jonathon Jongsma <jjongsma@redhat.com>
wrote:

> This patch series follows the previously-merged series which added support
> for
> transient mediated devices. This series expands mdev support to include
> persistent device definitions. Again, it relies on mdevctl as the backend.
>
> It follows the common libvirt pattern of APIs by adding the following new
> APIs
> for node devices:
>     - virNodeDeviceDefineXML() - defines a persistent device
>     - virNodeDeviceUndefine() - undefines a persistent device
>     - virNodeDeviceCreate() - starts a previously-defined device
>
> It also adds virsh commands mapping to these new APIs: nodedev-define,
> nodedev-undefine, and nodedev-start.
>
Trivial suggestions:
1. Mention the bug to be resolved by this series in commit msg:
https://bugzilla.redhat.com/show_bug.cgi?id=1699274
2. Add doc of man page for the new virsh commands and options

>
> Since we rely on mdevctl for the definition of ediated devices, we need a
> way
> to stay up-to-date with devices that are defined by mdevctl (outside of
> libvirt).  The method for staying up-to-date is currently a little bit
> crude
> due to the fact that mdevctl does not emit any events when new devices are
> added or removed. As a workaround, we create a file monitor for the mdevctl
> config directory and re-query mdevctl when we detect changes within that
> directory. In the future, mdevctl may introduce a more elegant solution.
>
> changes in v3:
>  - streamlined tests -- removed some unnecessary duplication
>  - split out patch to factor out node device name generation function
>  - split nodeDeviceParseMdevctlChildDevice() into a separate function
>  - added follow-up patch to remove space-padded alignment in header
>  - refactored the mdevctl update handling significantly:
>    - no longer a separate persistent thread that gets signaled by a timer
>    - now piggybacks onto the existing udev thread and signals the thread
> in t=
> he
>      same way that the udev event does.
>    - Daniel suggested spawning a throw-away thread to handle mdevctl
> updates,
>      but that introduces the complexity of possibly serializing multiple
>      throw-away threads (e.g. if we get an 'created' event followed
> immediate=
> ly
>      by a 'deleted' event, two threads may be spawned and we'd need to
> ensure
>      they are properly ordered)
>  - added virNodeDeviceObjListForEach() and
> virNodeDeviceObjListRemoveLocked()
>    to simplify removing devices that are removed from mdevctl.
>  - coding style fixes
>  - NOTE: per Erik's request, I experimented with changing the way that
> mdevctl
>    commands were generated and tested (e.g. introducing something like
>    virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it
> was
>    too invasive and awkward and didn't seem worthwhile
>
> Changes in v2:
>  - rebase to latest git master
>
> Jonathon Jongsma (21):
>   tests: remove extra trailing semicolon
>   nodedev: introduce concept of 'active' node devices
>   nodedev: Add ability to filter by active state
>   nodedev: expose internal helper for naming devices
>   nodedev: add ability to list and parse defined mdevs
>   nodedev: add STOPPED/STARTED lifecycle events
>   nodedev: add mdevctl devices to node device list
>   nodedev: add helper functions to remove node devices
>   nodedev: handle mdevs that disappear from mdevctl
>   nodedev: rename dataReady to udevReady
>   nodedev: Refresh mdev devices when changes are detected
>   api: add virNodeDeviceDefineXML()
>   virsh: Add --active, --inactive, --all to nodedev-list
>   virsh: add nodedev-define command
>   nodedev: refactor tests to support mdev undefine
>   api: add virNodeDeviceUndefine()
>   virsh: Factor out function to find node device
>   virsh: add nodedev-undefine command
>   api: add virNodeDeviceCreate()
>   virsh: add "nodedev-start" command
>   libvirt-nodedev.h: remove space-padded alignment
>
>  examples/c/misc/event-test.c                  |   4 +
>  include/libvirt/libvirt-nodedev.h             |  98 ++--
>  src/access/viraccessperm.c                    |   2 +-
>  src/access/viraccessperm.h                    |   6 +
>  src/conf/node_device_conf.h                   |   9 +
>  src/conf/virnodedeviceobj.c                   | 132 ++++-
>  src/conf/virnodedeviceobj.h                   |  19 +
>  src/driver-nodedev.h                          |  14 +
>  src/libvirt-nodedev.c                         | 115 ++++
>  src/libvirt_private.syms                      |   4 +
>  src/libvirt_public.syms                       |   3 +
>  src/node_device/node_device_driver.c          | 538 +++++++++++++++++-
>  src/node_device/node_device_driver.h          |  38 ++
>  src/node_device/node_device_udev.c            | 308 ++++++++--
>  src/remote/remote_driver.c                    |   3 +
>  src/remote/remote_protocol.x                  |  40 +-
>  src/remote_protocol-structs                   |  16 +
>  src/rpc/gendispatch.pl                        |   1 +
>  ...19_36ea_4111_8f0a_8c9a70e21366-define.argv |   1 +
>  ...19_36ea_4111_8f0a_8c9a70e21366-define.json |   1 +
>  ...39_495e_4243_ad9f_beb3f14c23d9-define.argv |   1 +
>  ...39_495e_4243_ad9f_beb3f14c23d9-define.json |   1 +
>  ...16_1ca8_49ac_b176_871d16c13076-define.argv |   1 +
>  ...16_1ca8_49ac_b176_871d16c13076-define.json |   1 +
>  tests/nodedevmdevctldata/mdevctl-create.argv  |   1 +
>  .../mdevctl-list-defined.argv                 |   1 +
>  .../mdevctl-list-multiple.json                |  59 ++
>  .../mdevctl-list-multiple.out.xml             |  39 ++
>  tests/nodedevmdevctltest.c                    | 222 +++++++-
>  tools/virsh-nodedev.c                         | 276 +++++++--
>  30 files changed, 1765 insertions(+), 189 deletions(-)
>  create mode 100644
> tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
> a70e21366-define.argv
>  create mode 100644
> tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
> a70e21366-define.json
>  create mode 100644
> tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
> 3f14c23d9-define.argv
>  create mode 100644
> tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
> 3f14c23d9-define.json
>  create mode 100644
> tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
> d16c13076-define.argv
>  create mode 100644
> tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
> d16c13076-define.json
>  create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
>
> --=20
> 2.26.2
>
>
>
Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Yan Fu 3 years, 3 months ago
Tested with v6.10.0-283-g1948d4e61e.

1.Can define/start/destroy mdev device successfully;

2.'virsh nodedev-list' has no '--active' option, which is inconsistent with
the description in the patch:
# virsh nodedev-list --active
error: command 'nodedev-list' doesn't support option --active

3.virsh client hang when trying to destroy a mdev device which is using by
a vm, and after that all 'virsh nodev*' cmds will hang. If restarting
llibvirtd after that, libvirtd will hang.

4.Define a mdev device with the uuid specified, but the mdev device defined
seems using another uuid. Maybe it make a little confusion about the use of
uuid in the xml:
#cat mdev.xml
<device>
  <name>mdev_85531b6d_e5e4_41c1_aa2a_8844717f066a</name>   ****

<path>/sys/devices/pci0000:00/0000:00:02.0/85531b6d-e5e4-41c1-aa2a-8844717f066a</path>
***
  <parent>pci_0000_00_02_0</parent>
  <driver>
    <name>vfio_mdev</name>
  </driver>
  <capability type='mdev'>
    <type id='i915-GVTg_V5_8'/>
    <iommuGroup number='0'/>
  </capability>
</device>

# virsh nodedev-define /root/mdev.xml
Node device ***mdev_6cdbaad0_7215_4741_82b9_c51e1a4decda**** defined from
/root/mdev.xml



--
Tested by Yan Fu(yafu@redhat.com)


On Fri, Dec 25, 2020 at 10:19 AM Han Han <hhan@redhat.com> wrote:

>
>
> On Thu, Dec 24, 2020 at 10:15 PM Jonathon Jongsma <jjongsma@redhat.com>
> wrote:
>
>> This patch series follows the previously-merged series which added
>> support for
>> transient mediated devices. This series expands mdev support to include
>> persistent device definitions. Again, it relies on mdevctl as the backend.
>>
>> It follows the common libvirt pattern of APIs by adding the following new
>> APIs
>> for node devices:
>>     - virNodeDeviceDefineXML() - defines a persistent device
>>     - virNodeDeviceUndefine() - undefines a persistent device
>>     - virNodeDeviceCreate() - starts a previously-defined device
>>
>> It also adds virsh commands mapping to these new APIs: nodedev-define,
>> nodedev-undefine, and nodedev-start.
>>
> Trivial suggestions:
> 1. Mention the bug to be resolved by this series in commit msg:
> https://bugzilla.redhat.com/show_bug.cgi?id=1699274
> 2. Add doc of man page for the new virsh commands and options
>
>>
>> Since we rely on mdevctl for the definition of ediated devices, we need a
>> way
>> to stay up-to-date with devices that are defined by mdevctl (outside of
>> libvirt).  The method for staying up-to-date is currently a little bit
>> crude
>> due to the fact that mdevctl does not emit any events when new devices are
>> added or removed. As a workaround, we create a file monitor for the
>> mdevctl
>> config directory and re-query mdevctl when we detect changes within that
>> directory. In the future, mdevctl may introduce a more elegant solution.
>>
>> changes in v3:
>>  - streamlined tests -- removed some unnecessary duplication
>>  - split out patch to factor out node device name generation function
>>  - split nodeDeviceParseMdevctlChildDevice() into a separate function
>>  - added follow-up patch to remove space-padded alignment in header
>>  - refactored the mdevctl update handling significantly:
>>    - no longer a separate persistent thread that gets signaled by a timer
>>    - now piggybacks onto the existing udev thread and signals the thread
>> in t=
>> he
>>      same way that the udev event does.
>>    - Daniel suggested spawning a throw-away thread to handle mdevctl
>> updates,
>>      but that introduces the complexity of possibly serializing multiple
>>      throw-away threads (e.g. if we get an 'created' event followed
>> immediate=
>> ly
>>      by a 'deleted' event, two threads may be spawned and we'd need to
>> ensure
>>      they are properly ordered)
>>  - added virNodeDeviceObjListForEach() and
>> virNodeDeviceObjListRemoveLocked()
>>    to simplify removing devices that are removed from mdevctl.
>>  - coding style fixes
>>  - NOTE: per Erik's request, I experimented with changing the way that
>> mdevctl
>>    commands were generated and tested (e.g. introducing something like
>>    virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it
>> was
>>    too invasive and awkward and didn't seem worthwhile
>>
>> Changes in v2:
>>  - rebase to latest git master
>>
>> Jonathon Jongsma (21):
>>   tests: remove extra trailing semicolon
>>   nodedev: introduce concept of 'active' node devices
>>   nodedev: Add ability to filter by active state
>>   nodedev: expose internal helper for naming devices
>>   nodedev: add ability to list and parse defined mdevs
>>   nodedev: add STOPPED/STARTED lifecycle events
>>   nodedev: add mdevctl devices to node device list
>>   nodedev: add helper functions to remove node devices
>>   nodedev: handle mdevs that disappear from mdevctl
>>   nodedev: rename dataReady to udevReady
>>   nodedev: Refresh mdev devices when changes are detected
>>   api: add virNodeDeviceDefineXML()
>>   virsh: Add --active, --inactive, --all to nodedev-list
>>   virsh: add nodedev-define command
>>   nodedev: refactor tests to support mdev undefine
>>   api: add virNodeDeviceUndefine()
>>   virsh: Factor out function to find node device
>>   virsh: add nodedev-undefine command
>>   api: add virNodeDeviceCreate()
>>   virsh: add "nodedev-start" command
>>   libvirt-nodedev.h: remove space-padded alignment
>>
>>  examples/c/misc/event-test.c                  |   4 +
>>  include/libvirt/libvirt-nodedev.h             |  98 ++--
>>  src/access/viraccessperm.c                    |   2 +-
>>  src/access/viraccessperm.h                    |   6 +
>>  src/conf/node_device_conf.h                   |   9 +
>>  src/conf/virnodedeviceobj.c                   | 132 ++++-
>>  src/conf/virnodedeviceobj.h                   |  19 +
>>  src/driver-nodedev.h                          |  14 +
>>  src/libvirt-nodedev.c                         | 115 ++++
>>  src/libvirt_private.syms                      |   4 +
>>  src/libvirt_public.syms                       |   3 +
>>  src/node_device/node_device_driver.c          | 538 +++++++++++++++++-
>>  src/node_device/node_device_driver.h          |  38 ++
>>  src/node_device/node_device_udev.c            | 308 ++++++++--
>>  src/remote/remote_driver.c                    |   3 +
>>  src/remote/remote_protocol.x                  |  40 +-
>>  src/remote_protocol-structs                   |  16 +
>>  src/rpc/gendispatch.pl                        |   1 +
>>  ...19_36ea_4111_8f0a_8c9a70e21366-define.argv |   1 +
>>  ...19_36ea_4111_8f0a_8c9a70e21366-define.json |   1 +
>>  ...39_495e_4243_ad9f_beb3f14c23d9-define.argv |   1 +
>>  ...39_495e_4243_ad9f_beb3f14c23d9-define.json |   1 +
>>  ...16_1ca8_49ac_b176_871d16c13076-define.argv |   1 +
>>  ...16_1ca8_49ac_b176_871d16c13076-define.json |   1 +
>>  tests/nodedevmdevctldata/mdevctl-create.argv  |   1 +
>>  .../mdevctl-list-defined.argv                 |   1 +
>>  .../mdevctl-list-multiple.json                |  59 ++
>>  .../mdevctl-list-multiple.out.xml             |  39 ++
>>  tests/nodedevmdevctltest.c                    | 222 +++++++-
>>  tools/virsh-nodedev.c                         | 276 +++++++--
>>  30 files changed, 1765 insertions(+), 189 deletions(-)
>>  create mode 100644
>> tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
>> a70e21366-define.argv
>>  create mode 100644
>> tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
>> a70e21366-define.json
>>  create mode 100644
>> tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
>> 3f14c23d9-define.argv
>>  create mode 100644
>> tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
>> 3f14c23d9-define.json
>>  create mode 100644
>> tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
>> d16c13076-define.argv
>>  create mode 100644
>> tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
>> d16c13076-define.json
>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
>>
>> --=20
>> 2.26.2
>>
>>
>>
Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Han Han 3 years, 3 months ago
On Tue, Jan 5, 2021 at 11:50 AM Yan Fu <yafu@redhat.com> wrote:

> Tested with v6.10.0-283-g1948d4e61e.
>
> 1.Can define/start/destroy mdev device successfully;
>
> 2.'virsh nodedev-list' has no '--active' option, which is inconsistent
> with the description in the patch:
> # virsh nodedev-list --active
> error: command 'nodedev-list' doesn't support option --active
>
> 3.virsh client hang when trying to destroy a mdev device which is using by
> a vm, and after that all 'virsh nodev*' cmds will hang. If restarting
> llibvirtd after that, libvirtd will hang.
>
Please provide the domain xml and the libvirtd log by the filter "1:qemu
1:node_device"

>
> 4.Define a mdev device with the uuid specified, but the mdev device
> defined seems using another uuid. Maybe it make a little confusion about
> the use of uuid in the xml:
> #cat mdev.xml
> <device>
>   <name>mdev_85531b6d_e5e4_41c1_aa2a_8844717f066a</name>   ****
>
> <path>/sys/devices/pci0000:00/0000:00:02.0/85531b6d-e5e4-41c1-aa2a-8844717f066a</path>
> ***
>   <parent>pci_0000_00_02_0</parent>
>   <driver>
>     <name>vfio_mdev</name>
>   </driver>
>   <capability type='mdev'>
>     <type id='i915-GVTg_V5_8'/>
>     <iommuGroup number='0'/>
>   </capability>
> </device>
>
> # virsh nodedev-define /root/mdev.xml
> Node device ***mdev_6cdbaad0_7215_4741_82b9_c51e1a4decda**** defined from
> /root/mdev.xml
>
>
>
> --
> Tested by Yan Fu(yafu@redhat.com)
>
It should be:
Tested-by: Yan Fu <yafu@redhat.com>

That means it works for me. If you think there are problems here, the
tested-by should not be given.

>
>
> On Fri, Dec 25, 2020 at 10:19 AM Han Han <hhan@redhat.com> wrote:
>
>>
>>
>> On Thu, Dec 24, 2020 at 10:15 PM Jonathon Jongsma <jjongsma@redhat.com>
>> wrote:
>>
>>> This patch series follows the previously-merged series which added
>>> support for
>>> transient mediated devices. This series expands mdev support to include
>>> persistent device definitions. Again, it relies on mdevctl as the
>>> backend.
>>>
>>> It follows the common libvirt pattern of APIs by adding the following
>>> new APIs
>>> for node devices:
>>>     - virNodeDeviceDefineXML() - defines a persistent device
>>>     - virNodeDeviceUndefine() - undefines a persistent device
>>>     - virNodeDeviceCreate() - starts a previously-defined device
>>>
>>> It also adds virsh commands mapping to these new APIs: nodedev-define,
>>> nodedev-undefine, and nodedev-start.
>>>
>> Trivial suggestions:
>> 1. Mention the bug to be resolved by this series in commit msg:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1699274
>> 2. Add doc of man page for the new virsh commands and options
>>
>>>
>>> Since we rely on mdevctl for the definition of ediated devices, we need
>>> a way
>>> to stay up-to-date with devices that are defined by mdevctl (outside of
>>> libvirt).  The method for staying up-to-date is currently a little bit
>>> crude
>>> due to the fact that mdevctl does not emit any events when new devices
>>> are
>>> added or removed. As a workaround, we create a file monitor for the
>>> mdevctl
>>> config directory and re-query mdevctl when we detect changes within that
>>> directory. In the future, mdevctl may introduce a more elegant solution.
>>>
>>> changes in v3:
>>>  - streamlined tests -- removed some unnecessary duplication
>>>  - split out patch to factor out node device name generation function
>>>  - split nodeDeviceParseMdevctlChildDevice() into a separate function
>>>  - added follow-up patch to remove space-padded alignment in header
>>>  - refactored the mdevctl update handling significantly:
>>>    - no longer a separate persistent thread that gets signaled by a timer
>>>    - now piggybacks onto the existing udev thread and signals the thread
>>> in t=
>>> he
>>>      same way that the udev event does.
>>>    - Daniel suggested spawning a throw-away thread to handle mdevctl
>>> updates,
>>>      but that introduces the complexity of possibly serializing multiple
>>>      throw-away threads (e.g. if we get an 'created' event followed
>>> immediate=
>>> ly
>>>      by a 'deleted' event, two threads may be spawned and we'd need to
>>> ensure
>>>      they are properly ordered)
>>>  - added virNodeDeviceObjListForEach() and
>>> virNodeDeviceObjListRemoveLocked()
>>>    to simplify removing devices that are removed from mdevctl.
>>>  - coding style fixes
>>>  - NOTE: per Erik's request, I experimented with changing the way that
>>> mdevctl
>>>    commands were generated and tested (e.g. introducing something like
>>>    virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it
>>> was
>>>    too invasive and awkward and didn't seem worthwhile
>>>
>>> Changes in v2:
>>>  - rebase to latest git master
>>>
>>> Jonathon Jongsma (21):
>>>   tests: remove extra trailing semicolon
>>>   nodedev: introduce concept of 'active' node devices
>>>   nodedev: Add ability to filter by active state
>>>   nodedev: expose internal helper for naming devices
>>>   nodedev: add ability to list and parse defined mdevs
>>>   nodedev: add STOPPED/STARTED lifecycle events
>>>   nodedev: add mdevctl devices to node device list
>>>   nodedev: add helper functions to remove node devices
>>>   nodedev: handle mdevs that disappear from mdevctl
>>>   nodedev: rename dataReady to udevReady
>>>   nodedev: Refresh mdev devices when changes are detected
>>>   api: add virNodeDeviceDefineXML()
>>>   virsh: Add --active, --inactive, --all to nodedev-list
>>>   virsh: add nodedev-define command
>>>   nodedev: refactor tests to support mdev undefine
>>>   api: add virNodeDeviceUndefine()
>>>   virsh: Factor out function to find node device
>>>   virsh: add nodedev-undefine command
>>>   api: add virNodeDeviceCreate()
>>>   virsh: add "nodedev-start" command
>>>   libvirt-nodedev.h: remove space-padded alignment
>>>
>>>  examples/c/misc/event-test.c                  |   4 +
>>>  include/libvirt/libvirt-nodedev.h             |  98 ++--
>>>  src/access/viraccessperm.c                    |   2 +-
>>>  src/access/viraccessperm.h                    |   6 +
>>>  src/conf/node_device_conf.h                   |   9 +
>>>  src/conf/virnodedeviceobj.c                   | 132 ++++-
>>>  src/conf/virnodedeviceobj.h                   |  19 +
>>>  src/driver-nodedev.h                          |  14 +
>>>  src/libvirt-nodedev.c                         | 115 ++++
>>>  src/libvirt_private.syms                      |   4 +
>>>  src/libvirt_public.syms                       |   3 +
>>>  src/node_device/node_device_driver.c          | 538 +++++++++++++++++-
>>>  src/node_device/node_device_driver.h          |  38 ++
>>>  src/node_device/node_device_udev.c            | 308 ++++++++--
>>>  src/remote/remote_driver.c                    |   3 +
>>>  src/remote/remote_protocol.x                  |  40 +-
>>>  src/remote_protocol-structs                   |  16 +
>>>  src/rpc/gendispatch.pl                        |   1 +
>>>  ...19_36ea_4111_8f0a_8c9a70e21366-define.argv |   1 +
>>>  ...19_36ea_4111_8f0a_8c9a70e21366-define.json |   1 +
>>>  ...39_495e_4243_ad9f_beb3f14c23d9-define.argv |   1 +
>>>  ...39_495e_4243_ad9f_beb3f14c23d9-define.json |   1 +
>>>  ...16_1ca8_49ac_b176_871d16c13076-define.argv |   1 +
>>>  ...16_1ca8_49ac_b176_871d16c13076-define.json |   1 +
>>>  tests/nodedevmdevctldata/mdevctl-create.argv  |   1 +
>>>  .../mdevctl-list-defined.argv                 |   1 +
>>>  .../mdevctl-list-multiple.json                |  59 ++
>>>  .../mdevctl-list-multiple.out.xml             |  39 ++
>>>  tests/nodedevmdevctltest.c                    | 222 +++++++-
>>>  tools/virsh-nodedev.c                         | 276 +++++++--
>>>  30 files changed, 1765 insertions(+), 189 deletions(-)
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
>>> a70e21366-define.argv
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9=
>>> a70e21366-define.json
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
>>> 3f14c23d9-define.argv
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb=
>>> 3f14c23d9-define.json
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
>>> d16c13076-define.argv
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871=
>>> d16c13076-define.json
>>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv
>>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv
>>>  create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json
>>>  create mode 100644
>>> tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
>>>
>>> --=20
>>> 2.26.2
>>>
>>>
>>>
Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Erik Skultety 3 years, 3 months ago
On Tue, Jan 05, 2021 at 11:50:11AM +0800, Yan Fu wrote:
> Tested with v6.10.0-283-g1948d4e61e.
> 
> 1.Can define/start/destroy mdev device successfully;
> 
> 2.'virsh nodedev-list' has no '--active' option, which is inconsistent with
> the description in the patch:
> # virsh nodedev-list --active
> error: command 'nodedev-list' doesn't support option --active
> 
> 3.virsh client hang when trying to destroy a mdev device which is using by
> a vm, and after that all 'virsh nodev*' cmds will hang. If restarting
> llibvirtd after that, libvirtd will hang.

It hangs because underneath a write to the 'remove' sysfs attribute is now
blocking for some reason and since we're relying on mdevctl to do it for us,
hence "it hangs". I'm not trying to make an excuse, it's plain wrong. I'd love
to rely on such a basic functionality, but it looks like we'll have to go with
a extremely ugly workaround and try to get the list of active domains from the
nodedev driver and see whether any of them has the device assigned before we
try to destroy the mdev via the nodedev driver.

However, in my testing libvirtd never hung after a restart, can you elaborate
on that?

> 
> 4.Define a mdev device with the uuid specified, but the mdev device defined
> seems using another uuid. Maybe it make a little confusion about the use of
> uuid in the xml:
> #cat mdev.xml
> <device>
>   <name>mdev_85531b6d_e5e4_41c1_aa2a_8844717f066a</name>   ****

Yeah, the easy way out here is to document that the <name> element is read
only, but that would be wrong, because we allow specifying it for domains,
networks, interfaces, etc. So, we should give the end user the option to
specify whatever name they want and generate one if none is provided.

Regards,
Erik

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 3 months ago
On Thu, 7 Jan 2021 17:43:54 +0100
Erik Skultety <eskultet@redhat.com> wrote:

> > 
> > 4.Define a mdev device with the uuid specified, but the mdev device
> > defined seems using another uuid. Maybe it make a little confusion
> > about the use of uuid in the xml:
> > #cat mdev.xml
> > <device>
> >   <name>mdev_85531b6d_e5e4_41c1_aa2a_8844717f066a</name>   ****  
> 
> Yeah, the easy way out here is to document that the <name> element is
> read only, but that would be wrong, because we allow specifying it
> for domains, networks, interfaces, etc. So, we should give the end
> user the option to specify whatever name they want and generate one
> if none is provided.
> 

Unfortunately, this appears to be difficult to achieve for persistent
devices. For mediated devices, we are using mdevctl as our backend
persistent device definition storage. But mdevctl doesn't provide a way
to attach additional metadata along with a mdev definition. So we would
need to either modify mdevctl to allow us to store additional data with
a device definition, or we would need to create (and keep synchronized)
a separate persistent storage for libvirt-specific metadata about
mediated devices.

It's also worth noting that 'nodedev-create' previously allowed you to
create vHBA devices in libvirt, but the 'name' element is explicitly
ignored (see https://wiki.libvirt.org/page/NPIV_in_libvirt):

    "NOTE: If you specify "name" for the vHBA, then it will be ignored.
    The kernel will automatically pick the next SCSI host name in
    sequence not already used."

That said, I do think that it will be useful or even necessary to be
able to create mdevs with a specific UUID, but I think that's a separate
issue than being able to specify the 'name' of the nodedev. It seems
better to do that by specifying a 'uuid' element within the mdev caps,
rather than trying to parse a UUID from an arbitrary name string. For
example:

    <device>
      <capability type='mdev'>
        <type id='i915-GVTg_V5_8'/>
        <uuid>901891a6-2077-4476-9465-53d8995b81b4</uuid>
      </capability>
      <parent>pci_0000_00_02_0</parent>
    </device>

Thoughts? 

Jonathon

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 3 months ago
On Thu, 7 Jan 2021 17:43:54 +0100
Erik Skultety <eskultet@redhat.com> wrote:

> > Tested with v6.10.0-283-g1948d4e61e.
> > 
> > 1.Can define/start/destroy mdev device successfully;
> > 
> > 2.'virsh nodedev-list' has no '--active' option, which is
> > inconsistent with the description in the patch:
> > # virsh nodedev-list --active
> > error: command 'nodedev-list' doesn't support option --active
> > 
> > 3.virsh client hang when trying to destroy a mdev device which is
> > using by a vm, and after that all 'virsh nodev*' cmds will hang. If
> > restarting llibvirtd after that, libvirtd will hang.  
> 
> It hangs because underneath a write to the 'remove' sysfs attribute
> is now blocking for some reason and since we're relying on mdevctl to
> do it for us, hence "it hangs". I'm not trying to make an excuse,
> it's plain wrong. I'd love to rely on such a basic functionality, but
> it looks like we'll have to go with a extremely ugly workaround and
> try to get the list of active domains from the nodedev driver and see
> whether any of them has the device assigned before we try to destroy
> the mdev via the nodedev driver.

So, I've been trying to figure out a way to do this, but as far as
I know, there's no way to get a list of active domains from within the
nodedev driver, and I can't think of any better ways to handle it. Any
ideas?

Jonathon

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Daniel P. Berrangé 3 years, 2 months ago
On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma wrote:
> On Thu, 7 Jan 2021 17:43:54 +0100
> Erik Skultety <eskultet@redhat.com> wrote:
> 
> > > Tested with v6.10.0-283-g1948d4e61e.
> > > 
> > > 1.Can define/start/destroy mdev device successfully;
> > > 
> > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > inconsistent with the description in the patch:
> > > # virsh nodedev-list --active
> > > error: command 'nodedev-list' doesn't support option --active
> > > 
> > > 3.virsh client hang when trying to destroy a mdev device which is
> > > using by a vm, and after that all 'virsh nodev*' cmds will hang. If
> > > restarting llibvirtd after that, libvirtd will hang.  
> > 
> > It hangs because underneath a write to the 'remove' sysfs attribute
> > is now blocking for some reason and since we're relying on mdevctl to
> > do it for us, hence "it hangs". I'm not trying to make an excuse,
> > it's plain wrong. I'd love to rely on such a basic functionality, but
> > it looks like we'll have to go with a extremely ugly workaround and
> > try to get the list of active domains from the nodedev driver and see
> > whether any of them has the device assigned before we try to destroy
> > the mdev via the nodedev driver.
> 
> So, I've been trying to figure out a way to do this, but as far as
> I know, there's no way to get a list of active domains from within the
> nodedev driver, and I can't think of any better ways to handle it. Any
> ideas?

Correct, the nodedev driver isn't permitted to talk to any of the virt
drivers.

Is there anything in sysfs which reports whether the device is in use ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Erik Skultety 3 years, 2 months ago
On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé wrote:
> On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma wrote:
> > On Thu, 7 Jan 2021 17:43:54 +0100
> > Erik Skultety <eskultet@redhat.com> wrote:
> > 
> > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > 
> > > > 1.Can define/start/destroy mdev device successfully;
> > > > 
> > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > inconsistent with the description in the patch:
> > > > # virsh nodedev-list --active
> > > > error: command 'nodedev-list' doesn't support option --active
> > > > 
> > > > 3.virsh client hang when trying to destroy a mdev device which is
> > > > using by a vm, and after that all 'virsh nodev*' cmds will hang. If
> > > > restarting llibvirtd after that, libvirtd will hang.  
> > > 
> > > It hangs because underneath a write to the 'remove' sysfs attribute
> > > is now blocking for some reason and since we're relying on mdevctl to
> > > do it for us, hence "it hangs". I'm not trying to make an excuse,
> > > it's plain wrong. I'd love to rely on such a basic functionality, but
> > > it looks like we'll have to go with a extremely ugly workaround and
> > > try to get the list of active domains from the nodedev driver and see
> > > whether any of them has the device assigned before we try to destroy
> > > the mdev via the nodedev driver.
> > 
> > So, I've been trying to figure out a way to do this, but as far as
> > I know, there's no way to get a list of active domains from within the
> > nodedev driver, and I can't think of any better ways to handle it. Any
> > ideas?
> 
> Correct, the nodedev driver isn't permitted to talk to any of the virt
> drivers.

Oh, not even via secondary connection? What makes nodedev so special, since we
can open a secondary connection from e.g. the storage driver?

> 
> Is there anything in sysfs which reports whether the device is in use ?

Nothing that I know of, the way it used to work was that you tried to write to
sysfs and kernel returned a write error with "device in use" or something like
that, but that has changed since :(.

Erik

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Daniel P. Berrangé 3 years, 2 months ago
On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:
> On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé wrote:
> > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma wrote:
> > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > Erik Skultety <eskultet@redhat.com> wrote:
> > > 
> > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > 
> > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > 
> > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > inconsistent with the description in the patch:
> > > > > # virsh nodedev-list --active
> > > > > error: command 'nodedev-list' doesn't support option --active
> > > > > 
> > > > > 3.virsh client hang when trying to destroy a mdev device which is
> > > > > using by a vm, and after that all 'virsh nodev*' cmds will hang. If
> > > > > restarting llibvirtd after that, libvirtd will hang.  
> > > > 
> > > > It hangs because underneath a write to the 'remove' sysfs attribute
> > > > is now blocking for some reason and since we're relying on mdevctl to
> > > > do it for us, hence "it hangs". I'm not trying to make an excuse,
> > > > it's plain wrong. I'd love to rely on such a basic functionality, but
> > > > it looks like we'll have to go with a extremely ugly workaround and
> > > > try to get the list of active domains from the nodedev driver and see
> > > > whether any of them has the device assigned before we try to destroy
> > > > the mdev via the nodedev driver.
> > > 
> > > So, I've been trying to figure out a way to do this, but as far as
> > > I know, there's no way to get a list of active domains from within the
> > > nodedev driver, and I can't think of any better ways to handle it. Any
> > > ideas?
> > 
> > Correct, the nodedev driver isn't permitted to talk to any of the virt
> > drivers.
> 
> Oh, not even via secondary connection? What makes nodedev so special, since we
> can open a secondary connection from e.g. the storage driver?

It is technically possible, but it should never be done, because it
introduces a bi-directional dependancy between the daemons which
introduces the danger of deadlocking them. None of the secondary
drivers should connect to the hypervisor drivers.

> > Is there anything in sysfs which reports whether the device is in use ?
> 
> Nothing that I know of, the way it used to work was that you tried to write to
> sysfs and kernel returned a write error with "device in use" or something like
> that, but that has changed since :(.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Erik Skultety 3 years, 2 months ago
On Mon, Feb 01, 2021 at 09:48:11AM +0000, Daniel P. Berrangé wrote:
> On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:
> > On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé wrote:
> > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma wrote:
> > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > Erik Skultety <eskultet@redhat.com> wrote:
> > > > 
> > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > 
> > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > 
> > > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > > inconsistent with the description in the patch:
> > > > > > # virsh nodedev-list --active
> > > > > > error: command 'nodedev-list' doesn't support option --active
> > > > > > 
> > > > > > 3.virsh client hang when trying to destroy a mdev device which is
> > > > > > using by a vm, and after that all 'virsh nodev*' cmds will hang. If
> > > > > > restarting llibvirtd after that, libvirtd will hang.  
> > > > > 
> > > > > It hangs because underneath a write to the 'remove' sysfs attribute
> > > > > is now blocking for some reason and since we're relying on mdevctl to
> > > > > do it for us, hence "it hangs". I'm not trying to make an excuse,
> > > > > it's plain wrong. I'd love to rely on such a basic functionality, but
> > > > > it looks like we'll have to go with a extremely ugly workaround and
> > > > > try to get the list of active domains from the nodedev driver and see
> > > > > whether any of them has the device assigned before we try to destroy
> > > > > the mdev via the nodedev driver.
> > > > 
> > > > So, I've been trying to figure out a way to do this, but as far as
> > > > I know, there's no way to get a list of active domains from within the
> > > > nodedev driver, and I can't think of any better ways to handle it. Any
> > > > ideas?
> > > 
> > > Correct, the nodedev driver isn't permitted to talk to any of the virt
> > > drivers.
> > 
> > Oh, not even via secondary connection? What makes nodedev so special, since we
> > can open a secondary connection from e.g. the storage driver?
> 
> It is technically possible, but it should never be done, because it
> introduces a bi-directional dependancy between the daemons which
> introduces the danger of deadlocking them. None of the secondary
> drivers should connect to the hypervisor drivers.
> 
> > > Is there anything in sysfs which reports whether the device is in use ?
> > 
> > Nothing that I know of, the way it used to work was that you tried to write to
> > sysfs and kernel returned a write error with "device in use" or something like
> > that, but that has changed since :(.

Without having tried this and since mdevctl is just a Bash script, can we
bypass mdevctl on destroys a little bit by constructing the path to the sysfs
attribute ourselves and perform a non-blocking write of zero bytes to see if we
get an error? If so, we'll skip mdevctl and report an error. If we don't, we'll
invoke mdevctl to remove the device in order to remain consistent. Would
that be an acceptable workaround (provided it would work)?

Erik

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 2 months ago
On Mon, 1 Feb 2021 11:33:08 +0100
Erik Skultety <eskultet@redhat.com> wrote:

> On Mon, Feb 01, 2021 at 09:48:11AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:  
> > > On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé
> > > wrote:  
> > > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma
> > > > wrote:  
> > > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > > Erik Skultety <eskultet@redhat.com> wrote:
> > > > >   
> > > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > > 
> > > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > > 
> > > > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > > > inconsistent with the description in the patch:
> > > > > > > # virsh nodedev-list --active
> > > > > > > error: command 'nodedev-list' doesn't support option
> > > > > > > --active
> > > > > > > 
> > > > > > > 3.virsh client hang when trying to destroy a mdev device
> > > > > > > which is using by a vm, and after that all 'virsh nodev*'
> > > > > > > cmds will hang. If restarting llibvirtd after that,
> > > > > > > libvirtd will hang.    
> > > > > > 
> > > > > > It hangs because underneath a write to the 'remove' sysfs
> > > > > > attribute is now blocking for some reason and since we're
> > > > > > relying on mdevctl to do it for us, hence "it hangs". I'm
> > > > > > not trying to make an excuse, it's plain wrong. I'd love to
> > > > > > rely on such a basic functionality, but it looks like we'll
> > > > > > have to go with a extremely ugly workaround and try to get
> > > > > > the list of active domains from the nodedev driver and see
> > > > > > whether any of them has the device assigned before we try
> > > > > > to destroy the mdev via the nodedev driver.  
> > > > > 
> > > > > So, I've been trying to figure out a way to do this, but as
> > > > > far as I know, there's no way to get a list of active domains
> > > > > from within the nodedev driver, and I can't think of any
> > > > > better ways to handle it. Any ideas?  
> > > > 
> > > > Correct, the nodedev driver isn't permitted to talk to any of
> > > > the virt drivers.  
> > > 
> > > Oh, not even via secondary connection? What makes nodedev so
> > > special, since we can open a secondary connection from e.g. the
> > > storage driver?  
> > 
> > It is technically possible, but it should never be done, because it
> > introduces a bi-directional dependancy between the daemons which
> > introduces the danger of deadlocking them. None of the secondary
> > drivers should connect to the hypervisor drivers.
> >   
> > > > Is there anything in sysfs which reports whether the device is
> > > > in use ?  
> > > 
> > > Nothing that I know of, the way it used to work was that you
> > > tried to write to sysfs and kernel returned a write error with
> > > "device in use" or something like that, but that has changed
> > > since :(.  
> 
> Without having tried this and since mdevctl is just a Bash script,
> can we bypass mdevctl on destroys a little bit by constructing the
> path to the sysfs attribute ourselves and perform a non-blocking
> write of zero bytes to see if we get an error? If so, we'll skip
> mdevctl and report an error. If we don't, we'll invoke mdevctl to
> remove the device in order to remain consistent. Would that be an
> acceptable workaround (provided it would work)?

As far as I can tell, this doesn't work. According to my
tests, attempting to write zero bytes to $(mdev_sysfs_path)/remove
doesn't result in an error if the mdev is in use by a vm. It just
"successfully" writes zero bytes. Adding Alex to cc in case he has an
idea for a workaround here.

Jonathon

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Alex Williamson 3 years, 2 months ago
On Mon, 1 Feb 2021 16:57:44 -0600
Jonathon Jongsma <jjongsma@redhat.com> wrote:

> On Mon, 1 Feb 2021 11:33:08 +0100
> Erik Skultety <eskultet@redhat.com> wrote:
> 
> > On Mon, Feb 01, 2021 at 09:48:11AM +0000, Daniel P. Berrangé wrote:  
> > > On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:    
> > > > On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé
> > > > wrote:    
> > > > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma
> > > > > wrote:    
> > > > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > > > Erik Skultety <eskultet@redhat.com> wrote:
> > > > > >     
> > > > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > > > 
> > > > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > > > 
> > > > > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > > > > inconsistent with the description in the patch:
> > > > > > > > # virsh nodedev-list --active
> > > > > > > > error: command 'nodedev-list' doesn't support option
> > > > > > > > --active
> > > > > > > > 
> > > > > > > > 3.virsh client hang when trying to destroy a mdev device
> > > > > > > > which is using by a vm, and after that all 'virsh nodev*'
> > > > > > > > cmds will hang. If restarting llibvirtd after that,
> > > > > > > > libvirtd will hang.      
> > > > > > > 
> > > > > > > It hangs because underneath a write to the 'remove' sysfs
> > > > > > > attribute is now blocking for some reason and since we're
> > > > > > > relying on mdevctl to do it for us, hence "it hangs". I'm
> > > > > > > not trying to make an excuse, it's plain wrong. I'd love to
> > > > > > > rely on such a basic functionality, but it looks like we'll
> > > > > > > have to go with a extremely ugly workaround and try to get
> > > > > > > the list of active domains from the nodedev driver and see
> > > > > > > whether any of them has the device assigned before we try
> > > > > > > to destroy the mdev via the nodedev driver.    
> > > > > > 
> > > > > > So, I've been trying to figure out a way to do this, but as
> > > > > > far as I know, there's no way to get a list of active domains
> > > > > > from within the nodedev driver, and I can't think of any
> > > > > > better ways to handle it. Any ideas?    
> > > > > 
> > > > > Correct, the nodedev driver isn't permitted to talk to any of
> > > > > the virt drivers.    
> > > > 
> > > > Oh, not even via secondary connection? What makes nodedev so
> > > > special, since we can open a secondary connection from e.g. the
> > > > storage driver?    
> > > 
> > > It is technically possible, but it should never be done, because it
> > > introduces a bi-directional dependancy between the daemons which
> > > introduces the danger of deadlocking them. None of the secondary
> > > drivers should connect to the hypervisor drivers.
> > >     
> > > > > Is there anything in sysfs which reports whether the device is
> > > > > in use ?    
> > > > 
> > > > Nothing that I know of, the way it used to work was that you
> > > > tried to write to sysfs and kernel returned a write error with
> > > > "device in use" or something like that, but that has changed
> > > > since :(.    
> > 
> > Without having tried this and since mdevctl is just a Bash script,
> > can we bypass mdevctl on destroys a little bit by constructing the
> > path to the sysfs attribute ourselves and perform a non-blocking
> > write of zero bytes to see if we get an error? If so, we'll skip
> > mdevctl and report an error. If we don't, we'll invoke mdevctl to
> > remove the device in order to remain consistent. Would that be an
> > acceptable workaround (provided it would work)?  
> 
> As far as I can tell, this doesn't work. According to my
> tests, attempting to write zero bytes to $(mdev_sysfs_path)/remove
> doesn't result in an error if the mdev is in use by a vm. It just
> "successfully" writes zero bytes. Adding Alex to cc in case he has an
> idea for a workaround here.

[Cc +Connie]

I'm not really sure why mdevs are unique here.  When we write to
remove, the first step is to release the device from the driver, so
it's really the same as an unbind for a vfio-pci device.  PCI drivers
cannot return an error, an unbind is handled not as a request, but a
directive, so when the device is in use we block until the unbind can
complete.  With vfio-pci (and added upstream to the mdev core -
depending on vendor support), the driver remove callback triggers a
virtual interrupt to the user asking to cooperatively return the device
(triggering a hot-unplug in QEMU).  Has this really worked so well in
vfio-pci that we've forgotten that an unbind can block there too or are
we better about tracking something with PCI devices vs mdevs?

On idea for a solution would be that vfio only allows a single open of a
group at a time, so if libvirt were to open the group it could know
that it's unused.  If you can manage to close the group once you've
already triggered the remove/unbind, then I'd think the completion of
the write would be deterministic.  If the group is in use elsewhere,
the open should get back an -EBUSY and you probably ought to be careful
about removing/unbinding it anyway.  It might be possible to implement
this in mdevctl too, ie. redirect /dev/null to group file and fork,
fork the echo 1 > remove, kill the redirect, return a device in use
error if the initial redirect fails.  Thanks,

Alex

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Erik Skultety 3 years, 2 months ago
On Mon, Feb 01, 2021 at 04:38:56PM -0700, Alex Williamson wrote:
> On Mon, 1 Feb 2021 16:57:44 -0600
> Jonathon Jongsma <jjongsma@redhat.com> wrote:
> 
> > On Mon, 1 Feb 2021 11:33:08 +0100
> > Erik Skultety <eskultet@redhat.com> wrote:
> > 
> > > On Mon, Feb 01, 2021 at 09:48:11AM +0000, Daniel P. Berrangé wrote:  
> > > > On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:    
> > > > > On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé
> > > > > wrote:    
> > > > > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma
> > > > > > wrote:    
> > > > > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > > > > Erik Skultety <eskultet@redhat.com> wrote:
> > > > > > >     
> > > > > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > > > > 
> > > > > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > > > > 
> > > > > > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > > > > > inconsistent with the description in the patch:
> > > > > > > > > # virsh nodedev-list --active
> > > > > > > > > error: command 'nodedev-list' doesn't support option
> > > > > > > > > --active
> > > > > > > > > 
> > > > > > > > > 3.virsh client hang when trying to destroy a mdev device
> > > > > > > > > which is using by a vm, and after that all 'virsh nodev*'
> > > > > > > > > cmds will hang. If restarting llibvirtd after that,
> > > > > > > > > libvirtd will hang.      
> > > > > > > > 
> > > > > > > > It hangs because underneath a write to the 'remove' sysfs
> > > > > > > > attribute is now blocking for some reason and since we're
> > > > > > > > relying on mdevctl to do it for us, hence "it hangs". I'm
> > > > > > > > not trying to make an excuse, it's plain wrong. I'd love to
> > > > > > > > rely on such a basic functionality, but it looks like we'll
> > > > > > > > have to go with a extremely ugly workaround and try to get
> > > > > > > > the list of active domains from the nodedev driver and see
> > > > > > > > whether any of them has the device assigned before we try
> > > > > > > > to destroy the mdev via the nodedev driver.    
> > > > > > > 
> > > > > > > So, I've been trying to figure out a way to do this, but as
> > > > > > > far as I know, there's no way to get a list of active domains
> > > > > > > from within the nodedev driver, and I can't think of any
> > > > > > > better ways to handle it. Any ideas?    
> > > > > > 
> > > > > > Correct, the nodedev driver isn't permitted to talk to any of
> > > > > > the virt drivers.    
> > > > > 
> > > > > Oh, not even via secondary connection? What makes nodedev so
> > > > > special, since we can open a secondary connection from e.g. the
> > > > > storage driver?    
> > > > 
> > > > It is technically possible, but it should never be done, because it
> > > > introduces a bi-directional dependancy between the daemons which
> > > > introduces the danger of deadlocking them. None of the secondary
> > > > drivers should connect to the hypervisor drivers.
> > > >     
> > > > > > Is there anything in sysfs which reports whether the device is
> > > > > > in use ?    
> > > > > 
> > > > > Nothing that I know of, the way it used to work was that you
> > > > > tried to write to sysfs and kernel returned a write error with
> > > > > "device in use" or something like that, but that has changed
> > > > > since :(.    
> > > 
> > > Without having tried this and since mdevctl is just a Bash script,
> > > can we bypass mdevctl on destroys a little bit by constructing the
> > > path to the sysfs attribute ourselves and perform a non-blocking
> > > write of zero bytes to see if we get an error? If so, we'll skip
> > > mdevctl and report an error. If we don't, we'll invoke mdevctl to
> > > remove the device in order to remain consistent. Would that be an
> > > acceptable workaround (provided it would work)?  
> > 
> > As far as I can tell, this doesn't work. According to my
> > tests, attempting to write zero bytes to $(mdev_sysfs_path)/remove
> > doesn't result in an error if the mdev is in use by a vm. It just
> > "successfully" writes zero bytes. Adding Alex to cc in case he has an
> > idea for a workaround here.
> 
> [Cc +Connie]
> 
> I'm not really sure why mdevs are unique here.  When we write to
> remove, the first step is to release the device from the driver, so
> it's really the same as an unbind for a vfio-pci device.  PCI drivers
> cannot return an error, an unbind is handled not as a request, but a
> directive, so when the device is in use we block until the unbind can
> complete.  With vfio-pci (and added upstream to the mdev core -
> depending on vendor support), the driver remove callback triggers a
> virtual interrupt to the user asking to cooperatively return the device
> (triggering a hot-unplug in QEMU).  Has this really worked so well in
> vfio-pci that we've forgotten that an unbind can block there too or are
> we better about tracking something with PCI devices vs mdevs?

Does any of the current vendor guest drivers for mdev support unplug? While
I'm not trying to argue that unpluging a vfio-pci cannot block, it just works
seamlessly in majority of cases nowadays, but I guess we were in the same
situation with PCI assignment in the past?

The whole point here is IMO about a massive inconvenience for a library
consumer to be blocked on an operation and not knowing why, whereas when you
return an instant error saying why the operation cannot be completed right now
that opens the door for a necessary adjustment in their usage of the library.

> 
> On idea for a solution would be that vfio only allows a single open of a
> group at a time, so if libvirt were to open the group it could know
> that it's unused.  If you can manage to close the group once you've
> already triggered the remove/unbind, then I'd think the completion of
> the write would be deterministic.  If the group is in use elsewhere,
> the open should get back an -EBUSY and you probably ought to be careful

Honestly, ^this seems like a fairly straightforward workaround to me.

Erik

> about removing/unbinding it anyway.  It might be possible to implement
> this in mdevctl too, ie. redirect /dev/null to group file and fork,
> fork the echo 1 > remove, kill the redirect, return a device in use
> error if the initial redirect fails.  Thanks,
> 
> Alex

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Jonathon Jongsma 3 years, 2 months ago
On Tue, 2 Feb 2021 08:28:52 +0100
Erik Skultety <eskultet@redhat.com> wrote:

> On Mon, Feb 01, 2021 at 04:38:56PM -0700, Alex Williamson wrote:
> > On Mon, 1 Feb 2021 16:57:44 -0600
> > Jonathon Jongsma <jjongsma@redhat.com> wrote:
> >   
> > > On Mon, 1 Feb 2021 11:33:08 +0100
> > > Erik Skultety <eskultet@redhat.com> wrote:
> > >   
> > > > On Mon, Feb 01, 2021 at 09:48:11AM +0000, Daniel P. Berrangé
> > > > wrote:    
> > > > > On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety
> > > > > wrote:      
> > > > > > On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé
> > > > > > wrote:      
> > > > > > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma
> > > > > > > wrote:      
> > > > > > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > > > > > Erik Skultety <eskultet@redhat.com> wrote:
> > > > > > > >       
> > > > > > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > > > > > 
> > > > > > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > > > > > 
> > > > > > > > > > 2.'virsh nodedev-list' has no '--active' option,
> > > > > > > > > > which is inconsistent with the description in the
> > > > > > > > > > patch: # virsh nodedev-list --active
> > > > > > > > > > error: command 'nodedev-list' doesn't support option
> > > > > > > > > > --active
> > > > > > > > > > 
> > > > > > > > > > 3.virsh client hang when trying to destroy a mdev
> > > > > > > > > > device which is using by a vm, and after that all
> > > > > > > > > > 'virsh nodev*' cmds will hang. If restarting
> > > > > > > > > > llibvirtd after that, libvirtd will hang.        
> > > > > > > > > 
> > > > > > > > > It hangs because underneath a write to the 'remove'
> > > > > > > > > sysfs attribute is now blocking for some reason and
> > > > > > > > > since we're relying on mdevctl to do it for us, hence
> > > > > > > > > "it hangs". I'm not trying to make an excuse, it's
> > > > > > > > > plain wrong. I'd love to rely on such a basic
> > > > > > > > > functionality, but it looks like we'll have to go
> > > > > > > > > with a extremely ugly workaround and try to get the
> > > > > > > > > list of active domains from the nodedev driver and
> > > > > > > > > see whether any of them has the device assigned
> > > > > > > > > before we try to destroy the mdev via the nodedev
> > > > > > > > > driver.      
> > > > > > > > 
> > > > > > > > So, I've been trying to figure out a way to do this,
> > > > > > > > but as far as I know, there's no way to get a list of
> > > > > > > > active domains from within the nodedev driver, and I
> > > > > > > > can't think of any better ways to handle it. Any ideas?
> > > > > > > >      
> > > > > > > 
> > > > > > > Correct, the nodedev driver isn't permitted to talk to
> > > > > > > any of the virt drivers.      
> > > > > > 
> > > > > > Oh, not even via secondary connection? What makes nodedev so
> > > > > > special, since we can open a secondary connection from e.g.
> > > > > > the storage driver?      
> > > > > 
> > > > > It is technically possible, but it should never be done,
> > > > > because it introduces a bi-directional dependancy between the
> > > > > daemons which introduces the danger of deadlocking them. None
> > > > > of the secondary drivers should connect to the hypervisor
> > > > > drivers. 
> > > > > > > Is there anything in sysfs which reports whether the
> > > > > > > device is in use ?      
> > > > > > 
> > > > > > Nothing that I know of, the way it used to work was that you
> > > > > > tried to write to sysfs and kernel returned a write error
> > > > > > with "device in use" or something like that, but that has
> > > > > > changed since :(.      
> > > > 
> > > > Without having tried this and since mdevctl is just a Bash
> > > > script, can we bypass mdevctl on destroys a little bit by
> > > > constructing the path to the sysfs attribute ourselves and
> > > > perform a non-blocking write of zero bytes to see if we get an
> > > > error? If so, we'll skip mdevctl and report an error. If we
> > > > don't, we'll invoke mdevctl to remove the device in order to
> > > > remain consistent. Would that be an acceptable workaround
> > > > (provided it would work)?    
> > > 
> > > As far as I can tell, this doesn't work. According to my
> > > tests, attempting to write zero bytes to $(mdev_sysfs_path)/remove
> > > doesn't result in an error if the mdev is in use by a vm. It just
> > > "successfully" writes zero bytes. Adding Alex to cc in case he
> > > has an idea for a workaround here.  
> > 
> > [Cc +Connie]
> > 
> > I'm not really sure why mdevs are unique here.  When we write to
> > remove, the first step is to release the device from the driver, so
> > it's really the same as an unbind for a vfio-pci device.  PCI
> > drivers cannot return an error, an unbind is handled not as a
> > request, but a directive, so when the device is in use we block
> > until the unbind can complete.  With vfio-pci (and added upstream
> > to the mdev core - depending on vendor support), the driver remove
> > callback triggers a virtual interrupt to the user asking to
> > cooperatively return the device (triggering a hot-unplug in QEMU).
> > Has this really worked so well in vfio-pci that we've forgotten
> > that an unbind can block there too or are we better about tracking
> > something with PCI devices vs mdevs?  
> 
> Does any of the current vendor guest drivers for mdev support unplug?
> While I'm not trying to argue that unpluging a vfio-pci cannot block,
> it just works seamlessly in majority of cases nowadays, but I guess
> we were in the same situation with PCI assignment in the past?
> 
> The whole point here is IMO about a massive inconvenience for a
> library consumer to be blocked on an operation and not knowing why,
> whereas when you return an instant error saying why the operation
> cannot be completed right now that opens the door for a necessary
> adjustment in their usage of the library.
> 
> > 
> > On idea for a solution would be that vfio only allows a single open
> > of a group at a time, so if libvirt were to open the group it could
> > know that it's unused.  If you can manage to close the group once
> > you've already triggered the remove/unbind, then I'd think the
> > completion of the write would be deterministic.  If the group is in
> > use elsewhere, the open should get back an -EBUSY and you probably
> > ought to be careful  
> 
> Honestly, ^this seems like a fairly straightforward workaround to me.
> 
> Erik

Yes, thanks. This seems pretty clean and simple to to implement, and I
can't really think of any significant downsides.

Jonathon

> 
> > about removing/unbinding it anyway.  It might be possible to
> > implement this in mdevctl too, ie. redirect /dev/null to group file
> > and fork, fork the echo 1 > remove, kill the redirect, return a
> > device in use error if the initial redirect fails.  Thanks,
> > 
> > Alex  


Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Cornelia Huck 3 years, 2 months ago
On Tue, 2 Feb 2021 08:28:52 +0100
Erik Skultety <eskultet@redhat.com> wrote:

> On Mon, Feb 01, 2021 at 04:38:56PM -0700, Alex Williamson wrote:
> > On Mon, 1 Feb 2021 16:57:44 -0600
> > Jonathon Jongsma <jjongsma@redhat.com> wrote:
> >   
> > > On Mon, 1 Feb 2021 11:33:08 +0100
> > > Erik Skultety <eskultet@redhat.com> wrote:
> > >   
> > > > On Mon, Feb 01, 2021 at 09:48:11AM +0000, Daniel P. Berrangé wrote:    
> > > > > On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:      
> > > > > > On Mon, Feb 01, 2021 at 09:42:32AM +0000, Daniel P. Berrangé
> > > > > > wrote:      
> > > > > > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma
> > > > > > > wrote:      
> > > > > > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > > > > > Erik Skultety <eskultet@redhat.com> wrote:
> > > > > > > >       
> > > > > > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > > > > > 
> > > > > > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > > > > > 
> > > > > > > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > > > > > > inconsistent with the description in the patch:
> > > > > > > > > > # virsh nodedev-list --active
> > > > > > > > > > error: command 'nodedev-list' doesn't support option
> > > > > > > > > > --active
> > > > > > > > > > 
> > > > > > > > > > 3.virsh client hang when trying to destroy a mdev device
> > > > > > > > > > which is using by a vm, and after that all 'virsh nodev*'
> > > > > > > > > > cmds will hang. If restarting llibvirtd after that,
> > > > > > > > > > libvirtd will hang.        
> > > > > > > > > 
> > > > > > > > > It hangs because underneath a write to the 'remove' sysfs
> > > > > > > > > attribute is now blocking for some reason and since we're
> > > > > > > > > relying on mdevctl to do it for us, hence "it hangs". I'm
> > > > > > > > > not trying to make an excuse, it's plain wrong. I'd love to
> > > > > > > > > rely on such a basic functionality, but it looks like we'll
> > > > > > > > > have to go with a extremely ugly workaround and try to get
> > > > > > > > > the list of active domains from the nodedev driver and see
> > > > > > > > > whether any of them has the device assigned before we try
> > > > > > > > > to destroy the mdev via the nodedev driver.      
> > > > > > > > 
> > > > > > > > So, I've been trying to figure out a way to do this, but as
> > > > > > > > far as I know, there's no way to get a list of active domains
> > > > > > > > from within the nodedev driver, and I can't think of any
> > > > > > > > better ways to handle it. Any ideas?      
> > > > > > > 
> > > > > > > Correct, the nodedev driver isn't permitted to talk to any of
> > > > > > > the virt drivers.      
> > > > > > 
> > > > > > Oh, not even via secondary connection? What makes nodedev so
> > > > > > special, since we can open a secondary connection from e.g. the
> > > > > > storage driver?      
> > > > > 
> > > > > It is technically possible, but it should never be done, because it
> > > > > introduces a bi-directional dependancy between the daemons which
> > > > > introduces the danger of deadlocking them. None of the secondary
> > > > > drivers should connect to the hypervisor drivers.
> > > > >       
> > > > > > > Is there anything in sysfs which reports whether the device is
> > > > > > > in use ?      
> > > > > > 
> > > > > > Nothing that I know of, the way it used to work was that you
> > > > > > tried to write to sysfs and kernel returned a write error with
> > > > > > "device in use" or something like that, but that has changed
> > > > > > since :(.      
> > > > 
> > > > Without having tried this and since mdevctl is just a Bash script,
> > > > can we bypass mdevctl on destroys a little bit by constructing the
> > > > path to the sysfs attribute ourselves and perform a non-blocking
> > > > write of zero bytes to see if we get an error? If so, we'll skip
> > > > mdevctl and report an error. If we don't, we'll invoke mdevctl to
> > > > remove the device in order to remain consistent. Would that be an
> > > > acceptable workaround (provided it would work)?    
> > > 
> > > As far as I can tell, this doesn't work. According to my
> > > tests, attempting to write zero bytes to $(mdev_sysfs_path)/remove
> > > doesn't result in an error if the mdev is in use by a vm. It just
> > > "successfully" writes zero bytes. Adding Alex to cc in case he has an
> > > idea for a workaround here.  
> > 
> > [Cc +Connie]
> > 
> > I'm not really sure why mdevs are unique here.  When we write to
> > remove, the first step is to release the device from the driver, so
> > it's really the same as an unbind for a vfio-pci device.  PCI drivers
> > cannot return an error, an unbind is handled not as a request, but a
> > directive, so when the device is in use we block until the unbind can
> > complete.  With vfio-pci (and added upstream to the mdev core -
> > depending on vendor support), the driver remove callback triggers a
> > virtual interrupt to the user asking to cooperatively return the device
> > (triggering a hot-unplug in QEMU).  Has this really worked so well in
> > vfio-pci that we've forgotten that an unbind can block there too or are
> > we better about tracking something with PCI devices vs mdevs?  
> 
> Does any of the current vendor guest drivers for mdev support unplug? While
> I'm not trying to argue that unpluging a vfio-pci cannot block, it just works
> seamlessly in majority of cases nowadays, but I guess we were in the same
> situation with PCI assignment in the past?

I think this will only work for ccw right now (when using QEMU from
current git). Channel devices always support unplug (it's not a request
on the guest side, it's a notification.)

> 
> The whole point here is IMO about a massive inconvenience for a library
> consumer to be blocked on an operation and not knowing why, whereas when you
> return an instant error saying why the operation cannot be completed right now
> that opens the door for a necessary adjustment in their usage of the library.
> 
> > 
> > On idea for a solution would be that vfio only allows a single open of a
> > group at a time, so if libvirt were to open the group it could know
> > that it's unused.  If you can manage to close the group once you've
> > already triggered the remove/unbind, then I'd think the completion of
> > the write would be deterministic.  If the group is in use elsewhere,
> > the open should get back an -EBUSY and you probably ought to be careful  
> 
> Honestly, ^this seems like a fairly straightforward workaround to me.
> 
> Erik
> 
> > about removing/unbinding it anyway.  It might be possible to implement
> > this in mdevctl too, ie. redirect /dev/null to group file and fork,
> > fork the echo 1 > remove, kill the redirect, return a device in use
> > error if the initial redirect fails.  Thanks,
> > 
> > Alex  

Hacking something like that into mdevctl might be a good idea.

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Erik Skultety 3 years, 3 months ago
On Thu, Dec 24, 2020 at 08:14:24AM -0600, Jonathon Jongsma wrote:
> This patch series follows the previously-merged series which added support for
> transient mediated devices. This series expands mdev support to include
> persistent device definitions. Again, it relies on mdevctl as the backend.
> 
> It follows the common libvirt pattern of APIs by adding the following new APIs
> for node devices:
>     - virNodeDeviceDefineXML() - defines a persistent device
>     - virNodeDeviceUndefine() - undefines a persistent device
>     - virNodeDeviceCreate() - starts a previously-defined device
> 
> It also adds virsh commands mapping to these new APIs: nodedev-define,
> nodedev-undefine, and nodedev-start.
> 
> Since we rely on mdevctl for the definition of ediated devices, we need a way
> to stay up-to-date with devices that are defined by mdevctl (outside of
> libvirt).  The method for staying up-to-date is currently a little bit crude
> due to the fact that mdevctl does not emit any events when new devices are
> added or removed. As a workaround, we create a file monitor for the mdevctl
> config directory and re-query mdevctl when we detect changes within that
> directory. In the future, mdevctl may introduce a more elegant solution.
> 
> changes in v3:
>  - streamlined tests -- removed some unnecessary duplication
>  - split out patch to factor out node device name generation function
>  - split nodeDeviceParseMdevctlChildDevice() into a separate function
>  - added follow-up patch to remove space-padded alignment in header
>  - refactored the mdevctl update handling significantly:
>    - no longer a separate persistent thread that gets signaled by a timer
>    - now piggybacks onto the existing udev thread and signals the thread in t=
> he
>      same way that the udev event does.
>    - Daniel suggested spawning a throw-away thread to handle mdevctl updates,
>      but that introduces the complexity of possibly serializing multiple
>      throw-away threads (e.g. if we get an 'created' event followed immediate=
> ly
>      by a 'deleted' event, two threads may be spawned and we'd need to ensure
>      they are properly ordered)
>  - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked()
>    to simplify removing devices that are removed from mdevctl.
>  - coding style fixes
>  - NOTE: per Erik's request, I experimented with changing the way that mdevctl
>    commands were generated and tested (e.g. introducing something like
>    virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was
>    too invasive and awkward and didn't seem worthwhile

Thanks for giving ^this a try :)

I think we're converging nicely with the only remaining bit to agree on being
the mdevctl monitoring thread. I think we can target 7.1.0 with this series.

Regards,
Erik

Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices
Posted by Daniel P. Berrangé 3 years, 3 months ago
On Thu, Jan 07, 2021 at 05:50:10PM +0100, Erik Skultety wrote:
> On Thu, Dec 24, 2020 at 08:14:24AM -0600, Jonathon Jongsma wrote:
> > This patch series follows the previously-merged series which added support for
> > transient mediated devices. This series expands mdev support to include
> > persistent device definitions. Again, it relies on mdevctl as the backend.
> > 
> > It follows the common libvirt pattern of APIs by adding the following new APIs
> > for node devices:
> >     - virNodeDeviceDefineXML() - defines a persistent device
> >     - virNodeDeviceUndefine() - undefines a persistent device
> >     - virNodeDeviceCreate() - starts a previously-defined device
> > 
> > It also adds virsh commands mapping to these new APIs: nodedev-define,
> > nodedev-undefine, and nodedev-start.
> > 
> > Since we rely on mdevctl for the definition of ediated devices, we need a way
> > to stay up-to-date with devices that are defined by mdevctl (outside of
> > libvirt).  The method for staying up-to-date is currently a little bit crude
> > due to the fact that mdevctl does not emit any events when new devices are
> > added or removed. As a workaround, we create a file monitor for the mdevctl
> > config directory and re-query mdevctl when we detect changes within that
> > directory. In the future, mdevctl may introduce a more elegant solution.
> > 
> > changes in v3:
> >  - streamlined tests -- removed some unnecessary duplication
> >  - split out patch to factor out node device name generation function
> >  - split nodeDeviceParseMdevctlChildDevice() into a separate function
> >  - added follow-up patch to remove space-padded alignment in header
> >  - refactored the mdevctl update handling significantly:
> >    - no longer a separate persistent thread that gets signaled by a timer
> >    - now piggybacks onto the existing udev thread and signals the thread in t=
> > he
> >      same way that the udev event does.
> >    - Daniel suggested spawning a throw-away thread to handle mdevctl updates,
> >      but that introduces the complexity of possibly serializing multiple
> >      throw-away threads (e.g. if we get an 'created' event followed immediate=
> > ly
> >      by a 'deleted' event, two threads may be spawned and we'd need to ensure
> >      they are properly ordered)
> >  - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked()
> >    to simplify removing devices that are removed from mdevctl.
> >  - coding style fixes
> >  - NOTE: per Erik's request, I experimented with changing the way that mdevctl
> >    commands were generated and tested (e.g. introducing something like
> >    virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was
> >    too invasive and awkward and didn't seem worthwhile
> 
> Thanks for giving ^this a try :)
> 
> I think we're converging nicely with the only remaining bit to agree on being
> the mdevctl monitoring thread. I think we can target 7.1.0 with this series.

The lifecycle events handling in this series is totally broken and needs
addressing before we can consider merging.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|