[PATCH v5 00/15] vDPA shadow virtqueue

Eugenio Pérez posted 15 patches 2 years, 1 month ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20220307153334.3854134-1-eperezma@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Peter Xu <peterx@redhat.com>, Jason Wang <jasowang@redhat.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>
[PATCH v5 00/15] vDPA shadow virtqueue
Posted by Eugenio Pérez 2 years, 1 month ago
This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. It
is intended as a new method of tracking the memory the devices touch
during a migration process: instead of relying on the vhost device's
dirty logging capability, SVQ intercepts the VQ dataplane, forwarding
the descriptors between the VM and the device. This way qemu is the
effective writer of the guest's memory, as in qemu's virtio device
operation.

When SVQ is enabled, qemu offers a new virtual address space to the
device to read from and write to, and it maps the new vrings and the
guest memory into it. SVQ also intercepts kicks and calls between the
device and the guest. Relaying used buffers is what causes the dirty
memory to be tracked.
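
To illustrate the idea only (this is not the series' code; the struct
and function names below are invented for the sketch), relaying a used
buffer boils down to qemu copying the entry the device wrote into the
shadow ring, which lives in qemu memory, into the guest-visible ring:

    /* Toy sketch of the used-ring relay; the real code also translates
     * the descriptor id back to the guest's id and uses memory
     * barriers, both omitted here for brevity. */
    #include <stdint.h>

    struct used_elem { uint32_t id; uint32_t len; };
    struct ring {
        struct used_elem *used;   /* used ring entries */
        uint16_t *used_idx;       /* index written by the device side */
        uint16_t size;
    };

    /* Copy entries the device marked used in the shadow ring into the
     * guest ring.  Because qemu performs this write, the touched guest
     * pages are dirtied by qemu itself and tracked like any other
     * emulated-device write. */
    static void relay_used(struct ring *shadow, struct ring *guest,
                           uint16_t *last_seen)
    {
        while (*last_seen != *shadow->used_idx) {
            uint16_t slot = (uint16_t)(*last_seen % shadow->size);
            guest->used[*guest->used_idx % guest->size] = shadow->used[slot];
            (*guest->used_idx)++;
            (*last_seen)++;
        }
    }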

This effectively means that vDPA device passthrough is intercepted by
qemu. While SVQ should only be enabled at migration time, the switching
from regular mode to SVQ mode is left for a future series.

It is based on the ideas of DPDK's SW-assisted LM series
(https://patchwork.dpdk.org/cover/48370/). However, this series does
not map the shadow vq in the guest's VA, but in qemu's.

For qemu to use shadow virtqueues, the guest virtio driver must not use
features like event_idx.
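
Conceptually this just means masking the unsupported bits out of the
feature set offered through SVQ; a minimal sketch (the helper name is
made up for this example, only the standard virtio feature bit number
is real):

    #include <stdint.h>

    #define VIRTIO_RING_F_EVENT_IDX 29   /* standard virtio feature bit */

    /* Strip features that SVQ does not (yet) forward before they are
     * offered for negotiation. */
    static uint64_t svq_filter_features(uint64_t features)
    {
        return features & ~(1ULL << VIRTIO_RING_F_EVENT_IDX);
    }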

SVQ needs to be enabled with cmdline:

-netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
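For reference, a complete invocation could look like the following
(illustrative only: the machine options, the vhostdev path and the
remaining virtio-net-pci properties are placeholders):

    qemu-system-x86_64 -M q35,accel=kvm -m 4G \
        -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,svq=on \
        -device virtio-net-pci,netdev=vhost-vdpa0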

The first three patches enable notification forwarding with the
assistance of qemu. It is easy to enable only this part if the relevant
cmdline portion of the last patch is applied on top of them.

The next four patches implement the actual buffer forwarding. However,
addresses are not translated from HVA, so they need a host device with
an iommu that allows them to access the whole HVA range.

The last part of the series uses the host iommu properly: qemu creates
a new iova address space in the device's range and translates the
buffers into it. Finally, it adds the cmdline parameter.
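
The translation itself is plain interval lookup; a toy, self-contained
sketch of the idea (names invented here, not the series' VhostIOVATree
API, and a linear scan stands in for the tree):

    #include <stddef.h>
    #include <stdint.h>

    /* One contiguous mapping: [hva, hva+size) <-> [iova, iova+size). */
    struct map { uint64_t iova, hva, size; };

    /* Translate a qemu virtual address into the device's IOVA space by
     * finding the mapping that contains it. */
    static int hva_to_iova(const struct map *maps, size_t n,
                           uint64_t hva, uint64_t *iova)
    {
        for (size_t i = 0; i < n; i++) {
            if (hva >= maps[i].hva && hva - maps[i].hva < maps[i].size) {
                *iova = maps[i].iova + (hva - maps[i].hva);
                return 0;
            }
        }
        return -1; /* not mapped: the buffer cannot be forwarded */
    }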

Some simple performance tests were done with netperf, using a nested
guest with vp_vdpa and vhost-kernel at the L0 host. Starting with no
SVQ and a baseline average of ~9009.96 Mbps:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072  16384  16384    30.01    9061.03
131072  16384  16384    30.01    8962.94
131072  16384  16384    30.01    9005.92

Enabling SVQ buffer forwarding reduces throughput to about:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072  16384  16384    30.01    7689.72
131072  16384  16384    30.00    7752.07
131072  16384  16384    30.01    7750.30

However, many performance improvements were left out of this series for
simplicity, so the difference should shrink in the future.
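
For reference, output in that format typically comes from a plain
netperf TCP_STREAM run; the exact options used are not stated above,
but an equivalent invocation would be something like:

    netperf -H <peer-ip> -l 30 -t TCP_STREAM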

Comments are welcome.

TODO on future series:
* Event, indirect, packed, and other features of virtio.
* Support a different set of features between the device<->SVQ and the
  SVQ<->guest communication.
* Support of device host notifier memory regions.
* Separate buffer forwarding into its own AIO context, so more threads
  can be devoted to that task and the main event loop does not need to
  be stopped.
* Support multiqueue virtio-net vdpa.
* Proper documentation.

Changes from v4:
* Iterate the iova->hva tree instead of maintaining a separate tree, so
  HVA overlaps are supported.
* Fix: errno completion at failure.
* Rename x-svq to svq, so declaring the feature stable later does not change the cmdline parameter.

Changes from v3:
* Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
* Fix incomplete mapping (by 1 byte) of memory regions if svq is enabled.
v3 link:
https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/

Changes from v2:
* Fewer assertions and more error handling in iova tree code.
* Better documentation, both fixing errors and using the @param: format.
* Homogenize SVQ avail_idx_shadow and shadow_used_idx so that shadow is
  a prefix in both.
* Fix: Do not use the VirtQueueElement->len field; track it separately.
* Split vhost_svq_{enable,disable}_notification, so the code looks more
  like the kernel driver code.
* Small improvements.
v2 link:
https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/

Changes from v1:
* Feature set at device->SVQ is now the same as SVQ->guest.
* The size of the SVQ is no longer the maximum available device size,
  but the guest's negotiated size.
* Add VHOST_FILE_UNBIND kick and call fd treatment.
* Make SVQ a public struct.
* Return to the previous approach to iova-tree.
* Some assertions are now fail paths. Some errors are now log_guest.
* Only mask _F_LOG feature at vdpa_set_features svq enable path.
* Refactor some errors and messages. Add missing error unwindings.
* Add memory barrier at _F_NO_NOTIFY set.
* Stop checking for feature flags outside the transport range.
v1 link:
https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/

Changes from v4 RFC:
* Support allocating / freeing iova ranges in the IOVA tree, extending
  the already present iova-tree for that.
* Proper validation of guest features. Now SVQ can negotiate a
  different set of features with the device when enabled.
* Support of host notifier memory regions.
* Handle a full SVQ in case the guest's descriptors span different
  memory regions (qemu's VA chunks).
* Flush pending used buffers at end of SVQ operation.
* The QMP command now looks up devices by NetClientState name. Other
  devices will need to implement their own way to enable vdpa.
* Rename QMP command to set, so it looks more like a way of working
* Better use of qemu error system
* Make a few assertions proper error-handling paths.
* Add more documentation
* Less coupling of virtio / vhost, which could otherwise cause friction on changes.
* Addressed many other small comments and small fixes.

Changes from v3 RFC:
  * Move everything to vhost-vdpa backend. A big change, this allowed
    some cleanup but more code has been added in other places.
  * More use of glib utilities, especially to manage memory.
v3 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html

Changes from v2 RFC:
  * Add support for vhost-vdpa devices.
  * Fix some memory leaks pointed out in review comments.
v2 link:
https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html

Changes from v1 RFC:
  * Use QMP instead of migration to start SVQ mode.
  * Only accept IOMMU devices, for behavior closer to the target devices
    (vDPA).
  * Fix invalid masking/unmasking of vhost call fd.
  * Use of proper methods for synchronization.
  * No need to modify VirtIO device code, all of the changes are
    contained in vhost code.
  * Delete superfluous code.
  * An intermediate RFC was sent with only the notifications forwarding
    changes. It can be seen in
    https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
v1 link:
https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html

Eugenio Pérez (20):
      virtio: Add VIRTIO_F_QUEUE_STATE
      virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
      virtio: Add virtio_queue_is_host_notifier_enabled
      vhost: Make vhost_virtqueue_{start,stop} public
      vhost: Add x-vhost-enable-shadow-vq qmp
      vhost: Add VhostShadowVirtqueue
      vdpa: Register vdpa devices in a list
      vhost: Route guest->host notification through shadow virtqueue
      Add vhost_svq_get_svq_call_notifier
      Add vhost_svq_set_guest_call_notifier
      vdpa: Save call_fd in vhost-vdpa
      vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
      vhost: Route host->guest notification through shadow virtqueue
      virtio: Add vhost_shadow_vq_get_vring_addr
      vdpa: Save host and guest features
      vhost: Add vhost_svq_valid_device_features to shadow vq
      vhost: Shadow virtqueue buffers forwarding
      vhost: Add VhostIOVATree
      vhost: Use a tree to store memory mappings
      vdpa: Add custom IOTLB translations to SVQ

Eugenio Pérez (15):
  vhost: Add VhostShadowVirtqueue
  vhost: Add Shadow VirtQueue kick forwarding capabilities
  vhost: Add Shadow VirtQueue call forwarding capabilities
  vhost: Add vhost_svq_valid_features to shadow vq
  virtio: Add vhost_svq_get_vring_addr
  vdpa: adapt vhost_ops callbacks to svq
  vhost: Shadow virtqueue buffers forwarding
  util: Add iova_tree_alloc_map
  util: add iova_tree_find_iova
  vhost: Add VhostIOVATree
  vdpa: Add custom IOTLB translations to SVQ
  vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
  vdpa: Never set log_base addr if SVQ is enabled
  vdpa: Expose VHOST_F_LOG_ALL on SVQ
  vdpa: Add x-svq to NetdevVhostVDPAOptions

 qapi/net.json                      |   8 +-
 hw/virtio/vhost-iova-tree.h        |  27 ++
 hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
 include/hw/virtio/vhost-vdpa.h     |   8 +
 include/qemu/iova-tree.h           |  38 +-
 hw/virtio/vhost-iova-tree.c        | 110 +++++
 hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
 hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
 net/vhost-vdpa.c                   |  48 ++-
 util/iova-tree.c                   | 169 ++++++++
 hw/virtio/meson.build              |   2 +-
 11 files changed, 1633 insertions(+), 26 deletions(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c

-- 
2.27.0

Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
On 2022/3/7 11:33 PM, Eugenio Pérez wrote:
> This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's virtio device operation.
>
> When SVQ is enabled qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked.
>
> This effectively means that vDPA device passthrough is intercepted by
> qemu. While SVQ should only be enabled at migration time, the switching
> from regular mode to SVQ mode is left for a future series.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx.
>
> SVQ needs to be enabled with cmdline:
>
> -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
>
> The first three patches enables notifications forwarding with
> assistance of qemu. It's easy to enable only this if the relevant
> cmdline part of the last patch is applied on top of these.
>
> Next four patches implement the actual buffer forwarding. However,
> address are not translated from HVA so they will need a host device with
> an iommu allowing them to access all of the HVA range.
>
> The last part of the series uses properly the host iommu, so qemu
> creates a new iova address space in the device's range and translates
> the buffers in it. Finally, it adds the cmdline parameter.
>
> Some simple performance tests with netperf were done. They used a nested
> guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> baseline average of ~9009.96Mbps:
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 131072  16384  16384    30.01    9061.03
> 131072  16384  16384    30.01    8962.94
> 131072  16384  16384    30.01    9005.92
>
> To enable SVQ buffers forwarding reduce throughput to about
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 131072  16384  16384    30.01    7689.72
> 131072  16384  16384    30.00    7752.07
> 131072  16384  16384    30.01    7750.30
>
> However, many performance improvements were left out of this series for
> simplicity, so difference should shrink in the future.
>
> Comments are welcome.


Hi Michael:

What do you think of this series? It looks good to me as a start. The
feature can only be enabled via a dedicated parameter. If you're OK
with it, I'd try to make it for 7.0.

Thanks




Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> 
> On 2022/3/7 11:33 PM, Eugenio Pérez wrote:
> > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: Instead of relay on vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's virtio device operation.
> > 
> > When SVQ is enabled qemu offers a new virtual address space to the
> > device to read and write into, and it maps new vrings and the guest
> > memory in it. SVQ also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked.
> > 
> > This effectively means that vDPA device passthrough is intercepted by
> > qemu. While SVQ should only be enabled at migration time, the switching
> > from regular mode to SVQ mode is left for a future series.
> > 
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > not map the shadow vq in guest's VA, but in qemu's.
> > 
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx.
> > 
> > SVQ needs to be enabled with cmdline:
> > 
> > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on

A stable API for an incomplete feature is a problem imho.


> > 
> > The first three patches enables notifications forwarding with
> > assistance of qemu. It's easy to enable only this if the relevant
> > cmdline part of the last patch is applied on top of these.
> > 
> > Next four patches implement the actual buffer forwarding. However,
> > address are not translated from HVA so they will need a host device with
> > an iommu allowing them to access all of the HVA range.
> > 
> > The last part of the series uses properly the host iommu, so qemu
> > creates a new iova address space in the device's range and translates
> > the buffers in it. Finally, it adds the cmdline parameter.
> > 
> > Some simple performance tests with netperf were done. They used a nested
> > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > baseline average of ~9009.96Mbps:
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 131072  16384  16384    30.01    9061.03
> > 131072  16384  16384    30.01    8962.94
> > 131072  16384  16384    30.01    9005.92
> > 
> > To enable SVQ buffers forwarding reduce throughput to about
> > Recv   Send    Send
> > Socket Socket  Message  Elapsed
> > Size   Size    Size     Time     Throughput
> > bytes  bytes   bytes    secs.    10^6bits/sec
> > 131072  16384  16384    30.01    7689.72
> > 131072  16384  16384    30.00    7752.07
> > 131072  16384  16384    30.01    7750.30
> > 
> > However, many performance improvements were left out of this series for
> > simplicity, so difference should shrink in the future.
> > 
> > Comments are welcome.
> 
> 
> Hi Michael:
> 
> What do you think of this series? It looks good to me as a start. The
> feature could only be enabled as a dedicated parameter. If you're ok, I'd
> try to make it for 7.0.
> 
> Thanks

Well that's cutting it awfully close, and it's not really useful
at the current stage, is it?

The IOVA trick does not feel complete either.

> 
> > 
> > TODO on future series:
> > * Event, indirect, packed, and others features of virtio.
> > * To support different set of features between the device<->SVQ and the
> >    SVQ<->guest communication.
> > * Support of device host notifier memory regions.
> > * To sepparate buffers forwarding in its own AIO context, so we can
> >    throw more threads to that task and we don't need to stop the main
> >    event loop.
> > * Support multiqueue virtio-net vdpa.
> > * Proper documentation.
> > 
> > Changes from v4:
> > * Iterate iova->hva tree instead on maintain own tree so we support HVA
> >    overlaps.
> > * Fix: Errno completion at failure.
> > * Rename x-svq to svq, so changes to stable does not affect cmdline parameter.
> > 
> > Changes from v3:
> > * Add @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > * Fix uncomplete mapping (by 1 byte) of memory regions if svq is enabled.
> > v3 link:
> > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma@redhat.com/
> > 
> > Changes from v2:
> > * Less assertions and more error handling in iova tree code.
> > * Better documentation, both fixing errors and making @param: format
> > * Homogeneize SVQ avail_idx_shadow and shadow_used_idx to make shadow a
> >    prefix at both times.
> > * Fix: Fo not use VirtQueueElement->len field, track separatedly.
> > * Split vhost_svq_{enable,disable}_notification, so the code looks more
> >    like the kernel driver code.
> > * Small improvements.
> > v2 link:
> > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA@mail.gmail.com/
> > 
> > Changes from v1:
> > * Feature set at device->SVQ is now the same as SVQ->guest.
> > * Size of SVQ is not max available device size anymore, but guest's
> >    negotiated.
> > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > * Make SVQ a public struct
> > * Come back to previous approach to iova-tree
> > * Some assertions are now fail paths. Some errors are now log_guest.
> > * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> > * Refactor some errors and messages. Add missing error unwindings.
> > * Add memory barrier at _F_NO_NOTIFY set.
> > * Stop checking for features flags out of transport range.
> > v1 link:
> > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201@redhat.com/
> > 
> > Changes from v4 RFC:
> > * Support of allocating / freeing iova ranges in IOVA tree. Extending
> >    already present iova-tree for that.
> > * Proper validation of guest features. Now SVQ can negotiate a
> >    different set of features with the device when enabled.
> > * Support of host notifiers memory regions
> > * Handling of SVQ full queue in case guest's descriptors span to
> >    different memory regions (qemu's VA chunks).
> > * Flush pending used buffers at end of SVQ operation.
> > * QMP command now looks by NetClientState name. Other devices will need
> >    to implement it's way to enable vdpa.
> > * Rename QMP command to set, so it looks more like a way of working
> > * Better use of qemu error system
> > * Make a few assertions proper error-handling paths.
> > * Add more documentation
> > * Less coupling of virtio / vhost, that could cause friction on changes
> > * Addressed many other small comments and small fixes.
> > 
> > Changes from v3 RFC:
> >    * Move everything to vhost-vdpa backend. A big change, this allowed
> >      some cleanup but more code has been added in other places.
> >    * More use of glib utilities, especially to manage memory.
> > v3 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > 
> > Changes from v2 RFC:
> >    * Adding vhost-vdpa devices support
> >    * Fixed some memory leaks pointed by different comments
> > v2 link:
> > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > 
> > Changes from v1 RFC:
> >    * Use QMP instead of migration to start SVQ mode.
> >    * Only accepting IOMMU devices, closer behavior with target devices
> >      (vDPA)
> >    * Fix invalid masking/unmasking of vhost call fd.
> >    * Use of proper methods for synchronization.
> >    * No need to modify VirtIO device code, all of the changes are
> >      contained in vhost code.
> >    * Delete superfluous code.
> >    * An intermediate RFC was sent with only the notifications forwarding
> >      changes. It can be seen in
> >      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> > v1 link:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > 
> > Eugenio Pérez (20):
> >        virtio: Add VIRTIO_F_QUEUE_STATE
> >        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> >        virtio: Add virtio_queue_is_host_notifier_enabled
> >        vhost: Make vhost_virtqueue_{start,stop} public
> >        vhost: Add x-vhost-enable-shadow-vq qmp
> >        vhost: Add VhostShadowVirtqueue
> >        vdpa: Register vdpa devices in a list
> >        vhost: Route guest->host notification through shadow virtqueue
> >        Add vhost_svq_get_svq_call_notifier
> >        Add vhost_svq_set_guest_call_notifier
> >        vdpa: Save call_fd in vhost-vdpa
> >        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> >        vhost: Route host->guest notification through shadow virtqueue
> >        virtio: Add vhost_shadow_vq_get_vring_addr
> >        vdpa: Save host and guest features
> >        vhost: Add vhost_svq_valid_device_features to shadow vq
> >        vhost: Shadow virtqueue buffers forwarding
> >        vhost: Add VhostIOVATree
> >        vhost: Use a tree to store memory mappings
> >        vdpa: Add custom IOTLB translations to SVQ
> > 
> > Eugenio Pérez (15):
> >    vhost: Add VhostShadowVirtqueue
> >    vhost: Add Shadow VirtQueue kick forwarding capabilities
> >    vhost: Add Shadow VirtQueue call forwarding capabilities
> >    vhost: Add vhost_svq_valid_features to shadow vq
> >    virtio: Add vhost_svq_get_vring_addr
> >    vdpa: adapt vhost_ops callbacks to svq
> >    vhost: Shadow virtqueue buffers forwarding
> >    util: Add iova_tree_alloc_map
> >    util: add iova_tree_find_iova
> >    vhost: Add VhostIOVATree
> >    vdpa: Add custom IOTLB translations to SVQ
> >    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> >    vdpa: Never set log_base addr if SVQ is enabled
> >    vdpa: Expose VHOST_F_LOG_ALL on SVQ
> >    vdpa: Add x-svq to NetdevVhostVDPAOptions
> > 
> >   qapi/net.json                      |   8 +-
> >   hw/virtio/vhost-iova-tree.h        |  27 ++
> >   hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> >   include/hw/virtio/vhost-vdpa.h     |   8 +
> >   include/qemu/iova-tree.h           |  38 +-
> >   hw/virtio/vhost-iova-tree.c        | 110 +++++
> >   hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> >   hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> >   net/vhost-vdpa.c                   |  48 ++-
> >   util/iova-tree.c                   | 169 ++++++++
> >   hw/virtio/meson.build              |   2 +-
> >   11 files changed, 1633 insertions(+), 26 deletions(-)
> >   create mode 100644 hw/virtio/vhost-iova-tree.h
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> >   create mode 100644 hw/virtio/vhost-iova-tree.c
> >   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > 


Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> >
> > On 2022/3/7 11:33 PM, Eugenio Pérez wrote:
> > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > is intended as a new method of tracking the memory the devices touch
> > > during a migration process: Instead of relay on vhost device's dirty
> > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > descriptors between VM and device. This way qemu is the effective
> > > writer of guests memory, like in qemu's virtio device operation.
> > >
> > > When SVQ is enabled qemu offers a new virtual address space to the
> > > device to read and write into, and it maps new vrings and the guest
> > > memory in it. SVQ also intercepts kicks and calls between the device
> > > and the guest. Used buffers relay would cause dirty memory being
> > > tracked.
> > >
> > > This effectively means that vDPA device passthrough is intercepted by
> > > qemu. While SVQ should only be enabled at migration time, the switching
> > > from regular mode to SVQ mode is left for a future series.
> > >
> > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > not map the shadow vq in guest's VA, but in qemu's.
> > >
> > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > features like event_idx.
> > >
> > > SVQ needs to be enabled with cmdline:
> > >
> > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
>
> A stable API for an incomplete feature is a problem imho.

It should be "x-svq".

>
>
> > >
> > > The first three patches enables notifications forwarding with
> > > assistance of qemu. It's easy to enable only this if the relevant
> > > cmdline part of the last patch is applied on top of these.
> > >
> > > Next four patches implement the actual buffer forwarding. However,
> > > address are not translated from HVA so they will need a host device with
> > > an iommu allowing them to access all of the HVA range.
> > >
> > > The last part of the series uses properly the host iommu, so qemu
> > > creates a new iova address space in the device's range and translates
> > > the buffers in it. Finally, it adds the cmdline parameter.
> > >
> > > Some simple performance tests with netperf were done. They used a nested
> > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > baseline average of ~9009.96Mbps:
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 131072  16384  16384    30.01    9061.03
> > > 131072  16384  16384    30.01    8962.94
> > > 131072  16384  16384    30.01    9005.92
> > >
> > > To enable SVQ buffers forwarding reduce throughput to about
> > > Recv   Send    Send
> > > Socket Socket  Message  Elapsed
> > > Size   Size    Size     Time     Throughput
> > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > 131072  16384  16384    30.01    7689.72
> > > 131072  16384  16384    30.00    7752.07
> > > 131072  16384  16384    30.01    7750.30
> > >
> > > However, many performance improvements were left out of this series for
> > > simplicity, so difference should shrink in the future.
> > >
> > > Comments are welcome.
> >
> >
> > Hi Michael:
> >
> > What do you think of this series? It looks good to me as a start. The
> > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > try to make it for 7.0.
> >
> > Thanks
>
> Well that's cutting it awfully close, and it's not really useful
> at the current stage, is it?

This allows vDPA to be migrated when using "x-svq=on". But anyhow it's
experimental.

>
> The IOVA trick does not feel complete either.

I don't get it. We don't use any IOVA trick like DPDK did (it reserves
IOVA for the shadow vq), so we won't suffer from DPDK's issues.

Thanks

Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > >
> > > On 2022/3/7 11:33 PM, Eugenio Pérez wrote:
> > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > is intended as a new method of tracking the memory the devices touch
> > > > during a migration process: Instead of relay on vhost device's dirty
> > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > descriptors between VM and device. This way qemu is the effective
> > > > writer of guests memory, like in qemu's virtio device operation.
> > > >
> > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > device to read and write into, and it maps new vrings and the guest
> > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > and the guest. Used buffers relay would cause dirty memory being
> > > > tracked.
> > > >
> > > > This effectively means that vDPA device passthrough is intercepted by
> > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > from regular mode to SVQ mode is left for a future series.
> > > >
> > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > not map the shadow vq in guest's VA, but in qemu's.
> > > >
> > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > features like event_idx.
> > > >
> > > > SVQ needs to be enabled with cmdline:
> > > >
> > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> >
> > A stable API for an incomplete feature is a problem imho.
> 
> It should be "x-svq".


Well look at patch 15.

> >
> >
> > > >
> > > > The first three patches enables notifications forwarding with
> > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > cmdline part of the last patch is applied on top of these.
> > > >
> > > > Next four patches implement the actual buffer forwarding. However,
> > > > address are not translated from HVA so they will need a host device with
> > > > an iommu allowing them to access all of the HVA range.
> > > >
> > > > The last part of the series uses properly the host iommu, so qemu
> > > > creates a new iova address space in the device's range and translates
> > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > >
> > > > Some simple performance tests with netperf were done. They used a nested
> > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > baseline average of ~9009.96Mbps:
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > 131072  16384  16384    30.01    9061.03
> > > > 131072  16384  16384    30.01    8962.94
> > > > 131072  16384  16384    30.01    9005.92
> > > >
> > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > Recv   Send    Send
> > > > Socket Socket  Message  Elapsed
> > > > Size   Size    Size     Time     Throughput
> > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > 131072  16384  16384    30.01    7689.72
> > > > 131072  16384  16384    30.00    7752.07
> > > > 131072  16384  16384    30.01    7750.30
> > > >
> > > > However, many performance improvements were left out of this series for
> > > > simplicity, so difference should shrink in the future.
> > > >
> > > > Comments are welcome.
> > >
> > >
> > > Hi Michael:
> > >
> > > What do you think of this series? It looks good to me as a start. The
> > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > try to make it for 7.0.
> > >
> > > Thanks
> >
> > Well that's cutting it awfully close, and it's not really useful
> > at the current stage, is it?
> 
> This allows vDPA to be migrated when using "x-svq=on".
> But anyhow it's
> experimental.

It's less experimental than incomplete. It seems pretty clear that this
is not the way it will work down the road; we don't want SVQ involved
at all times.

> >
> > The IOVA trick does not feel complete either.
> 
> I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> for shadow vq) did. So we won't suffer from the issues of DPDK.
> 
> Thanks

Maybe I misunderstand how this all works.
I refer to all the iova_tree_alloc_map things.



Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Eugenio Perez Martin 2 years, 1 month ago
On Tue, Mar 8, 2022 at 8:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > >
> > > > On 2022/3/7 11:33 PM, Eugenio Pérez wrote:
> > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > is intended as a new method of tracking the memory the devices touch
> > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > descriptors between VM and device. This way qemu is the effective
> > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > >
> > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > device to read and write into, and it maps new vrings and the guest
> > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > tracked.
> > > > >
> > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > from regular mode to SVQ mode is left for a future series.
> > > > >
> > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > >
> > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > features like event_idx.
> > > > >
> > > > > SVQ needs to be enabled with cmdline:
> > > > >
> > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > >
> > > A stable API for an incomplete feature is a problem imho.
> >
> > It should be "x-svq".
>
>
> Well look at patch 15.
>

(Adding here for completeness)

I was asked to name it svq and simply mark it as experimental with the
@unstable tag, since the x- prefix was not applied consistently to
distinguish stable from unstable features.

> > >
> > >
> > > > >
> > > > > The first three patches enables notifications forwarding with
> > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > cmdline part of the last patch is applied on top of these.
> > > > >
> > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > address are not translated from HVA so they will need a host device with
> > > > > an iommu allowing them to access all of the HVA range.
> > > > >
> > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > creates a new iova address space in the device's range and translates
> > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > >
> > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > baseline average of ~9009.96Mbps:
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    9061.03
> > > > > 131072  16384  16384    30.01    8962.94
> > > > > 131072  16384  16384    30.01    9005.92
> > > > >
> > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    7689.72
> > > > > 131072  16384  16384    30.00    7752.07
> > > > > 131072  16384  16384    30.01    7750.30
> > > > >
> > > > > However, many performance improvements were left out of this series for
> > > > > simplicity, so difference should shrink in the future.
> > > > >
> > > > > Comments are welcome.
> > > >
> > > >
> > > > Hi Michael:
> > > >
> > > > What do you think of this series? It looks good to me as a start. The
> > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > try to make it for 7.0.
> > > >
> > > > Thanks
> > >
> > > Well that's cutting it awfully close, and it's not really useful
> > > at the current stage, is it?
> >
> > This allows vDPA to be migrated when using "x-svq=on".
> > But anyhow it's
> > experimental.
>
> it's less experimental than incomplete. It seems pretty clearly not
> the way it will work down the road, we don't want svq involved
> at all times.
>

That's right, but this is the intended way it works at migration time.
Only the switch from passthrough mode to SVQ mode is missing, because
that part is still being discussed upstream in virtio. It's pretty close
to the final form IMO: both virtual and real devices have already been
tested with the switching; including it here would just make this series
way bigger and harder to review.

In its current form it already enables the migration of the simpler
devices, and migration with all the machinery to enable SVQ dynamically
has already been proven as a PoC on top of this.

> > >
> > > The IOVA trick does not feel complete either.
> >
> > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > for shadow vq) did. So we won't suffer from the issues of DPDK.
> >
> > Thanks
>
> Maybe I misunderstand how this all works.
> I refer to all the iova_tree_alloc_map things.
>

It allocates an iova range, within the range the device supports, any
time the device needs to access memory: either guest memory or qemu
memory like the shadow vrings.

That part is pretty opaque and self-contained from the caller's point
of view, so in my opinion it is not worth changing at the moment. If it
needs changes, for example for performance, we can use a dedicated tree
as in previous versions of this series. But I can give more details of
it if needed.

Thanks!

Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > >
> > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > is intended as a new method of tracking the memory the devices touch
> > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > descriptors between VM and device. This way qemu is the effective
> > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > >
> > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > device to read and write into, and it maps new vrings and the guest
> > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > tracked.
> > > > >
> > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > from regular mode to SVQ mode is left for a future series.
> > > > >
> > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > >
> > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > features like event_idx.
> > > > >
> > > > > SVQ needs to be enabled with cmdline:
> > > > >
> > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > >
> > > A stable API for an incomplete feature is a problem imho.
> >
> > It should be "x-svq".
>
>
> Well look at patch 15.

It's a bug that needs to be fixed.

>
> > >
> > >
> > > > >
> > > > > The first three patches enables notifications forwarding with
> > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > cmdline part of the last patch is applied on top of these.
> > > > >
> > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > address are not translated from HVA so they will need a host device with
> > > > > an iommu allowing them to access all of the HVA range.
> > > > >
> > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > creates a new iova address space in the device's range and translates
> > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > >
> > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > baseline average of ~9009.96Mbps:
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    9061.03
> > > > > 131072  16384  16384    30.01    8962.94
> > > > > 131072  16384  16384    30.01    9005.92
> > > > >
> > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > Recv   Send    Send
> > > > > Socket Socket  Message  Elapsed
> > > > > Size   Size    Size     Time     Throughput
> > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > 131072  16384  16384    30.01    7689.72
> > > > > 131072  16384  16384    30.00    7752.07
> > > > > 131072  16384  16384    30.01    7750.30
> > > > >
> > > > > However, many performance improvements were left out of this series for
> > > > > simplicity, so difference should shrink in the future.
> > > > >
> > > > > Comments are welcome.
> > > >
> > > >
> > > > Hi Michael:
> > > >
> > > > What do you think of this series? It looks good to me as a start. The
> > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > try to make it for 7.0.
> > > >
> > > > Thanks
> > >
> > > Well that's cutting it awfully close, and it's not really useful
> > > at the current stage, is it?
> >
> > This allows vDPA to be migrated when using "x-svq=on".
> > But anyhow it's
> > experimental.
>
> it's less experimental than incomplete. It seems pretty clearly not
> the way it will work down the road, we don't want svq involved
> at all times.

Right, but SVQ could be used in other places, e.g. providing migration
compatibility when the destination lacks some features.

>
> > >
> > > The IOVA trick does not feel complete either.
> >
> > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > for shadow vq) did. So we won't suffer from the issues of DPDK.
> >
> > Thanks
>
> Maybe I misunderstand how this all works.
> I refer to all the iova_tree_alloc_map things.

It's a simple IOVA allocator actually. Anything wrong with that?

I'm fine with making it for the future release.

Thanks

Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > >
> > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > >
> > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > tracked.
> > > > > >
> > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > >
> > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > >
> > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > features like event_idx.
> > > > > >
> > > > > > SVQ needs to be enabled with cmdline:
> > > > > >
> > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > >
> > > > A stable API for an incomplete feature is a problem imho.
> > >
> > > It should be "x-svq".
> >
> >
> > Well look at patch 15.
> 
> It's a bug that needs to be fixed.
> 
> >
> > > >
> > > >
> > > > > >
> > > > > > The first three patches enables notifications forwarding with
> > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > cmdline part of the last patch is applied on top of these.
> > > > > >
> > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > address are not translated from HVA so they will need a host device with
> > > > > > an iommu allowing them to access all of the HVA range.
> > > > > >
> > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > creates a new iova address space in the device's range and translates
> > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > >
> > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > baseline average of ~9009.96Mbps:
> > > > > > Recv   Send    Send
> > > > > > Socket Socket  Message  Elapsed
> > > > > > Size   Size    Size     Time     Throughput
> > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > 131072  16384  16384    30.01    9005.92
> > > > > >
> > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > Recv   Send    Send
> > > > > > Socket Socket  Message  Elapsed
> > > > > > Size   Size    Size     Time     Throughput
> > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > 131072  16384  16384    30.01    7750.30
> > > > > >
> > > > > > However, many performance improvements were left out of this series for
> > > > > > simplicity, so difference should shrink in the future.
> > > > > >
> > > > > > Comments are welcome.
> > > > >
> > > > >
> > > > > Hi Michael:
> > > > >
> > > > > What do you think of this series? It looks good to me as a start. The
> > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > try to make it for 7.0.
> > > > >
> > > > > Thanks
> > > >
> > > > Well that's cutting it awfully close, and it's not really useful
> > > > at the current stage, is it?
> > >
> > > This allows vDPA to be migrated when using "x-svq=on".
> > > But anyhow it's
> > > experimental.
> >
> > it's less experimental than incomplete. It seems pretty clearly not
> > the way it will work down the road, we don't want svq involved
> > at all times.
> 
> Right, but SVQ could be used for other places e.g providing migration
> compatibility when the destination lacks some features.

In its current form? I don't see how. Generally? Maybe, but I suspect
we'll have to rework it completely for that.

> >
> > > >
> > > > The IOVA trick does not feel complete either.
> > >
> > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > >
> > > Thanks
> >
> > Maybe I misunderstand how this all works.
> > I refer to all the iova_tree_alloc_map things.
> 
> It's a simple IOVA allocator actually. Anything wrong with that?

Not by itself, but I'm not sure we can guarantee the guest will not
attempt to use the IOVA addresses we are reserving down the road.



Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Eugenio Perez Martin 2 years, 1 month ago
On Tue, Mar 8, 2022 at 8:55 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > >
> > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > tracked.
> > > > > > >
> > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > >
> > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > >
> > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > features like event_idx.
> > > > > > >
> > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > >
> > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > >
> > > > > A stable API for an incomplete feature is a problem imho.
> > > >
> > > > It should be "x-svq".
> > >
> > >
> > > Well look at patch 15.
> >
> > It's a bug that needs to be fixed.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The first three patches enables notifications forwarding with
> > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > >
> > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > >
> > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > >
> > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > >
> > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > >
> > > > > > > However, many performance improvements were left out of this series for
> > > > > > > simplicity, so difference should shrink in the future.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > >
> > > > > >
> > > > > > Hi Michael:
> > > > > >
> > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > try to make it for 7.0.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > at the current stage, is it?
> > > >
> > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > But anyhow it's
> > > > experimental.
> > >
> > > it's less experimental than incomplete. It seems pretty clearly not
> > > the way it will work down the road, we don't want svq involved
> > > at all times.
> >
> > Right, but SVQ could be used for other places e.g providing migration
> > compatibility when the destination lacks some features.
>
> In its current form? I don't see how.  Generally? Maybe but I suspect
> we'll have to rework it completely for that.
>

Earlier RFCs of the series already did that: the guest-to-SVQ
communication supported indirect descriptors and packed virtqueues even
when the SVQ-to-device side did not. Even the SVQ vring size could be
different from the guest's vring size. It's not a big diff actually; I
can send it as an RFC on top of this to show it.

But that part was left out for simplicity and added as a TODO, so it
just negotiates one set of features instead of two.
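
The idea, roughly (hypothetical helpers just to illustrate the two sets,
not code from the series; the feature bit numbers are the standard
virtio ones):

#include <stdint.h>

#define VIRTIO_RING_F_INDIRECT_DESC 28
#define VIRTIO_F_RING_PACKED        34

/* Guest <-> SVQ side: SVQ may offer features it can emulate by itself */
static uint64_t svq_guest_features(uint64_t device_features)
{
    return device_features |
           (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
           (1ULL << VIRTIO_F_RING_PACKED);
}

/* SVQ <-> device side: only keep what the shadow vring actually uses */
static uint64_t svq_device_features(uint64_t guest_acked, uint64_t svq_caps)
{
    return guest_acked & svq_caps;
}

With a single set, only the second filter exists and both sides end up
negotiating the same features.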

> > >
> > > > >
> > > > > The IOVA trick does not feel complete either.
> > > >
> > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > >
> > > > Thanks
> > >
> > Maybe I misunderstand how this all works.
> > > I refer to all the iova_tree_alloc_map things.
> >
> It's a simple IOVA allocator actually. Anything wrong with that?
>
> Not by itself but I'm not sure we can guarantee guest will not
> attempt to use the IOVA addresses we are reserving down
> the road.
>

The SVQ vring (the one the device sees) does not use GPA addresses
anymore, but this new iova space. If the guest tries to expose a
writable buffer with the address of the SVQ vring, VirtQueue would
refuse to translate it and generate an error, the same way it would
with an emulated device.

If we hot-plug more physical memory into the guest, the new range is
added as a totally new iova entry, which cannot collide with previous
entries, either GPA or SVQ ones. The same thing happens if the vhost
device is added or removed, or if the guest resets or starts (sets
DRIVER_OK) the device.
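
As a toy model of the two translation steps (hypothetical names, not the
actual qemu code): guest descriptors carry GPAs, which are resolved
against guest RAM only, and the result is then mapped into the device
IOVA space that also contains the SVQ vrings. A GPA that does not belong
to guest RAM (for example one trying to alias the SVQ vring) already
fails the first step:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start;  /* GPA (step 1) or HVA (step 2) */
    uint64_t size;
    uint64_t target; /* HVA (step 1) or device IOVA (step 2) */
} Range;

static Range guest_ram[8];  /* GPA -> HVA: one entry per guest RAM block  */
static Range iova_map[16];  /* HVA -> IOVA: guest RAM plus the SVQ vrings */

/* Overflow checks omitted for brevity */
static bool translate(const Range *map, size_t n, uint64_t addr,
                      uint64_t len, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= map[i].start &&
            addr + len <= map[i].start + map[i].size) {
            *out = map[i].target + (addr - map[i].start);
            return true;
        }
    }

    return false; /* not mapped: refuse the buffer and report an error */
}

A guest buffer is forwarded only if both lookups succeed; the SVQ vrings
have entries in iova_map but never in guest_ram, so the guest cannot
reach them, and hot-plugged RAM simply gets fresh, non-overlapping
entries in both tables.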

Thanks!

Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 09:15:37AM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 8:55 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > > >
> > > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > > >
> > > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > > tracked.
> > > > > > > >
> > > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > > >
> > > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > > >
> > > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > > features like event_idx.
> > > > > > > >
> > > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > > >
> > > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > > >
> > > > > > A stable API for an incomplete feature is a problem imho.
> > > > >
> > > > > It should be "x-svq".
> > > >
> > > >
> > > > Well look at patch 15.
> > >
> > > It's a bug that needs to be fixed.
> > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > The first three patches enables notifications forwarding with
> > > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > > >
> > > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > > >
> > > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > > >
> > > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > > Recv   Send    Send
> > > > > > > > Socket Socket  Message  Elapsed
> > > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > > >
> > > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > > Recv   Send    Send
> > > > > > > > Socket Socket  Message  Elapsed
> > > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > > >
> > > > > > > > However, many performance improvements were left out of this series for
> > > > > > > > simplicity, so difference should shrink in the future.
> > > > > > > >
> > > > > > > > Comments are welcome.
> > > > > > >
> > > > > > >
> > > > > > > Hi Michael:
> > > > > > >
> > > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > > try to make it for 7.0.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > > at the current stage, is it?
> > > > >
> > > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > > But anyhow it's
> > > > > experimental.
> > > >
> > > > it's less experimental than incomplete. It seems pretty clearly not
> > > > the way it will work down the road, we don't want svq involved
> > > > at all times.
> > >
> > > Right, but SVQ could be used for other places e.g providing migration
> > > compatibility when the destination lacks some features.
> >
> > In its current form? I don't see how.  Generally? Maybe but I suspect
> > we'll have to rework it completely for that.
> >
> 
> RFCs of the series already do that: guest to SVQ communication
> supported indirect descriptors and packed virtqueue but SVQ to device
> did not. Even SVQ vring size could be different from guest's vring
> size. It's not a big diff actually, I can send it as a RFC on top of
> this to show it.
> 
> But that part was left out for simplicity and added as a TODO, so it
> just negotiates one set of features instead of two.
> 
> > > >
> > > > > >
> > > > > > The IOVA trick does not feel complete either.
> > > > >
> > > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > > >
> > > > > Thanks
> > > >
> > > > Maybe I misundrstand how this all works.
> > > > I refer to all the iova_tree_alloc_map things.
> > >
> > > It's a simple IOVA allocater actually. Anything wrong with that?
> >
> > Not by itself but I'm not sure we can guarantee guest will not
> > attempt to use the IOVA addresses we are reserving down
> > the road.
> >
> 
> The SVQ vring (the one that device's see) does not use GPA addresses
> anymore, but this new iova space. If the guest tries to expose a
> writable buffer with the address of SVQ vring, VirtQueue would refuse
> to translate it, generating an error. The same way it would generate
> an error with emulated devices.

Right. But guests are not really set up to handle such
errors except by failing the transaction, are they?



> If we hot-plug more physical memory to the guest, this new range is
> added as a totally new iova entry, which cannot collide with previous
> entries, both GPA and SVQ. Same thing happens if the vhost device is
> added or removed, or if the guest reset or start (set DRIVER_OK) the
> device.
> 
> Thanks!
> 
> > > I'm fine with making it for the future release.
> > >
> > > Thanks
> > >


Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
On Tue, Mar 8, 2022 at 3:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > 在 2022/3/7 下午11:33, Eugenio Pérez 写道:
> > > > > > > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > > > > > > is intended as a new method of tracking the memory the devices touch
> > > > > > > during a migration process: Instead of relay on vhost device's dirty
> > > > > > > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > > > > > > descriptors between VM and device. This way qemu is the effective
> > > > > > > writer of guests memory, like in qemu's virtio device operation.
> > > > > > >
> > > > > > > When SVQ is enabled qemu offers a new virtual address space to the
> > > > > > > device to read and write into, and it maps new vrings and the guest
> > > > > > > memory in it. SVQ also intercepts kicks and calls between the device
> > > > > > > and the guest. Used buffers relay would cause dirty memory being
> > > > > > > tracked.
> > > > > > >
> > > > > > > This effectively means that vDPA device passthrough is intercepted by
> > > > > > > qemu. While SVQ should only be enabled at migration time, the switching
> > > > > > > from regular mode to SVQ mode is left for a future series.
> > > > > > >
> > > > > > > It is based on the ideas of DPDK SW assisted LM, in the series of
> > > > > > > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > > > > > > not map the shadow vq in guest's VA, but in qemu's.
> > > > > > >
> > > > > > > For qemu to use shadow virtqueues the guest virtio driver must not use
> > > > > > > features like event_idx.
> > > > > > >
> > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > >
> > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
> > > > >
> > > > > A stable API for an incomplete feature is a problem imho.
> > > >
> > > > It should be "x-svq".
> > >
> > >
> > > Well look at patch 15.
> >
> > It's a bug that needs to be fixed.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The first three patches enables notifications forwarding with
> > > > > > > assistance of qemu. It's easy to enable only this if the relevant
> > > > > > > cmdline part of the last patch is applied on top of these.
> > > > > > >
> > > > > > > Next four patches implement the actual buffer forwarding. However,
> > > > > > > address are not translated from HVA so they will need a host device with
> > > > > > > an iommu allowing them to access all of the HVA range.
> > > > > > >
> > > > > > > The last part of the series uses properly the host iommu, so qemu
> > > > > > > creates a new iova address space in the device's range and translates
> > > > > > > the buffers in it. Finally, it adds the cmdline parameter.
> > > > > > >
> > > > > > > Some simple performance tests with netperf were done. They used a nested
> > > > > > > guest with vp_vdpa, vhost-kernel at L0 host. Starting with no svq and a
> > > > > > > baseline average of ~9009.96Mbps:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > >
> > > > > > > To enable SVQ buffers forwarding reduce throughput to about
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > >
> > > > > > > However, many performance improvements were left out of this series for
> > > > > > > simplicity, so difference should shrink in the future.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > >
> > > > > >
> > > > > > Hi Michael:
> > > > > >
> > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > feature could only be enabled as a dedicated parameter. If you're ok, I'd
> > > > > > try to make it for 7.0.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Well that's cutting it awfully close, and it's not really useful
> > > > > at the current stage, is it?
> > > >
> > > > This allows vDPA to be migrated when using "x-svq=on".
> > > > But anyhow it's
> > > > experimental.
> > >
> > > it's less experimental than incomplete. It seems pretty clearly not
> > > the way it will work down the road, we don't want svq involved
> > > at all times.
> >
> > Right, but SVQ could be used for other places e.g providing migration
> > compatibility when the destination lacks some features.
>
> In its current form? I don't see how.  Generally?

Generally, yes.

> Maybe but I suspect
> we'll have to rework it completely for that.

Probably not; from what I see, it just needs some extension of the
current code.

>
> > >
> > > > >
> > > > > The IOVA trick does not feel complete either.
> > > >
> > > > I don't get here. We don't use any IOVA trick as DPDK (it reserve IOVA
> > > > for shadow vq) did. So we won't suffer from the issues of DPDK.
> > > >
> > > > Thanks
> > >
> > > Maybe I misundrstand how this all works.
> > > I refer to all the iova_tree_alloc_map things.
> >
> > It's a simple IOVA allocater actually. Anything wrong with that?
>
> Not by itself but I'm not sure we can guarantee guest will not
> attempt to use the IOVA addresses we are reserving down
> the road.

The IOVA is allocated via the listeners and stored in the iova tree
per GPA range as IOVA->(GPA)->HVA. Guests only see GPA, and Qemu's
virtio core sees the GPA to HVA mapping. We then do a reverse lookup
to find the HVA->IOVA we allocated previously. So we have a double
check here:

1) Qemu's memory core makes sure the GPA that the guest uses is valid
2) the IOVA tree guarantees that no HVA beyond what the guest can see
is used

So technically, there's no way for the guest to use the IOVA address
allocated for the shadow virtqueue.
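
To make those two checks concrete, here is a tiny self-contained C
sketch (toy numbers and hand-rolled types, not the actual QEMU
iova-tree API):

/* Illustrative only: models the IOVA->(GPA)->HVA bookkeeping described
 * above with a fixed table instead of QEMU's iova-tree. */
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t iova;      /* device-visible address, SVQ-managed space */
    uint64_t hva;       /* qemu virtual address backing the range    */
    uint64_t size;
    int guest_visible;  /* 1 if the range is guest memory (has GPA)  */
} Map;

/* One entry for the SVQ vring, one for a chunk of guest RAM. */
static const Map maps[] = {
    { 0x0000, 0xa000, 0x1000, 0 },  /* SVQ vring: no GPA at all   */
    { 0x1000, 0x8000, 0x1000, 1 },  /* guest RAM, GPA [0, 0x1000) */
};

/* Check 1: qemu's memory core only translates valid guest GPAs. */
static int gpa_to_hva(uint64_t gpa, uint64_t *hva)
{
    if (gpa < 0x1000) {             /* only guest RAM is reachable */
        *hva = 0x8000 + gpa;
        return 0;
    }
    return -1;                      /* bogus descriptor            */
}

/* Check 2: reverse lookup HVA -> IOVA over qemu-created mappings. */
static int hva_to_iova(uint64_t hva, uint64_t *iova)
{
    for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
        if (hva >= maps[i].hva && hva < maps[i].hva + maps[i].size) {
            *iova = maps[i].iova + (hva - maps[i].hva);
            return 0;
        }
    }
    return -1;
}

int main(void)
{
    uint64_t hva, iova;

    /* A guest buffer at GPA 0x10 resolves through both checks. */
    if (!gpa_to_hva(0x10, &hva) && !hva_to_iova(hva, &iova)) {
        printf("GPA 0x10 -> HVA 0x%" PRIx64 " -> IOVA 0x%" PRIx64 "\n",
               hva, iova);
    }

    /* GPA 0x5000 is not guest memory: rejected by check 1, so it can
     * never reach the SVQ vring's IOVA range [0, 0x1000). */
    if (gpa_to_hva(0x5000, &hva) < 0) {
        printf("GPA 0x5000 rejected before any IOVA lookup\n");
    }
    return 0;
}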

Thanks

>
> > I'm fine with making it for the future release.
> >
> > Thanks
> >
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> Generally, yes.


So generally I support the idea of merging code gradually.  And merging
with an unstable flag to enable it is a reasonable way to do it.
However, we are half a day away from soft freeze, so this will just
result in the feature getting to users in its current, not really
usable, form. If we just want to simplify upstreaming, then
merging patches 1-14 for now would be one way to do it.
If you want to do it through your tree then ok

Acked-by: Michael S. Tsirkin <mst@redhat.com>


-- 
MST
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
On Tue, Mar 8, 2022 at 6:46 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > Generally, yes.
>
>
> So generally I support the idea of merging code gradually.  And merging
> with an unstable flag to enable it is a reasonable way to do it.
> However we are half a day away from soft freeze, so this will just
> result in the feature getting to users in it's current not really
> useable form. If we just want to simplify upstreaming then
> merging patches 1-14 for now would be one way to do it.
>

Yes.


> If you want to do it through your tree then ok
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>

Thanks. Will send a pull request soon.


>
>
> --
> MST
>
>
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > Not by itself but I'm not sure we can guarantee guest will not
> > attempt to use the IOVA addresses we are reserving down
> > the road.
> 
> The IOVA is allocated via the listeners and stored in the iova tree
> per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> the HVA->IOVA we allocated previously.  So we have double check here:
> 
> 1) Qemu memory core to make sure the GPA that guest uses is valid
> 2) the IOVA tree that guarantees there will be no HVA beyond what
> guest can see is used
> 
> So technically, there's no way for the guest to use the IOVA address
> allocated for the shadow virtqueue.
> 
> Thanks

I mean, IOVA is programmed in the host hardware to translate to HPA, right?

-- 
MST
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Eugenio Perez Martin 2 years, 1 month ago
On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > Not by itself but I'm not sure we can guarantee guest will not
> > > attempt to use the IOVA addresses we are reserving down
> > > the road.
> >
> > The IOVA is allocated via the listeners and stored in the iova tree
> > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > the HVA->IOVA we allocated previously.  So we have double check here:
> >
> > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > guest can see is used
> >
> > So technically, there's no way for the guest to use the IOVA address
> > allocated for the shadow virtqueue.
> >
> > Thanks
>
> I mean, IOVA is programmed in the host hardware to translate to HPA, right?
>

Yes, that's right if the device uses physical maps. Also note that the
SVQ vring is allocated in multiples of host huge pages to avoid garbage
or unintended access from the device.
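
As a rough illustration of that allocation policy (the 2 MiB huge page
size and the helper below are assumptions for the sketch, not the
actual QEMU code):

/* Sketch: round the SVQ vring allocation up to a multiple of the host
 * huge page size and zero it, so the device cannot see stale qemu data
 * in the tail of the mapping. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HOST_HUGE_PAGE_SIZE (2 * 1024 * 1024)   /* assumed 2 MiB */
#define ROUND_UP(n, d)      ((((n) + (d) - 1) / (d)) * (d))

static void *svq_vring_alloc(size_t vring_bytes, size_t *mapped_bytes)
{
    size_t len = ROUND_UP(vring_bytes, HOST_HUGE_PAGE_SIZE);
    void *addr;

    /* Aligned so the whole region can be mapped to the device. */
    if (posix_memalign(&addr, HOST_HUGE_PAGE_SIZE, len)) {
        return NULL;
    }
    memset(addr, 0, len);       /* no garbage exposed to the device */
    *mapped_bytes = len;
    return addr;
}

int main(void)
{
    size_t mapped;
    void *vring = svq_vring_alloc(16384, &mapped);

    if (vring) {
        printf("vring needs 16384 bytes, %zu bytes mapped\n", mapped);
        free(vring);
    }
    return 0;
}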

If a vdpa device uses physical addresses, the kernel vdpa layer will
pin qemu memory first and then send the IOVA to HPA translations to the
hardware. But this IOVA space is controlled by SVQ, not by the guest.
If a guest's virtqueue buffer cannot first be translated to GPA, it is
not forwarded.

Thanks!
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > attempt to use the IOVA addresses we are reserving down
> > > > the road.
> > >
> > > The IOVA is allocated via the listeners and stored in the iova tree
> > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > the HVA->IOVA we allocated previously.  So we have double check here:
> > >
> > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > guest can see is used
> > >
> > > So technically, there's no way for the guest to use the IOVA address
> > > allocated for the shadow virtqueue.
> > >
> > > Thanks
> >
> > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> >
> 
> Yes, that's right if the device uses physical maps. Also to note, SVQ
> vring is allocated in multiples of host huge pages to avoid garbage or
> unintended access from the device.
> 
> If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> memory first and then will send IOVA to HPA translation to hardware.
> But this IOVA space is not controlled by the guest, but by SVQ. If a
> guest's virtqueue buffer cannot be translated first to GPA, it will
> not be forwarded.
> 
> Thanks!

Right. So if a guest sends a buffer whose address overlaps the range we
used for the SVQ, then I think at the moment the guest won't work.

-- 
MST
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Eugenio Perez Martin 2 years, 1 month ago
On Tue, Mar 8, 2022 at 1:17 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > attempt to use the IOVA addresses we are reserving down
> > > > > the road.
> > > >
> > > > The IOVA is allocated via the listeners and stored in the iova tree
> > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > > the HVA->IOVA we allocated previously.  So we have double check here:
> > > >
> > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > guest can see is used
> > > >
> > > > So technically, there's no way for the guest to use the IOVA address
> > > > allocated for the shadow virtqueue.
> > > >
> > > > Thanks
> > >
> > > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> > >
> >
> > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > vring is allocated in multiples of host huge pages to avoid garbage or
> > unintended access from the device.
> >
> > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > memory first and then will send IOVA to HPA translation to hardware.
> > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > guest's virtqueue buffer cannot be translated first to GPA, it will
> > not be forwarded.
> >
> > Thanks!
>
> Right. So if guests send a buffer where buffer address overlaps the
> range we used for the SVQ, then I think at the moment guest won't work.
>

I'm going to dissect a few cases so we can sync on where our points of
view differ. I'm leaving vIOMMU out for simplicity.

If qemu uses an emulated device, it reads the VirtQueue and translates
addresses from GPA to HVA via virtqueue_pop. If the guest places an
address outside of GPA, dma_memory_map returns an error ("virtio: bogus
descriptor or out of resources").

It doesn't make sense to say "the buffer address overlaps with qemu
memory" here, since the conversion function is not defined for all GPA.
If the range is not in GPA, it's a bogus descriptor.

Now we use a vdpa device that uses physical mapping and we start qemu
with no svq. When qemu starts, it maps IOVA == GPA to HVA. When the
vdpa kernel receives the mapping, it pins the HVA memory, obtaining
HPA, and sends the IOVA == GPA to HPA mappings to the hardware. This
is what is supported here.

If we add SVQ, the IOVA is not GPA anymore. GPA chunks are mapped to
IOVA, and the SVQ vring is mapped to IOVA too; they don't overlap, so
the device can access both. When the memory listener tells vdpa that a
new chunk of memory has been added, the SVQ code does not care about
GPA: it allocates a free region of IOVA for the HVA region of the
guest's memory. GPA to HVA is already tracked and translated by
virtqueue_pop.

Let's use example numbers:
- The SVQ vring occupies HVA [0xa000, 0xb000). It's the first caller of
iova_tree_alloc_map, so the resulting mapping says that IOVA [0, 0x1000)
translates to HVA [0xa000, 0xb000).
- The memory listener now reports GPA [0, 0x1000), which translates to
HVA [0x8000, 0x9000). The new call to iova_tree_alloc_map assigns
IOVA [0x1000, 0x2000) to HVA [0x8000, 0x9000).

Then that IOVA tree is sent to the device. From the kernel's POV it is
the same: it gets HVA addresses, pins them, and configures the hardware
so it can translate IOVA (!= GPA) to HPA.

SVQ now reads descriptors from the guest using virtqueue_pop, so SVQ,
as its caller, does not use GPA to address them but HVA. If a guest's
vring descriptor is outside of GPA [0, 0x1000), it's an error, just as
with an emulated device. After that, SVQ translates HVA to IOVA with
the iova-tree. The result must be within [0x1000, 0x2000).
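
For reference, a toy linear allocator that reproduces those numbers
(not the real iova_tree_alloc_map, just an illustration of the
allocation order):

/* Toy bump allocator: the SVQ vring gets IOVA [0, 0x1000) and the
 * guest memory chunk gets IOVA [0x1000, 0x2000), as in the example
 * above. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_MAPS 16

typedef struct {
    uint64_t iova;
    uint64_t hva;
    uint64_t size;
} Map;

static Map maps[MAX_MAPS];
static int nmaps;
static uint64_t next_free_iova;     /* grows linearly, never reused */

/* Assign the next free IOVA range to an HVA range and remember it. */
static uint64_t iova_alloc_map(uint64_t hva, uint64_t size)
{
    Map *m = &maps[nmaps++];

    m->iova = next_free_iova;
    m->hva = hva;
    m->size = size;
    next_free_iova += size;
    return m->iova;
}

int main(void)
{
    /* 1) SVQ vring, HVA [0xa000, 0xb000), is the first caller. */
    uint64_t svq_iova = iova_alloc_map(0xa000, 0x1000);
    /* 2) The listener reports GPA [0, 0x1000) at HVA [0x8000, 0x9000). */
    uint64_t guest_iova = iova_alloc_map(0x8000, 0x1000);

    printf("SVQ vring IOVA [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
           svq_iova, svq_iova + 0x1000);
    printf("guest mem IOVA [0x%" PRIx64 ", 0x%" PRIx64 ")\n",
           guest_iova, guest_iova + 0x1000);
    return 0;
}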

So guests should not be able to write to qemu's memory outside of
guest memory unless they hit a bug either in the SVQ code or in qemu's
VirtQueue/DMA system.

Let me know if this makes sense to you.

Thanks!
Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
>> On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
>>>>> Not by itself but I'm not sure we can guarantee guest will not
>>>>> attempt to use the IOVA addresses we are reserving down
>>>>> the road.
>>>> The IOVA is allocated via the listeners and stored in the iova tree
>>>> per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
>>>> virtio core see GPA to HVA mapping. And we do a reverse lookup to find
>>>> the HVA->IOVA we allocated previously.  So we have double check here:
>>>>
>>>> 1) Qemu memory core to make sure the GPA that guest uses is valid
>>>> 2) the IOVA tree that guarantees there will be no HVA beyond what
>>>> guest can see is used
>>>>
>>>> So technically, there's no way for the guest to use the IOVA address
>>>> allocated for the shadow virtqueue.
>>>>
>>>> Thanks
>>> I mean, IOVA is programmed in the host hardware to translate to HPA, right?
>>>
>> Yes, that's right if the device uses physical maps. Also to note, SVQ
>> vring is allocated in multiples of host huge pages to avoid garbage or
>> unintended access from the device.
>>
>> If a vdpa device uses physical addresses, kernel vdpa will pin qemu
>> memory first and then will send IOVA to HPA translation to hardware.
>> But this IOVA space is not controlled by the guest, but by SVQ. If a
>> guest's virtqueue buffer cannot be translated first to GPA, it will
>> not be forwarded.
>>
>> Thanks!
> Right. So if guests send a buffer where buffer address overlaps the
> range we used for the SVQ, then I think at the moment guest won't work.


There's no way for a guest to do this: it can only use GPA, and Qemu
won't let vDPA use GPA as IOVA. Dedicated IOVA ranges are allocated
for those GPA ranges, so SVQ won't use an IOVA that overlaps with what
the guest uses.

Thanks


>


Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Michael S. Tsirkin 2 years, 1 month ago
On Wed, Mar 09, 2022 at 11:38:35AM +0800, Jason Wang wrote:
> 
> 在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> > On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > > attempt to use the IOVA addresses we are reserving down
> > > > > > the road.
> > > > > The IOVA is allocated via the listeners and stored in the iova tree
> > > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > > virtio core see GPA to HVA mapping. And we do a reverse lookup to find
> > > > > the HVA->IOVA we allocated previously.  So we have double check here:
> > > > > 
> > > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > > guest can see is used
> > > > > 
> > > > > So technically, there's no way for the guest to use the IOVA address
> > > > > allocated for the shadow virtqueue.
> > > > > 
> > > > > Thanks
> > > > I mean, IOVA is programmed in the host hardware to translate to HPA, right?
> > > > 
> > > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > > vring is allocated in multiples of host huge pages to avoid garbage or
> > > unintended access from the device.
> > > 
> > > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > > memory first and then will send IOVA to HPA translation to hardware.
> > > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > > guest's virtqueue buffer cannot be translated first to GPA, it will
> > > not be forwarded.
> > > 
> > > Thanks!
> > Right. So if guests send a buffer where buffer address overlaps the
> > range we used for the SVQ, then I think at the moment guest won't work.
> 
> 
> There's no way for a guest to do this, it can only use GPA

With a vIOMMU it can.

> but the Qemu
> won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated for
> those GPA ranges so SVQ won't use IOVA that is overlapped with what Guest
> use.
> 
> Thanks
> 
> 
> > 


Re: [PATCH v5 00/15] vDPA shadow virtqueue
Posted by Jason Wang 2 years, 1 month ago
On Wed, Mar 9, 2022 at 3:30 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, Mar 09, 2022 at 11:38:35AM +0800, Jason Wang wrote:
> >
> > 在 2022/3/8 下午8:16, Michael S. Tsirkin 写道:
> > > On Tue, Mar 08, 2022 at 12:37:33PM +0100, Eugenio Perez Martin wrote:
> > > > On Tue, Mar 8, 2022 at 11:48 AM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > > > On Tue, Mar 08, 2022 at 04:20:53PM +0800, Jason Wang wrote:
> > > > > > > Not by itself but I'm not sure we can guarantee guest will not
> > > > > > > attempt to use the IOVA addresses we are reserving down
> > > > > > > the road.
> > > > > > The IOVA is allocated via the listeners and stored in the iova
> tree
> > > > > > per GPA range as IOVA->(GPA)->HVA.Guests will only see GPA, Qemu
> > > > > > virtio core see GPA to HVA mapping. And we do a reverse lookup
> to find
> > > > > > the HVA->IOVA we allocated previously.  So we have double check
> here:
> > > > > >
> > > > > > 1) Qemu memory core to make sure the GPA that guest uses is valid
> > > > > > 2) the IOVA tree that guarantees there will be no HVA beyond what
> > > > > > guest can see is used
> > > > > >
> > > > > > So technically, there's no way for the guest to use the IOVA
> address
> > > > > > allocated for the shadow virtqueue.
> > > > > >
> > > > > > Thanks
> > > > > I mean, IOVA is programmed in the host hardware to translate to
> HPA, right?
> > > > >
> > > > Yes, that's right if the device uses physical maps. Also to note, SVQ
> > > > vring is allocated in multiples of host huge pages to avoid garbage
> or
> > > > unintended access from the device.
> > > >
> > > > If a vdpa device uses physical addresses, kernel vdpa will pin qemu
> > > > memory first and then will send IOVA to HPA translation to hardware.
> > > > But this IOVA space is not controlled by the guest, but by SVQ. If a
> > > > guest's virtqueue buffer cannot be translated first to GPA, it will
> > > > not be forwarded.
> > > >
> > > > Thanks!
> > > Right. So if guests send a buffer where buffer address overlaps the
> > > range we used for the SVQ, then I think at the moment guest won't work.
> >
> >
> > There's no way for a guest to do this, it can only use GPA
>
> With a vIOMMU it can.
>

It should be the same, or I may be missing something.

With a vIOMMU, vDPA devices still won't use gIOVA. Instead, the device
will use the IOVA space that is managed by Qemu.

Listeners: IOVA->HVA
Qemu virtqueue helper: gIOVA->GPA->HVA
SVQ: HVA->IOVA

So SVQ won't use an IOVA that overlaps with gIOVA/GPA.
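
A toy sketch of that chain of lookups, with single-range mappings and
made-up numbers:

/* Chain of translations with a vIOMMU in place:
 *   guest driver: gIOVA --(vIOMMU)--> GPA --(qemu memory)--> HVA
 *   SVQ:          HVA --(iova-tree)--> qemu-managed IOVA for the device
 * Single-range, hard-coded mappings for illustration only. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static int giova_to_gpa(uint64_t giova, uint64_t *gpa)
{
    /* The guest programmed its vIOMMU: gIOVA [0x40000, 0x41000) -> GPA 0. */
    if (giova >= 0x40000 && giova < 0x41000) {
        *gpa = giova - 0x40000;
        return 0;
    }
    return -1;
}

static int gpa_to_hva(uint64_t gpa, uint64_t *hva)
{
    /* Guest RAM GPA [0, 0x1000) lives at HVA [0x8000, 0x9000). */
    if (gpa < 0x1000) {
        *hva = 0x8000 + gpa;
        return 0;
    }
    return -1;
}

static int hva_to_iova(uint64_t hva, uint64_t *iova)
{
    /* SVQ allocated IOVA [0x1000, 0x2000) for that same HVA range. */
    if (hva >= 0x8000 && hva < 0x9000) {
        *iova = 0x1000 + (hva - 0x8000);
        return 0;
    }
    return -1;
}

int main(void)
{
    uint64_t gpa, hva, iova;

    /* Whatever gIOVA the guest uses, the device only ever sees the
     * IOVA that qemu allocated, never the gIOVA itself. */
    if (!giova_to_gpa(0x40010, &gpa) && !gpa_to_hva(gpa, &hva) &&
        !hva_to_iova(hva, &iova)) {
        printf("gIOVA 0x40010 -> GPA 0x%" PRIx64 " -> HVA 0x%" PRIx64
               " -> device IOVA 0x%" PRIx64 "\n", gpa, hva, iova);
    }
    return 0;
}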

Thanks


>
> > but the Qemu
> > won't let vDPA to use GPA as IOVA. Dedicated IOVA ranges were allocated
> for
> > those GPA ranges so SVQ won't use IOVA that is overlapped with what Guest
> > use.
> >
> > Thanks
> >
> >
> > >
>
>