[Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
This patch introduces the VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
protocol feature for vhost-user. By default, the vhost-user backend
needs to query the IOTLBs from QEMU when it meets unknown IOVAs.
With this protocol feature negotiated, QEMU will provide all the
IOTLBs to the vhost-user backend without waiting for queries from
the backend. This is helpful when using a hardware accelerator
that cannot handle unknown IOVAs at the vhost-user backend.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
The idea of this patch is to let QEMU push all the IOTLBs to the
vhost-user backend without waiting for queries from the backend,
because a hardware accelerator at the vhost-user backend may not
be able to handle unknown IOVAs.

This is just an RFC for now. It doesn't work as expected when the
guest is using a kernel driver (to handle this case, it seems that
some RAM regions' events also need to be listened to). Any comments
would be appreciated! Thanks!

 docs/interop/vhost-user.txt       |  9 ++++++++
 hw/virtio/vhost-backend.c         |  7 ++++++
 hw/virtio/vhost-user.c            |  8 +++++++
 hw/virtio/vhost.c                 | 47 ++++++++++++++++++++++++++++++++++++---
 include/hw/virtio/vhost-backend.h |  3 +++
 5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 534caab18a..73e07f9dad 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -380,6 +380,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
+#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10
 
 Master message types
 --------------------
@@ -797,3 +798,11 @@ resilient for selective requests.
 For the message types that already solicit a reply from the client, the
 presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
 no behavioural change. (See the 'Communication' section for details.)
+
+VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
+------------------------------------
+By default, vhost-user backend needs to query the IOTLBs from QEMU after
+meeting unknown IOVAs. With this protocol feature negotiated, QEMU will
+provide all the IOTLBs to vhost backend without waiting for the queries
+from backend. This is helpful when using a hardware accelerator which is
+not able to handle unknown IOVAs at the vhost-user backend.
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 7f09efab8b..d72994e9a5 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
         qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
 }
 
+static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev)
+{
+    return false;
+}
+
 static const VhostOps kernel_ops = {
         .backend_type = VHOST_BACKEND_TYPE_KERNEL,
         .vhost_backend_init = vhost_kernel_init,
@@ -264,6 +269,8 @@ static const VhostOps kernel_ops = {
 #endif /* CONFIG_VHOST_VSOCK */
         .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
         .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
+        .vhost_backend_need_all_device_iotlb =
+                                vhost_kernel_need_all_device_iotlb,
 };
 
 int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 38da8692bb..30e0b1c13b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -47,6 +47,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
     VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
+    VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
     return 0;
 }
 
+static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev)
+{
+    return virtio_has_feature(dev->protocol_features,
+                              VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_init,
@@ -1611,4 +1618,5 @@ const VhostOps user_ops = {
         .vhost_set_config = vhost_user_set_config,
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
+        .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf573d5..70922ee998 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -48,6 +48,9 @@ static unsigned int used_memslots;
 static QLIST_HEAD(, vhost_dev) vhost_devices =
     QLIST_HEAD_INITIALIZER(vhost_devices);
 
+static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa,
+                                      uint64_t *uaddr, uint64_t *len);
+
 bool vhost_has_free_slot(void)
 {
     unsigned int slots_limit = ~0U;
@@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener,
     vhost_region_add_section(dev, section);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
     struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
     struct vhost_dev *hdev = iommu->hdev;
     hwaddr iova = iotlb->iova + iommu->iommu_offset;
 
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        uint64_t uaddr, len;
+
+        rcu_read_lock();
+
+        if (iotlb->target_as != NULL) {
+            if (vhost_memory_region_lookup(hdev, iotlb->translated_addr,
+                        &uaddr, &len)) {
+                error_report("Fail to lookup the translated address "
+                        "%"PRIx64, iotlb->translated_addr);
+                goto out;
+            }
+
+            len = MIN(iotlb->addr_mask + 1, len);
+            iova = iova & ~iotlb->addr_mask;
+
+            if (vhost_backend_update_device_iotlb(hdev, iova, uaddr,
+                                                  len, iotlb->perm)) {
+                error_report("Fail to update device iotlb");
+                goto out;
+            }
+        }
+out:
+        rcu_read_unlock();
+        return;
+    }
+
     if (vhost_backend_invalidate_device_iotlb(hdev, iova,
                                               iotlb->addr_mask + 1)) {
         error_report("Fail to invalidate device iotlb");
@@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                          iommu_listener);
     struct vhost_iommu *iommu;
+    IOMMUNotifierFlag flags;
     Int128 end;
 
     if (!memory_region_is_iommu(section->mr)) {
@@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     end = int128_add(int128_make64(section->offset_within_region),
                      section->size);
     end = int128_sub(end, int128_one());
-    iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
-                        IOMMU_NOTIFIER_UNMAP,
+
+    if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) {
+        flags = IOMMU_NOTIFIER_ALL;
+    } else {
+        flags = IOMMU_NOTIFIER_UNMAP;
+    }
+
+    iommu_notifier_init(&iommu->n, vhost_iommu_map_notify,
+                        flags,
                         section->offset_within_region,
                         int128_get64(end));
     iommu->mr = section->mr;
@@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     memory_region_register_iommu_notifier(section->mr, &iommu->n);
     QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
     /* TODO: can replay help performance here? */
+    if (flags == IOMMU_NOTIFIER_ALL) {
+        memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n);
+    }
 }
 
 static void vhost_iommu_region_del(MemoryListener *listener,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 5dac61f9ea..602fd987a4 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
 typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
                                              uint64_t session_id);
 
+typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -138,6 +140,7 @@ typedef struct VhostOps {
     vhost_set_config_op vhost_set_config;
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
+    vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb;
 } VhostOps;
 
 extern const VhostOps user_ops;
-- 
2.11.0
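
As an illustration of the protocol semantics above: a backend built
against this feature could gate its miss path on the negotiated bit.
A minimal sketch, assuming a hypothetical backend_dev state structure
(only the feature bit value comes from the patch; the other names are
illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* Protocol feature bit defined by the patch above. */
    #define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10

    /* Hypothetical backend-side device state. */
    struct backend_dev {
        uint64_t protocol_features;   /* bits negotiated with QEMU */
    };

    /* With the feature negotiated, QEMU pushes every mapping up front,
     * so the backend may treat an unknown IOVA as a hard error instead
     * of sending an IOTLB miss and stalling until the reply arrives. */
    static bool backend_expects_pushed_iotlbs(const struct backend_dev *dev)
    {
        return dev->protocol_features &
               (1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
    }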


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Peter Xu 6 years ago
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:

[...]

> This is just a RFC for now. It seems that, it doesn't work
> as expected when guest is using kernel driver (To handle
> this case, it seems that some RAM regions' events also need
> to be listened). Any comments would be appreciated! Thanks!

Hi, Tiwei,

What's your kernel command line in the guest?  Is iommu=pt there?

-- 
Peter Xu

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
Hi Peter,

On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> 
> [...]
> 
> > This is just a RFC for now. It seems that, it doesn't work
> > as expected when guest is using kernel driver (To handle
> > this case, it seems that some RAM regions' events also need
> > to be listened). Any comments would be appreciated! Thanks!
> 
> Hi, Tiwei,
> 
> What's your kernel command line in the guest?  Is iommu=pt there?

Yeah, you are right! The related things in kernel command line are:

iommu=pt intel_iommu=on

Hmm, how will this param affect vIOMMU's behaviour?..

Best regards,
Tiwei Bie

> 
> -- 
> Peter Xu

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Peter Xu 6 years ago
On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> Hi Peter,
> 
> On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > 
> > [...]
> > 
> > > This is just a RFC for now. It seems that, it doesn't work
> > > as expected when guest is using kernel driver (To handle
> > > this case, it seems that some RAM regions' events also need
> > > to be listened). Any comments would be appreciated! Thanks!
> > 
> > Hi, Tiwei,
> > 
> > What's your kernel command line in the guest?  Is iommu=pt there?
> 
> Yeah, you are right! The related things in kernel command line are:
> 
> iommu=pt intel_iommu=on
> 
> Hmm, how will this param affect vIOMMU's behaviour?..

If iommu=pt is there, the guest kernel will try to bypass the IOMMU;
the IOMMU regions will be disabled completely in that case, so it's
very possible that your IOMMU memory listeners won't get anything
useful.

Maybe you can consider removing iommu=pt from the guest parameters to
see whether the guest kernel driver could work.

-- 
Peter Xu

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Wed, Apr 11, 2018 at 04:37:16PM +0800, Peter Xu wrote:
> On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > 
> > > [...]
> > > 
> > > > This is just a RFC for now. It seems that, it doesn't work
> > > > as expected when guest is using kernel driver (To handle
> > > > this case, it seems that some RAM regions' events also need
> > > > to be listened). Any comments would be appreciated! Thanks!
> > > 
> > > Hi, Tiwei,
> > > 
> > > What's your kernel command line in the guest?  Is iommu=pt there?
> > 
> > Yeah, you are right! The related things in kernel command line are:
> > 
> > iommu=pt intel_iommu=on
> > 
> > Hmm, how will this param affect vIOMMU's behaviour?..
> 
> If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
> regions will be disabled completely in that case, hence it's very
> possible that your IOMMU memory listeners won't get anything useful.
> 
> Maybe you can consider removing iommu=pt in the guest parameter to see
> whether the guest kernel driver could work.

Cool. I'll give it a try! Considering we may also need to
handle the iommu=pt case, is there any event in QEMU that can
be used to know whether the IOMMU regions are disabled
or enabled by the guest?

Best regards,
Tiwei Bie

> 
> -- 
> Peter Xu

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Peter Xu 6 years ago
On Wed, Apr 11, 2018 at 04:55:25PM +0800, Tiwei Bie wrote:
> On Wed, Apr 11, 2018 at 04:37:16PM +0800, Peter Xu wrote:
> > On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> > > On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > as expected when guest is using kernel driver (To handle
> > > > > this case, it seems that some RAM regions' events also need
> > > > > to be listened). Any comments would be appreciated! Thanks!
> > > > 
> > > > Hi, Tiwei,
> > > > 
> > > > What's your kernel command line in the guest?  Is iommu=pt there?
> > > 
> > > Yeah, you are right! The related things in kernel command line are:
> > > 
> > > iommu=pt intel_iommu=on
> > > 
> > > Hmm, how will this param affect vIOMMU's behaviour?..
> > 
> > If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
> > regions will be disabled completely in that case, hence it's very
> > possible that your IOMMU memory listeners won't get anything useful.
> > 
> > Maybe you can consider removing iommu=pt in the guest parameter to see
> > whether the guest kernel driver could work.
> 
> Cool. I'll give it a try! Considering we may also need to
> handle the iommu=pt case, is there any event in QEMU can
> be used to know whether the IOMMU regions are disabled
> or enabled by the guest?

You may consider using a similar approach to the patch below to detect
that:

  https://patchwork.kernel.org/patch/9735739/

That was a patch trying to preheat the vhost cache.  It was refused at
the time, but IMHO the logic can be reused.  You can send the updates
only if your new flag is set.  Then it won't be a preheat any more,
but a must for your hardware to work.

-- 
Peter Xu
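
The detection Peter suggests boils down to watching which kind of
memory region sections show up in the device's address space: when the
guest enables DMA remapping the vIOMMU's IOMMU memory region is
present, while with iommu=pt only plain RAM sections appear. A rough
sketch, assuming the surrounding MemoryListener plumbing and a
hypothetical push_section_mappings() helper:

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    static void iotlb_preload_region_add(MemoryListener *listener,
                                         MemoryRegionSection *section)
    {
        if (memory_region_is_iommu(section->mr)) {
            /* vIOMMU active: mappings will arrive via IOMMU
             * notifiers, as in the RFC patch. */
            return;
        }
        /* RAM section: the guest bypasses the vIOMMU, so push the
         * GPA->HVA mappings for this section proactively. */
        /* push_section_mappings(section); */
    }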

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Wed, Apr 11, 2018 at 05:16:47PM +0800, Peter Xu wrote:
> On Wed, Apr 11, 2018 at 04:55:25PM +0800, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:37:16PM +0800, Peter Xu wrote:
> > > On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> > > > On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > > as expected when guest is using kernel driver (To handle
> > > > > > this case, it seems that some RAM regions' events also need
> > > > > > to be listened). Any comments would be appreciated! Thanks!
> > > > > 
> > > > > Hi, Tiwei,
> > > > > 
> > > > > What's your kernel command line in the guest?  Is iommu=pt there?
> > > > 
> > > > Yeah, you are right! The related things in kernel command line are:
> > > > 
> > > > iommu=pt intel_iommu=on
> > > > 
> > > > Hmm, how will this param affect vIOMMU's behaviour?..
> > > 
> > > If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
> > > regions will be disabled completely in that case, hence it's very
> > > possible that your IOMMU memory listeners won't get anything useful.
> > > 
> > > Maybe you can consider removing iommu=pt in the guest parameter to see
> > > whether the guest kernel driver could work.
> > 
> > Cool. I'll give it a try! Considering we may also need to
> > handle the iommu=pt case, is there any event in QEMU can
> > be used to know whether the IOMMU regions are disabled
> > or enabled by the guest?
> 
> You may consider to use similar way like below patch to detect that:
> 
>   https://patchwork.kernel.org/patch/9735739/
> 
> That was a patch trying to preheat the vhost cache.  It was refused at
> that time, but IMHO the logic can be used.  You can just send the
> updates only if your new flag set.  Then that won't be a preheat any
> more, but a must for your hardwares to work.

It looks pretty helpful in my case! Thank you so much!

Best regards,
Tiwei Bie

> 
> -- 
> Peter Xu

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018年04月11日 15:20, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> feature for vhost-user. By default, vhost-user backend needs
> to query the IOTLBs from QEMU after meeting unknown IOVAs.
> With this protocol feature negotiated, QEMU will provide all
> the IOTLBs to vhost-user backend without waiting for the
> queries from backend. This is helpful when using a hardware
> accelerator which is not able to handle unknown IOVAs at the
> vhost-user backend.
>
> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> ---
> The idea of this patch is to let QEMU push all the IOTLBs
> to vhost-user backend without waiting for the queries from
> the backend. Because hardware accelerator at the vhost-user
> backend may not be able to handle unknown IOVAs.
>
> This is just a RFC for now. It seems that, it doesn't work
> as expected when guest is using kernel driver (To handle
> this case, it seems that some RAM regions' events also need
> to be listened). Any comments would be appreciated! Thanks!

Interesting. A quick question: why is this needed? Can we just use
the existing IOTLB update message?

It looks to me that at least the kernel does not need this.

Thanks

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
> On 2018年04月11日 15:20, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > feature for vhost-user. By default, vhost-user backend needs
> > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > With this protocol feature negotiated, QEMU will provide all
> > the IOTLBs to vhost-user backend without waiting for the
> > queries from backend. This is helpful when using a hardware
> > accelerator which is not able to handle unknown IOVAs at the
> > vhost-user backend.
> > 
> > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> > ---
> > The idea of this patch is to let QEMU push all the IOTLBs
> > to vhost-user backend without waiting for the queries from
> > the backend. Because hardware accelerator at the vhost-user
> > backend may not be able to handle unknown IOVAs.
> > 
> > This is just a RFC for now. It seems that, it doesn't work
> > as expected when guest is using kernel driver (To handle
> > this case, it seems that some RAM regions' events also need
> > to be listened). Any comments would be appreciated! Thanks!
> 
> Interesting, a quick question is why this is needed? Can we just use exist
> IOTLB update message?

Yeah, we are still using the existing IOTLB update messages
to send the IOTLBs to the backend. The only difference
is that QEMU won't wait for the queries before sending the
IOTLB update messages.

> 
> It looks to me at least kernel does not need this.

Something similar exists for kernel vhost: QEMU needs to push
the IOTLBs for some ring addresses to the kernel vhost backend
without waiting for the queries.

Best regards,
Tiwei Bie

> 
> Thanks
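
Concretely, the kernel-vhost behaviour Tiwei mentions maps onto the
same update hook the RFC patch calls. A minimal sketch, assuming the
ring IOVA, host address, and length have already been collected from
the virtqueue setup path (the wrapper name is illustrative):

    #include "qemu/osdep.h"
    #include "hw/virtio/vhost-backend.h"

    /* Push one ring's IOVA->HVA translation to the backend up front,
     * instead of waiting for the backend to report a miss. */
    static int push_ring_translation(struct vhost_dev *hdev,
                                     uint64_t ring_iova,
                                     uint64_t ring_uaddr,
                                     uint64_t ring_len)
    {
        return vhost_backend_update_device_iotlb(hdev, ring_iova,
                                                 ring_uaddr, ring_len,
                                                 IOMMU_RW);
    }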

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018年04月11日 16:38, Tiwei Bie wrote:
> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
>> On 2018年04月11日 15:20, Tiwei Bie wrote:
>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>> feature for vhost-user. By default, vhost-user backend needs
>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>> With this protocol feature negotiated, QEMU will provide all
>>> the IOTLBs to vhost-user backend without waiting for the
>>> queries from backend. This is helpful when using a hardware
>>> accelerator which is not able to handle unknown IOVAs at the
>>> vhost-user backend.
>>>
>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
>>> ---
>>> The idea of this patch is to let QEMU push all the IOTLBs
>>> to vhost-user backend without waiting for the queries from
>>> the backend. Because hardware accelerator at the vhost-user
>>> backend may not be able to handle unknown IOVAs.
>>>
>>> This is just a RFC for now. It seems that, it doesn't work
>>> as expected when guest is using kernel driver (To handle
>>> this case, it seems that some RAM regions' events also need
>>> to be listened). Any comments would be appreciated! Thanks!
>> Interesting, a quick question is why this is needed? Can we just use exist
>> IOTLB update message?
> Yeah, we are still using the existing IOTLB update messages
> to send the IOTLB messages to backend. The only difference
> is that, QEMU won't wait for the queries before sending the
> IOTLB update messages.

Yes, my question was not very clear. I mean, why do we need a new
feature bit? It looks to me that the QEMU code can work without it.

Thanks

>
>> It looks to me at least kernel does not need this.
> Something similar in kernel vhost is that, for kernel vhost,
> QEMU needs to push the IOTLBs of some ring addrs to kernel
> vhost backend without waiting for the queries.
>
> Best regards,
> Tiwei Bie
>
>> Thanks


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Wed, Apr 11, 2018 at 09:41:05PM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月11日 16:38, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
> > > On 2018年04月11日 15:20, Tiwei Bie wrote:
> > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > feature for vhost-user. By default, vhost-user backend needs
> > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > With this protocol feature negotiated, QEMU will provide all
> > > > the IOTLBs to vhost-user backend without waiting for the
> > > > queries from backend. This is helpful when using a hardware
> > > > accelerator which is not able to handle unknown IOVAs at the
> > > > vhost-user backend.
> > > > 
> > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> > > > ---
> > > > The idea of this patch is to let QEMU push all the IOTLBs
> > > > to vhost-user backend without waiting for the queries from
> > > > the backend. Because hardware accelerator at the vhost-user
> > > > backend may not be able to handle unknown IOVAs.
> > > > 
> > > > This is just a RFC for now. It seems that, it doesn't work
> > > > as expected when guest is using kernel driver (To handle
> > > > this case, it seems that some RAM regions' events also need
> > > > to be listened). Any comments would be appreciated! Thanks!
> > > Interesting, a quick question is why this is needed? Can we just use exist
> > > IOTLB update message?
> > Yeah, we are still using the existing IOTLB update messages
> > to send the IOTLB messages to backend. The only difference
> > is that, QEMU won't wait for the queries before sending the
> > IOTLB update messages.
> 
> Yes, my question is not very clear. I mean why must need a new feature bit?
> It looks to me qemu code can work without this.
> 
> Thanks

Generally we avoid adding new messages without a protocol feature bit.
While careful analysis might sometimes prove it's not a strict
requirement, it's just overall a clean and robust approach.

> > 
> > > It looks to me at least kernel does not need this.
> > Something similar in kernel vhost is that, for kernel vhost,
> > QEMU needs to push the IOTLBs of some ring addrs to kernel
> > vhost backend without waiting for the queries.
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > > Thanks

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018年04月12日 01:00, Michael S. Tsirkin wrote:
> On Wed, Apr 11, 2018 at 09:41:05PM +0800, Jason Wang wrote:
>> On 2018年04月11日 16:38, Tiwei Bie wrote:
>>> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
>>>> On 2018年04月11日 15:20, Tiwei Bie wrote:
>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>> queries from backend. This is helpful when using a hardware
>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>> vhost-user backend.
>>>>>
>>>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
>>>>> ---
>>>>> The idea of this patch is to let QEMU push all the IOTLBs
>>>>> to vhost-user backend without waiting for the queries from
>>>>> the backend. Because hardware accelerator at the vhost-user
>>>>> backend may not be able to handle unknown IOVAs.
>>>>>
>>>>> This is just a RFC for now. It seems that, it doesn't work
>>>>> as expected when guest is using kernel driver (To handle
>>>>> this case, it seems that some RAM regions' events also need
>>>>> to be listened). Any comments would be appreciated! Thanks!
>>>> Interesting, a quick question is why this is needed? Can we just use exist
>>>> IOTLB update message?
>>> Yeah, we are still using the existing IOTLB update messages
>>> to send the IOTLB messages to backend. The only difference
>>> is that, QEMU won't wait for the queries before sending the
>>> IOTLB update messages.
>> Yes, my question is not very clear. I mean why must need a new feature bit?
>> It looks to me qemu code can work without this.
>>
>> Thanks
> Generally we avoid adding new messages without a protocol feature bit.
> While careful analysis might sometimes prove it's not a strict
> requirement, it's just overall a clean and robust approach.
>

Right, but it looks like the patch does not introduce any new type of
message.

Thanks


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 11:23:31AM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月12日 01:00, Michael S. Tsirkin wrote:
> > On Wed, Apr 11, 2018 at 09:41:05PM +0800, Jason Wang wrote:
> > > On 2018年04月11日 16:38, Tiwei Bie wrote:
> > > > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
> > > > > On 2018年04月11日 15:20, Tiwei Bie wrote:
> > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > feature for vhost-user. By default, vhost-user backend needs
> > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > > > With this protocol feature negotiated, QEMU will provide all
> > > > > > the IOTLBs to vhost-user backend without waiting for the
> > > > > > queries from backend. This is helpful when using a hardware
> > > > > > accelerator which is not able to handle unknown IOVAs at the
> > > > > > vhost-user backend.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> > > > > > ---
> > > > > > The idea of this patch is to let QEMU push all the IOTLBs
> > > > > > to vhost-user backend without waiting for the queries from
> > > > > > the backend. Because hardware accelerator at the vhost-user
> > > > > > backend may not be able to handle unknown IOVAs.
> > > > > > 
> > > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > > as expected when guest is using kernel driver (To handle
> > > > > > this case, it seems that some RAM regions' events also need
> > > > > > to be listened). Any comments would be appreciated! Thanks!
> > > > > Interesting, a quick question is why this is needed? Can we just use exist
> > > > > IOTLB update message?
> > > > Yeah, we are still using the existing IOTLB update messages
> > > > to send the IOTLB messages to backend. The only difference
> > > > is that, QEMU won't wait for the queries before sending the
> > > > IOTLB update messages.
> > > Yes, my question is not very clear. I mean why must need a new feature bit?
> > > It looks to me qemu code can work without this.
> > > 
> > > Thanks
> > Generally we avoid adding new messages without a protocol feature bit.
> > While careful analysis might sometimes prove it's not a strict
> > requirement, it's just overall a clean and robust approach.
> > 
> 
> Right but the looks like the patch does not introduce any new type of
> messages.
> 
> Thanks

In this case the remote needs to know that QEMU will send these messages.

-- 
MST

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018年04月12日 11:37, Michael S. Tsirkin wrote:
>>>>> Yeah, we are still using the existing IOTLB update messages
>>>>> to send the IOTLB messages to backend. The only difference
>>>>> is that, QEMU won't wait for the queries before sending the
>>>>> IOTLB update messages.
>>>> Yes, my question is not very clear. I mean why must need a new feature bit?
>>>> It looks to me qemu code can work without this.
>>>>
>>>> Thanks
>>> Generally we avoid adding new messages without a protocol feature bit.
>>> While careful analysis might sometimes prove it's not a strict
>>> requirement, it's just overall a clean and robust approach.
>>>
>> Right but the looks like the patch does not introduce any new type of
>> messages.
>>
>> Thanks
> In this case remote needs to know that it will send these messages.
>
> -- MST

OK, if some backend does not expect that QEMU will send updates
without any query from itself, I agree we need a new bit.

But it looks more like a workaround for a buggy backend; at least
vhost kernel does not need to know about this.

Thanks



Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote:
> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
> > On 2018年04月11日 15:20, Tiwei Bie wrote:
> > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > feature for vhost-user. By default, vhost-user backend needs
> > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > With this protocol feature negotiated, QEMU will provide all
> > > the IOTLBs to vhost-user backend without waiting for the
> > > queries from backend. This is helpful when using a hardware
> > > accelerator which is not able to handle unknown IOVAs at the
> > > vhost-user backend.
> > > 
> > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> > > ---
> > > The idea of this patch is to let QEMU push all the IOTLBs
> > > to vhost-user backend without waiting for the queries from
> > > the backend. Because hardware accelerator at the vhost-user
> > > backend may not be able to handle unknown IOVAs.
> > > 
> > > This is just a RFC for now. It seems that, it doesn't work
> > > as expected when guest is using kernel driver (To handle
> > > this case, it seems that some RAM regions' events also need
> > > to be listened). Any comments would be appreciated! Thanks!
> > 
> > Interesting, a quick question is why this is needed? Can we just use exist
> > IOTLB update message?
> 
> Yeah, we are still using the existing IOTLB update messages
> to send the IOTLB messages to backend. The only difference
> is that, QEMU won't wait for the queries before sending the
> IOTLB update messages.

So I have a concern with that: without any flow control, the socket
buffer used by vhost-user might become full.

We don't currently expect that.


Again, my understanding is that the biggest benefit from the use of a
hardware accelerator is when notifications can be passed through to the
guest.

And since that involves VFIO, and since VFIO already needs to support
all kinds of IOMMUs, one wonders whether one can just pass that directly
to VFIO instead of shipping it to vhost-user.


> > 
> > It looks to me at least kernel does not need this.
> 
> Something similar in kernel vhost is that, for kernel vhost,
> QEMU needs to push the IOTLBs of some ring addrs to kernel
> vhost backend without waiting for the queries.
> 
> Best regards,
> Tiwei Bie

That's really to work around a bug in the kernel; we keep this around
since we want to support old kernels.

> > 
> > Thanks

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
> > > On 2018年04月11日 15:20, Tiwei Bie wrote:
> > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > feature for vhost-user. By default, vhost-user backend needs
> > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > With this protocol feature negotiated, QEMU will provide all
> > > > the IOTLBs to vhost-user backend without waiting for the
> > > > queries from backend. This is helpful when using a hardware
> > > > accelerator which is not able to handle unknown IOVAs at the
> > > > vhost-user backend.
> > > > 
> > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> > > > ---
> > > > The idea of this patch is to let QEMU push all the IOTLBs
> > > > to vhost-user backend without waiting for the queries from
> > > > the backend. Because hardware accelerator at the vhost-user
> > > > backend may not be able to handle unknown IOVAs.
> > > > 
> > > > This is just a RFC for now. It seems that, it doesn't work
> > > > as expected when guest is using kernel driver (To handle
> > > > this case, it seems that some RAM regions' events also need
> > > > to be listened). Any comments would be appreciated! Thanks!
> > > 
> > > Interesting, a quick question is why this is needed? Can we just use exist
> > > IOTLB update message?
> > 
> > Yeah, we are still using the existing IOTLB update messages
> > to send the IOTLB messages to backend. The only difference
> > is that, QEMU won't wait for the queries before sending the
> > IOTLB update messages.
> 
> So I have a concern with that, in that without any flow
> control the socket buffer used by vhost-user might become
> full.

Each IOTLB update message needs a reply. So I think it
won't happen.

Best regards,
Tiwei Bie

> 
> We don't currently expect that.
> 
> 
> Again, my understanding is that the biggest benefit from use of a
> hardware accelerator is when notifications can be passed-through to the
> guest.
> 
> And since that involves VFIO, and since VFIO already needs to support
> all kinds of IOMMUs, one wonders whether one can just pass that directly
> to the VFIO instead of shipping it to vhost-user.
> 
> 
> > > 
> > > It looks to me at least kernel does not need this.
> > 
> > Something similar in kernel vhost is that, for kernel vhost,
> > QEMU needs to push the IOTLBs of some ring addrs to kernel
> > vhost backend without waiting for the queries.
> > 
> > Best regards,
> > Tiwei Bie
> 
> That's really to work around a bug in kernel, we keep this around since
> we want to support old kernels.
> 
> > > 
> > > Thanks
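
Tiwei's point can be made concrete: since every VHOST_USER_IOTLB_MSG
is acknowledged with a u64 payload, the master effectively keeps one
update in flight at a time. A minimal sketch of that pattern, with
placeholder transport helpers (vu_send_iotlb_msg and vu_recv_u64_ack
are not real QEMU functions):

    #include <stddef.h>
    #include <stdint.h>

    /* Placeholders standing in for the real vhost-user transport. */
    int vu_send_iotlb_msg(int sock, const void *payload, size_t len);
    int vu_recv_u64_ack(int sock, uint64_t *status);

    /* Send one update, then block on the u64 ack, so unacknowledged
     * updates can never pile up in the socket buffer. */
    static int send_iotlb_update(int sock, const void *msg, size_t len)
    {
        uint64_t status;

        if (vu_send_iotlb_msg(sock, msg, len) < 0) {
            return -1;
        }
        if (vu_recv_u64_ack(sock, &status) < 0 || status != 0) {
            return -1;      /* slave reported failure */
        }
        return 0;
    }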

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018年04月12日 09:44, Tiwei Bie wrote:
> On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote:
>> On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote:
>>> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
>>>> On 2018年04月11日 15:20, Tiwei Bie wrote:
>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>> queries from backend. This is helpful when using a hardware
>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>> vhost-user backend.
>>>>>
>>>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
>>>>> ---
>>>>> The idea of this patch is to let QEMU push all the IOTLBs
>>>>> to vhost-user backend without waiting for the queries from
>>>>> the backend. Because hardware accelerator at the vhost-user
>>>>> backend may not be able to handle unknown IOVAs.
>>>>>
>>>>> This is just a RFC for now. It seems that, it doesn't work
>>>>> as expected when guest is using kernel driver (To handle
>>>>> this case, it seems that some RAM regions' events also need
>>>>> to be listened). Any comments would be appreciated! Thanks!
>>>> Interesting, a quick question is why this is needed? Can we just use exist
>>>> IOTLB update message?
>>> Yeah, we are still using the existing IOTLB update messages
>>> to send the IOTLB messages to backend. The only difference
>>> is that, QEMU won't wait for the queries before sending the
>>> IOTLB update messages.
>> So I have a concern with that, in that without any flow
>> control the socket buffer used by vhost-user might become
>> full.
> Each IOTLB update message needs a reply. So I think it
> won't happen.

Is this what we've already done now? I don't find any statement on this
in vhost-user.txt:

"""
  * VHOST_USER_IOTLB_MSG

       Id: 22
       Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
       Master payload: struct vhost_iotlb_msg
       Slave payload: u64

       Send IOTLB messages with struct vhost_iotlb_msg as payload.
       Master sends such requests to update and invalidate entries in
       the device IOTLB. The slave has to acknowledge the request with
       sending zero as u64 payload for success, non-zero otherwise.
       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
       has been successfully negotiated.
"""

And you probably need to modify the following statement:

"""
The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.
"""

Thanks


>
> Best regards,
> Tiwei Bie
>


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Thu, Apr 12, 2018 at 03:38:50PM +0800, Jason Wang wrote:
> On 2018年04月12日 09:44, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote:
> > > > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
> > > > > On 2018年04月11日 15:20, Tiwei Bie wrote:
> > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > feature for vhost-user. By default, vhost-user backend needs
> > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > > > With this protocol feature negotiated, QEMU will provide all
> > > > > > the IOTLBs to vhost-user backend without waiting for the
> > > > > > queries from backend. This is helpful when using a hardware
> > > > > > accelerator which is not able to handle unknown IOVAs at the
> > > > > > vhost-user backend.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> > > > > > ---
> > > > > > The idea of this patch is to let QEMU push all the IOTLBs
> > > > > > to vhost-user backend without waiting for the queries from
> > > > > > the backend. Because hardware accelerator at the vhost-user
> > > > > > backend may not be able to handle unknown IOVAs.
> > > > > > 
> > > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > > as expected when guest is using kernel driver (To handle
> > > > > > this case, it seems that some RAM regions' events also need
> > > > > > to be listened). Any comments would be appreciated! Thanks!
> > > > > Interesting, a quick question is why this is needed? Can we just use exist
> > > > > IOTLB update message?
> > > > Yeah, we are still using the existing IOTLB update messages
> > > > to send the IOTLB messages to backend. The only difference
> > > > is that, QEMU won't wait for the queries before sending the
> > > > IOTLB update messages.
> > > So I have a concern with that, in that without any flow
> > > control the socket buffer used by vhost-user might become
> > > full.
> > Each IOTLB update message needs a reply. So I think it
> > won't happen.
> 
> Is this what we've already done now? I don't find any statement on this in
> vhost-user.txt?
> 
> """
>  * VHOST_USER_IOTLB_MSG
> 
>       Id: 22
>       Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
>       Master payload: struct vhost_iotlb_msg
>       Slave payload: u64
> 
>       Send IOTLB messages with struct vhost_iotlb_msg as payload.
>       Master sends such requests to update and invalidate entries in
>       the device IOTLB. The slave has to acknowledge the request with
>       sending zero as u64 payload for success, non-zero otherwise.
>       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
>       has been successfully negotiated.
> """

Yeah, it's what we've already done now. It's in the above
statement you quoted:

"The slave has to acknowledge the request with sending zero as
u64 payload for success, non-zero otherwise."

> 
> And you probably need to modify the following statement:
> 
> """
> The master isn't expected to take the initiative to send IOTLB update
> messages, as the slave sends IOTLB miss messages for the guest virtual
> memory areas it needs to access.
> """

Yeah, you're right. Thanks! This statement needs some minor updates.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> 
> > 
> > Best regards,
> > Tiwei Bie
> > 
> 

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018年04月12日 16:10, Tiwei Bie wrote:
> On Thu, Apr 12, 2018 at 03:38:50PM +0800, Jason Wang wrote:
>> On 2018年04月12日 09:44, Tiwei Bie wrote:
>>> On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote:
>>>> On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote:
>>>>> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
>>>>>> On 2018年04月11日 15:20, Tiwei Bie wrote:
>>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>>> queries from backend. This is helpful when using a hardware
>>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>>> vhost-user backend.
>>>>>>>
>>>>>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>> The idea of this patch is to let QEMU push all the IOTLBs
>>>>>>> to vhost-user backend without waiting for the queries from
>>>>>>> the backend. Because hardware accelerator at the vhost-user
>>>>>>> backend may not be able to handle unknown IOVAs.
>>>>>>>
>>>>>>> This is just a RFC for now. It seems that, it doesn't work
>>>>>>> as expected when guest is using kernel driver (To handle
>>>>>>> this case, it seems that some RAM regions' events also need
>>>>>>> to be listened). Any comments would be appreciated! Thanks!
>>>>>> Interesting, a quick question is why this is needed? Can we just use exist
>>>>>> IOTLB update message?
>>>>> Yeah, we are still using the existing IOTLB update messages
>>>>> to send the IOTLB messages to backend. The only difference
>>>>> is that, QEMU won't wait for the queries before sending the
>>>>> IOTLB update messages.
>>>> So I have a concern with that, in that without any flow
>>>> control the socket buffer used by vhost-user might become
>>>> full.
>>> Each IOTLB update message needs a reply. So I think it
>>> won't happen.
>> Is this what we've already done now? I don't find any statement on this in
>> vhost-user.txt?
>>
>> """
>>   * VHOST_USER_IOTLB_MSG
>>
>>        Id: 22
>>        Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
>>        Master payload: struct vhost_iotlb_msg
>>        Slave payload: u64
>>
>>        Send IOTLB messages with struct vhost_iotlb_msg as payload.
>>        Master sends such requests to update and invalidate entries in
>>        the device IOTLB. The slave has to acknowledge the request with
>>        sending zero as u64 payload for success, non-zero otherwise.
>>        This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
>>        has been successfully negotiated.
>> """
> Yeah, it's what we've already done now. It's in the above
> statement you quoted:
>
> "The slave has to acknowledge the request with sending zero as
> u64 payload for success, non-zero otherwise."

My bad.

Actually, there's a minor optimization possible here. When QI (queued
invalidation) is enabled, we only need an ack for the wait descriptor,
which allows some kind of batching. Another interesting idea is to send
a multiqueue request through a single message.

Thanks

>
>> And you probably need to modify the following statement:
>>
>> """
>> The master isn't expected to take the initiative to send IOTLB update
>> messages, as the slave sends IOTLB miss messages for the guest virtual
>> memory areas it needs to access.
>> """
> Yeah, you're right. Thanks! This statement needs some minor updates.
>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>
>>> Best regards,
>>> Tiwei Bie
>>>
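
Jason's batching idea could look roughly like this: stream a run of
updates and synchronize only once, on the last message, the way a
queued-invalidation wait descriptor batches invalidations. A sketch
with placeholder helpers (the current protocol, as quoted above, still
acks every message, so this is only a possible extension):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct iotlb_update;    /* update payload; details omitted */

    /* Placeholder transport helpers. */
    int vu_send_update(int sock, const struct iotlb_update *u,
                       bool need_ack);
    int vu_wait_ack(int sock, uint64_t *status);

    /* Request an ack only on the final update of the batch. */
    static int send_update_batch(int sock, const struct iotlb_update *u,
                                 size_t n)
    {
        uint64_t status;
        size_t i;

        for (i = 0; i < n; i++) {
            if (vu_send_update(sock, &u[i], i == n - 1) < 0) {
                return -1;
            }
        }
        return (vu_wait_ack(sock, &status) == 0 && status == 0) ? 0 : -1;
    }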


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> feature for vhost-user. By default, vhost-user backend needs
> to query the IOTLBs from QEMU after meeting unknown IOVAs.
> With this protocol feature negotiated, QEMU will provide all
> the IOTLBs to vhost-user backend without waiting for the
> queries from backend. This is helpful when using a hardware
> accelerator which is not able to handle unknown IOVAs at the
> vhost-user backend.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

This is potentially a large amount of data to be sent
on a socket.

I had an impression that a hardware accelerator was using
VFIO anyway. Given this, can't we have QEMU program
the shadow IOMMU tables into VFIO directly?


> ---
> [...]

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 04/11/2018 21:22, Michael S. Tsirkin wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>> feature for vhost-user. By default, vhost-user backend needs
>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>> With this protocol feature negotiated, QEMU will provide all
>> the IOTLBs to vhost-user backend without waiting for the
>> queries from backend. This is helpful when using a hardware
>> accelerator which is not able to handle unknown IOVAs at the
>> vhost-user backend.
>>
>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> This is potentially a large amount of data to be sent
> on a socket.
>
> I had an impression that a hardware accelerator was using
> VFIO anyway. Given this, can't we have QEMU program
> the shadow IOMMU tables into VFIO directly?
>
>

So if we want to do this, my understanding is that we need to
implement the driver inside QEMU (or link it to QEMU).

Thanks


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > feature for vhost-user. By default, vhost-user backend needs
> > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > With this protocol feature negotiated, QEMU will provide all
> > the IOTLBs to vhost-user backend without waiting for the
> > queries from backend. This is helpful when using a hardware
> > accelerator which is not able to handle unknown IOVAs at the
> > vhost-user backend.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> This is potentially a large amount of data to be sent
> on a socket.

If we take the hardware accelerator out of this picture, we
will find that it's actually a question of "pre-loading" vs
"lazy-loading". I think neither of them is perfect.

For "pre-loading", as you said, we may have a tough starting.
But for "lazy-loading", we can't have a predictable performance.
A sudden, unexpected performance drop may happen at any time,
because we may meet an unknown IOVA at any time in this case.
Once we meet an unknown IOVA, the backend's data path will need
to stop and query the mapping of the IOVA via the socket and
wait for the reply. And the latency is not negligible (sometimes
it's even unacceptable), especially in high performance network
case. So maybe it's better to make both of them available to
the vhost backend.
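
To make that miss cost concrete, below is a minimal sketch (not part
of the patch) of the backend side of one such round trip. The payload
layout follows struct vhost_iotlb_msg from the Linux UAPI
(linux/vhost_types.h); the raw read()/write() framing is simplified,
since the real protocol wraps this payload in a VhostUserMsg header:

#include <stdint.h>
#include <unistd.h>

/* Payload layout as in linux/vhost_types.h. */
struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t  perm;   /* VHOST_ACCESS_RO/WO/RW */
    uint8_t  type;   /* miss/update/invalidate */
};

#define VHOST_IOTLB_MISS   1
#define VHOST_IOTLB_UPDATE 2

/* On an IOTLB miss, the backend's data path blocks here until
 * QEMU answers with an update that covers the faulting IOVA. */
static int query_iotlb(int slave_fd, uint64_t iova, uint8_t perm,
                       struct vhost_iotlb_msg *reply)
{
    struct vhost_iotlb_msg miss = {
        .iova = iova, .perm = perm, .type = VHOST_IOTLB_MISS,
    };

    if (write(slave_fd, &miss, sizeof(miss)) != sizeof(miss)) {
        return -1;
    }
    do {
        if (read(slave_fd, reply, sizeof(*reply)) != sizeof(*reply)) {
            return -1;
        }
    } while (reply->type != VHOST_IOTLB_UPDATE ||
             iova < reply->iova ||
             iova >= reply->iova + reply->size);
    return 0;
}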

> 
> I had an impression that a hardware accelerator was using
> VFIO anyway. Given this, can't we have QEMU program
> the shadow IOMMU tables into VFIO directly?

I think it's a good idea! Currently, my concern about it is
that the hardware device may not use an IOMMU and may have
its own built-in address translation unit. And it will be a pain
for device vendors to teach VFIO to be able to work with the
built-in address translation unit.
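
For reference, the direct-VFIO route suggested above would boil down
to something like the sketch below. The ioctl and structure are the
real VFIO type-1 UAPI (linux/vfio.h); the container fd and the
iova/hva/size values are assumed to come from QEMU's vIOMMU notifier:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Install one shadow mapping (guest IOVA -> QEMU HVA) into the
 * VFIO container, instead of pushing it over the vhost-user socket. */
static int shadow_map_into_vfio(int container_fd, uint64_t iova,
                                void *hva, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uintptr_t)hva;  /* host virtual address */
    map.iova  = iova;            /* IOVA programmed by the guest */
    map.size  = size;

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}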

Best regards,
Tiwei Bie


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > feature for vhost-user. By default, vhost-user backend needs
> > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > With this protocol feature negotiated, QEMU will provide all
> > > the IOTLBs to vhost-user backend without waiting for the
> > > queries from backend. This is helpful when using a hardware
> > > accelerator which is not able to handle unknown IOVAs at the
> > > vhost-user backend.
> > > 
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > 
> > This is potentially a large amount of data to be sent
> > on a socket.
> 
> If we take the hardware accelerator out of this picture, we
> will find that it's actually a question of "pre-loading" vs
> "lazy-loading". I think neither of them is perfect.
> 
> For "pre-loading", as you said, we may have a tough starting.
> But for "lazy-loading", we can't have a predictable performance.
> A sudden, unexpected performance drop may happen at any time,
> because we may meet an unknown IOVA at any time in this case.

That's how hardware behaves too though. So we can expect guests
to try to optimize locality.

> Once we encounter an unknown IOVA, the backend's data path will need
> to stop and query the mapping of the IOVA via the socket and
> wait for the reply. And the latency is not negligible (sometimes
> it's even unacceptable), especially in the high-performance
> networking case. So maybe it's better to make both of them
> available to the vhost backend.
> 
> > 
> > I had an impression that a hardware accelerator was using
> > VFIO anyway. Given this, can't we have QEMU program
> > the shadow IOMMU tables into VFIO directly?
> 
> I think it's a good idea! Currently, my concern about it is
> that the hardware device may not use an IOMMU and may have
> its own built-in address translation unit. And it will be a pain
> for device vendors to teach VFIO to be able to work with the
> built-in address translation unit.

I think such drivers would have to integrate with VFIO somehow.
Otherwise, what is the plan for assigning such devices then?



Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > feature for vhost-user. By default, vhost-user backend needs
> > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > With this protocol feature negotiated, QEMU will provide all
> > > > the IOTLBs to vhost-user backend without waiting for the
> > > > queries from backend. This is helpful when using a hardware
> > > > accelerator which is not able to handle unknown IOVAs at the
> > > > vhost-user backend.
> > > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > 
> > > This is potentially a large amount of data to be sent
> > > on a socket.
> > 
> > If we take the hardware accelerator out of this picture, we
> > will find that it's actually a question of "pre-loading" vs
> > "lazy-loading". I think neither of them is perfect.
> > 
> > For "pre-loading", as you said, we may have a tough starting.
> > But for "lazy-loading", we can't have a predictable performance.
> > A sudden, unexpected performance drop may happen at any time,
> > because we may meet an unknown IOVA at any time in this case.
> 
> That's how hardware behaves too though. So we can expect guests
> to try to optimize locality.

The difference is that the software implementation needs to
query the mappings via the socket, and that's much slower.

> 
> > Once we encounter an unknown IOVA, the backend's data path will need
> > to stop and query the mapping of the IOVA via the socket and
> > wait for the reply. And the latency is not negligible (sometimes
> > it's even unacceptable), especially in the high-performance
> > networking case. So maybe it's better to make both of them
> > available to the vhost backend.
> > 
> > > 
> > > I had an impression that a hardware accelerator was using
> > > VFIO anyway. Given this, can't we have QEMU program
> > > the shadow IOMMU tables into VFIO directly?
> > 
> > I think it's a good idea! Currently, my concern about it is
> > that the hardware device may not use an IOMMU and may have
> > its own built-in address translation unit. And it will be a pain
> > for device vendors to teach VFIO to be able to work with the
> > built-in address translation unit.
> 
> I think such drivers would have to integrate with VFIO somehow.
> Otherwise, what is the plan for assigning such devices then?

Such devices are just for vhost data path acceleration.
They have many available queue pairs, and the switch logic
will be done among those queue pairs. And different queue
pairs will serve different VMs directly.

Best regards,
Tiwei Bie


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
> > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > feature for vhost-user. By default, vhost-user backend needs
> > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > > With this protocol feature negotiated, QEMU will provide all
> > > > > the IOTLBs to vhost-user backend without waiting for the
> > > > > queries from backend. This is helpful when using a hardware
> > > > > accelerator which is not able to handle unknown IOVAs at the
> > > > > vhost-user backend.
> > > > > 
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > 
> > > > This is potentially a large amount of data to be sent
> > > > on a socket.
> > > 
> > > If we take the hardware accelerator out of this picture, we
> > > will find that it's actually a question of "pre-loading" vs
> > > "lazy-loading". I think neither of them is perfect.
> > > 
> > > For "pre-loading", as you said, we may have a tough starting.
> > > But for "lazy-loading", we can't have a predictable performance.
> > > A sudden, unexpected performance drop may happen at any time,
> > > because we may meet an unknown IOVA at any time in this case.
> > 
> > That's how hardware behaves too though. So we can expect guests
> > to try to optimize locality.
> 
> The difference is that the software implementation needs to
> query the mappings via the socket, and that's much slower.

If you are proposing this new feature as an optimization,
then I'd like to see numbers showing the performance gains.

It's definitely possible to optimize things out.  Pre-loading isn't
where I would start optimizing though.  For example, DPDK could have its
own VTD emulation, then it could access guest memory directly.
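
Either way, what sits on the backend's fast path is a local
IOVA-to-HVA cache. A toy lookup, with hypothetical structures and a
linear scan purely for illustration (a real implementation would use
a tree or hash keyed by IOVA):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical local IOTLB cache entry kept by the backend. */
struct iotlb_entry {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;  /* host virtual address backing this range */
};

/* Returns 0 and fills *uaddr on a hit; -1 on a miss, in which case
 * the backend must query QEMU (or, per the VTD-emulation idea,
 * walk the guest's IOMMU page tables itself). */
static int iotlb_lookup(const struct iotlb_entry *tlb, size_t n,
                        uint64_t iova, uint64_t *uaddr)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= tlb[i].iova && iova < tlb[i].iova + tlb[i].size) {
            *uaddr = tlb[i].uaddr + (iova - tlb[i].iova);
            return 0;
        }
    }
    return -1;
}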


> > 
> > > Once we encounter an unknown IOVA, the backend's data path will need
> > > to stop and query the mapping of the IOVA via the socket and
> > > wait for the reply. And the latency is not negligible (sometimes
> > > it's even unacceptable), especially in the high-performance
> > > networking case. So maybe it's better to make both of them
> > > available to the vhost backend.
> > > 
> > > > 
> > > > I had an impression that a hardware accelerator was using
> > > > VFIO anyway. Given this, can't we have QEMU program
> > > > the shadow IOMMU tables into VFIO directly?
> > > 
> > > I think it's a good idea! Currently, my concern about it is
> > > that the hardware device may not use an IOMMU and may have
> > > its own built-in address translation unit. And it will be a pain
> > > for device vendors to teach VFIO to be able to work with the
> > > built-in address translation unit.
> > 
> > I think such drivers would have to integrate with VFIO somehow.
> > Otherwise, what is the plan for assigning such devices then?
> 
> Such devices are just for vhost data path acceleration.

That's not true, I think.  E.g. RDMA devices have an on-card MMU.

> They have many available queue pairs, and the switch logic
> will be done among those queue pairs. And different queue
> pairs will serve different VMs directly.
> 
> Best regards,
> Tiwei Bie

The way I would do it is attach different PASID values to
different queues. This way you can use the standard IOMMU
to enforce protection.
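
(A sketch of that idea with purely hypothetical structures, implying
no real driver API: each VM-facing queue pair gets its own PASID, and
the standard IOMMU translates and isolates that queue's DMA through
the page table bound to its PASID.)

#include <stdint.h>

/* Hypothetical per-queue configuration for such an accelerator. */
struct accel_queue_cfg {
    uint16_t queue_index; /* queue pair assigned to one VM        */
    uint32_t pasid;       /* PASID tagged on all DMA it generates */
};

/* DMA from queue N is then translated via the IOMMU page table
 * bound to its PASID (the values here are arbitrary). */
static const struct accel_queue_cfg queue_cfgs[] = {
    { .queue_index = 0, .pasid = 100 },  /* VM 0 */
    { .queue_index = 1, .pasid = 101 },  /* VM 1 */
};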




Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Thu, Apr 12, 2018 at 04:57:13AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
> > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
> > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > feature for vhost-user. By default, vhost-user backend needs
> > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > > > With this protocol feature negotiated, QEMU will provide all
> > > > > > the IOTLBs to vhost-user backend without waiting for the
> > > > > > queries from backend. This is helpful when using a hardware
> > > > > > accelerator which is not able to handle unknown IOVAs at the
> > > > > > vhost-user backend.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > 
> > > > > This is potentially a large amount of data to be sent
> > > > > on a socket.
> > > > 
> > > > If we take the hardware accelerator out of this picture, we
> > > > will find that it's actually a question of "pre-loading" vs
> > > > "lazy-loading". I think neither of them is perfect.
> > > > 
> > > > For "pre-loading", as you said, we may have a tough starting.
> > > > But for "lazy-loading", we can't have a predictable performance.
> > > > A sudden, unexpected performance drop may happen at any time,
> > > > because we may meet an unknown IOVA at any time in this case.
> > > 
> > > That's how hardware behaves too though. So we can expect guests
> > > to try to optimize locality.
> > 
> > The difference is that the software implementation needs to
> > query the mappings via the socket, and that's much slower.
> 
> If you are proposing this new feature as an optimization,
> then I'd like to see numbers showing the performance gains.
> 
> It's definitely possible to optimize things out.  Pre-loading isn't
> where I would start optimizing though.  For example, DPDK could have its
> own VTD emulation, then it could access guest memory directly.
> 
> 
> > > 
> > > > Once we encounter an unknown IOVA, the backend's data path will need
> > > > to stop and query the mapping of the IOVA via the socket and
> > > > wait for the reply. And the latency is not negligible (sometimes
> > > > it's even unacceptable), especially in the high-performance
> > > > networking case. So maybe it's better to make both of them
> > > > available to the vhost backend.
> > > > 
> > > > > 
> > > > > I had an impression that a hardware accelerator was using
> > > > > VFIO anyway. Given this, can't we have QEMU program
> > > > > the shadow IOMMU tables into VFIO directly?
> > > > 
> > > > I think it's a good idea! Currently, my concern about it is
> > > > that the hardware device may not use an IOMMU and may have
> > > > its own built-in address translation unit. And it will be a pain
> > > > for device vendors to teach VFIO to be able to work with the
> > > > built-in address translation unit.
> > > 
> > > I think such drivers would have to integrate with VFIO somehow.
> > > Otherwise, what is the plan for assigning such devices then?
> > 
> > Such devices are just for vhost data path acceleration.
> 
> That's not true, I think.  E.g. RDMA devices have an on-card MMU.
> 
> > They have many available queue pairs, and the switch logic
> > will be done among those queue pairs. And different queue
> > pairs will serve different VMs directly.
> > 
> > Best regards,
> > Tiwei Bie
> 
> The way I would do it is attach different PASID values to
> different queues. This way you can use the standard IOMMU
> to enforce protection.
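
(As a rough illustration of the per-queue PASID idea -- all names here are
hypothetical, not taken from any driver:

    #include <stdint.h>

    /* Each hardware queue pair serves one VM and tags its DMA with its
     * own PASID, so the standard platform IOMMU can apply a separate
     * translation table, i.e. separate protection, per queue pair. */
    struct accel_queue_pair {
        int      qp_index;   /* hardware queue pair bound to one VM */
        uint32_t pasid;      /* PASID attached to this queue's DMA  */
    };
)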

I'm thinking about the case where the device wants to use its
builtin address translation, although I'm not really sure
whether there will be a real product working this way.

Honestly, I like your idea, and I don't object to it. I'll
investigate it further. And for the feature proposed in this
RFC, I just think maybe it's better to provide one more
possibility for the backend to support vIOMMU.

Anyway, the work on adding vIOMMU support in vDPA just
started a few days ago. I'll investigate each possibility
further. Thanks! :)

Best regards,
Tiwei Bie

> 
> 
> > > 
> > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > > 
> > > > > 
> > > > > 
> > > > > > ---
> > > > > > The idea of this patch is to let QEMU push all the IOTLBs
> > > > > > to vhost-user backend without waiting for the queries from
> > > > > > the backend. Because hardware accelerator at the vhost-user
> > > > > > backend may not be able to handle unknown IOVAs.
> > > > > > 
> > > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > > as expected when guest is using kernel driver (To handle
> > > > > > this case, it seems that some RAM regions' events also need
> > > > > > to be listened). Any comments would be appreciated! Thanks!
> > > > > > 
> > > > > >  docs/interop/vhost-user.txt       |  9 ++++++++
> > > > > >  hw/virtio/vhost-backend.c         |  7 ++++++
> > > > > >  hw/virtio/vhost-user.c            |  8 +++++++
> > > > > >  hw/virtio/vhost.c                 | 47 ++++++++++++++++++++++++++++++++++++---
> > > > > >  include/hw/virtio/vhost-backend.h |  3 +++
> > > > > >  5 files changed, 71 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > > > > index 534caab18a..73e07f9dad 100644
> > > > > > --- a/docs/interop/vhost-user.txt
> > > > > > +++ b/docs/interop/vhost-user.txt
> > > > > > @@ -380,6 +380,7 @@ Protocol features
> > > > > >  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > > > > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > > > > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10
> > > > > >  
> > > > > >  Master message types
> > > > > >  --------------------
> > > > > > @@ -797,3 +798,11 @@ resilient for selective requests.
> > > > > >  For the message types that already solicit a reply from the client, the
> > > > > >  presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
> > > > > >  no behavioural change. (See the 'Communication' section for details.)
> > > > > > +
> > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > +------------------------------------
> > > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after
> > > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will
> > > > > > +provide all the IOTLBs to vhost backend without waiting for the queries
> > > > > > +from backend. This is helpful when using a hardware accelerator which is
> > > > > > +not able to handle unknown IOVAs at the vhost-user backend.
> > > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > > > > index 7f09efab8b..d72994e9a5 100644
> > > > > > --- a/hw/virtio/vhost-backend.c
> > > > > > +++ b/hw/virtio/vhost-backend.c
> > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> > > > > >          qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
> > > > > >  }
> > > > > >  
> > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev)
> > > > > > +{
> > > > > > +    return false;
> > > > > > +}
> > > > > > +
> > > > > >  static const VhostOps kernel_ops = {
> > > > > >          .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> > > > > >          .vhost_backend_init = vhost_kernel_init,
> > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = {
> > > > > >  #endif /* CONFIG_VHOST_VSOCK */
> > > > > >          .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> > > > > >          .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
> > > > > > +        .vhost_backend_need_all_device_iotlb =
> > > > > > +                                vhost_kernel_need_all_device_iotlb,
> > > > > >  };
> > > > > >  
> > > > > >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
> > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > > > index 38da8692bb..30e0b1c13b 100644
> > > > > > --- a/hw/virtio/vhost-user.c
> > > > > > +++ b/hw/virtio/vhost-user.c
> > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature {
> > > > > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > > > > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > > > > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > > > > +    VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10,
> > > > > >      VHOST_USER_PROTOCOL_F_MAX
> > > > > >  };
> > > > > >  
> > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
> > > > > >      return 0;
> > > > > >  }
> > > > > >  
> > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev)
> > > > > > +{
> > > > > > +    return virtio_has_feature(dev->protocol_features,
> > > > > > +                              VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
> > > > > > +}
> > > > > > +
> > > > > >  const VhostOps user_ops = {
> > > > > >          .backend_type = VHOST_BACKEND_TYPE_USER,
> > > > > >          .vhost_backend_init = vhost_user_init,
> > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = {
> > > > > >          .vhost_set_config = vhost_user_set_config,
> > > > > >          .vhost_crypto_create_session = vhost_user_crypto_create_session,
> > > > > >          .vhost_crypto_close_session = vhost_user_crypto_close_session,
> > > > > > +        .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb,
> > > > > >  };
> > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > > index f51bf573d5..70922ee998 100644
> > > > > > --- a/hw/virtio/vhost.c
> > > > > > +++ b/hw/virtio/vhost.c
> > > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots;
> > > > > >  static QLIST_HEAD(, vhost_dev) vhost_devices =
> > > > > >      QLIST_HEAD_INITIALIZER(vhost_devices);
> > > > > >  
> > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa,
> > > > > > +                                      uint64_t *uaddr, uint64_t *len);
> > > > > > +
> > > > > >  bool vhost_has_free_slot(void)
> > > > > >  {
> > > > > >      unsigned int slots_limit = ~0U;
> > > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener,
> > > > > >      vhost_region_add_section(dev, section);
> > > > > >  }
> > > > > >  
> > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > >  {
> > > > > >      struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
> > > > > >      struct vhost_dev *hdev = iommu->hdev;
> > > > > >      hwaddr iova = iotlb->iova + iommu->iommu_offset;
> > > > > >  
> > > > > > +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > > > > +        uint64_t uaddr, len;
> > > > > > +
> > > > > > +        rcu_read_lock();
> > > > > > +
> > > > > > +        if (iotlb->target_as != NULL) {
> > > > > > +            if (vhost_memory_region_lookup(hdev, iotlb->translated_addr,
> > > > > > +                        &uaddr, &len)) {
> > > > > > +                error_report("Fail to lookup the translated address "
> > > > > > +                        "%"PRIx64, iotlb->translated_addr);
> > > > > > +                goto out;
> > > > > > +            }
> > > > > > +
> > > > > > +            len = MIN(iotlb->addr_mask + 1, len);
> > > > > > +            iova = iova & ~iotlb->addr_mask;
> > > > > > +
> > > > > > +            if (vhost_backend_update_device_iotlb(hdev, iova, uaddr,
> > > > > > +                                                  len, iotlb->perm)) {
> > > > > > +                error_report("Fail to update device iotlb");
> > > > > > +                goto out;
> > > > > > +            }
> > > > > > +        }
> > > > > > +out:
> > > > > > +        rcu_read_unlock();
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > >      if (vhost_backend_invalidate_device_iotlb(hdev, iova,
> > > > > >                                                iotlb->addr_mask + 1)) {
> > > > > >          error_report("Fail to invalidate device iotlb");
> > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
> > > > > >      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > > > > >                                           iommu_listener);
> > > > > >      struct vhost_iommu *iommu;
> > > > > > +    IOMMUNotifierFlag flags;
> > > > > >      Int128 end;
> > > > > >  
> > > > > >      if (!memory_region_is_iommu(section->mr)) {
> > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener,
> > > > > >      end = int128_add(int128_make64(section->offset_within_region),
> > > > > >                       section->size);
> > > > > >      end = int128_sub(end, int128_one());
> > > > > > -    iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
> > > > > > -                        IOMMU_NOTIFIER_UNMAP,
> > > > > > +
> > > > > > +    if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) {
> > > > > > +        flags = IOMMU_NOTIFIER_ALL;
> > > > > > +    } else {
> > > > > > +        flags = IOMMU_NOTIFIER_UNMAP;
> > > > > > +    }
> > > > > > +
> > > > > > +    iommu_notifier_init(&iommu->n, vhost_iommu_map_notify,
> > > > > > +                        flags,
> > > > > >                          section->offset_within_region,
> > > > > >                          int128_get64(end));
> > > > > >      iommu->mr = section->mr;
> > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener,
> > > > > >      memory_region_register_iommu_notifier(section->mr, &iommu->n);
> > > > > >      QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
> > > > > >      /* TODO: can replay help performance here? */
> > > > > > +    if (flags == IOMMU_NOTIFIER_ALL) {
> > > > > > +        memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n);
> > > > > > +    }
> > > > > >  }
> > > > > >  
> > > > > >  static void vhost_iommu_region_del(MemoryListener *listener,
> > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > > > > > index 5dac61f9ea..602fd987a4 100644
> > > > > > --- a/include/hw/virtio/vhost-backend.h
> > > > > > +++ b/include/hw/virtio/vhost-backend.h
> > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
> > > > > >  typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
> > > > > >                                               uint64_t session_id);
> > > > > >  
> > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev);
> > > > > > +
> > > > > >  typedef struct VhostOps {
> > > > > >      VhostBackendType backend_type;
> > > > > >      vhost_backend_init vhost_backend_init;
> > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps {
> > > > > >      vhost_set_config_op vhost_set_config;
> > > > > >      vhost_crypto_create_session_op vhost_crypto_create_session;
> > > > > >      vhost_crypto_close_session_op vhost_crypto_close_session;
> > > > > > +    vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb;
> > > > > >  } VhostOps;
> > > > > >  
> > > > > >  extern const VhostOps user_ops;
> > > > > > -- 
> > > > > > 2.11.0

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 10:35:05AM +0800, Tiwei Bie wrote:
> On Thu, Apr 12, 2018 at 04:57:13AM +0300, Michael S. Tsirkin wrote:
> > [...]
> > The way I would do it is attach different PASID values to
> > different queues. This way you can use the standard IOMMU
> > to enforce protection.
> 
> I'm thinking about the case where the device wants to use its
> builtin address translation, although I'm not really sure
> whether there will be a real product working this way.
> 
> Honestly, I like your idea, and I don't object to it. I'll
> investigate it further. And for the feature proposed in this
> RFC, I just think maybe it's better to provide one more
> possibility for the backend to support vIOMMU.
> 
> Anyway, the work on adding vIOMMU support in vDPA just
> started a few days ago. I'll investigate each possibility
> further. Thanks! :)
> 
> Best regards,
> Tiwei Bie

There are more advantages to using requests with PASID:

You can use hardware support for nesting, having the guest supply the
first-level translation and the host the second-level translation.

I actually had an idea to do something like this for AMD
and ARM, which support nesting even for requests with PASID;
having Intel benefit too would be nice.
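
(A minimal sketch of what that two-level ("nested") translation computes,
with the walks abstracted behind hypothetical callbacks -- in real hardware
both levels are walked by the IOMMU itself, not by software:

    #include <stdint.h>

    typedef uint64_t (*xlate_fn)(void *table, uint64_t addr);

    static uint64_t nested_translate(void *s1_table, xlate_fn s1,  /* guest-managed */
                                     void *s2_table, xlate_fn s2,  /* host-managed  */
                                     uint64_t iova)
    {
        uint64_t gpa = s1(s1_table, iova);  /* 1st level: IOVA -> GPA */
        return s2(s2_table, gpa);           /* 2nd level: GPA  -> HPA */
    }
)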



> [...]

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 06:20:55AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 10:35:05AM +0800, Tiwei Bie wrote:
> > [...]
> 
> There are more advantages to using requests with PASID:
> 
> You can use hardware support for nesting, having the guest supply the
> first-level translation and the host the second-level translation.
> 
> I actually had an idea to do something like this for AMD
> and ARM, which support nesting even for requests with PASID;
> having Intel benefit too would be nice.

Something else to consider is implementing the PRS capability.

In theory this could then go like this:

- get a page request from the device
- fetch the mapping from the VT-d page tables
- use the result to issue a page response message

This would match the current vhost-user model.
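
(A rough sketch of that loop; receive_page_request(), walk_vtd_page_tables(),
send_page_response() and the response codes are hypothetical stand-ins for
the real PRS plumbing:

    #include <stdbool.h>
    #include <stdint.h>

    struct page_req { uint64_t addr; uint32_t prgi; };  /* page request group */
    struct mapping  { bool present; };
    enum prs_code   { PRS_SUCCESS, PRS_INVALID };

    struct page_req receive_page_request(void *dev);
    struct mapping walk_vtd_page_tables(void *viommu, uint64_t addr);
    void send_page_response(void *dev, uint32_t prgi, enum prs_code code);

    static void prs_service_loop(void *dev, void *viommu)
    {
        for (;;) {
            /* 1. get a page request from the device */
            struct page_req req = receive_page_request(dev);
            /* 2. resolve it against the emulated VT-d page tables */
            struct mapping m = walk_vtd_page_tables(viommu, req.addr);
            /* 3. answer with a page response message */
            send_page_response(dev, req.prgi,
                               m.present ? PRS_SUCCESS : PRS_INVALID);
        }
    }
)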





> [...]

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018-04-12 11:35, Michael S. Tsirkin wrote:
>> There are more advantages to using requests with PASID:
>>
>> You can use hardware support for nesting, having the guest supply the
>> first-level translation and the host the second-level translation.
>>
>> I actually had an idea to do something like this for AMD
>> and ARM, which support nesting even for requests with PASID;
>> having Intel benefit too would be nice.
> Something else to consider is implementing the PRS capability.
>
> In theory this could then go like this:
>
> - get a page request from the device
> - fetch the mapping from the VT-d page tables
> - use the result to issue a page response message
>
> This would match the current vhost-user model.

This requires that the IOMMU driver can forward this to VFIO, and that
VFIO can then forward it to userspace. That looks like a lot of changes,
and it would be even slower than what is proposed in this patch.

Thanks

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 11:43:41AM +0800, Jason Wang wrote:
> 
> 
> On 2018-04-12 11:35, Michael S. Tsirkin wrote:
> > > There are more advantages to using requests with PASID:
> > > 
> > > You can use hardware support for nesting, having the guest supply the
> > > first-level translation and the host the second-level translation.
> > > 
> > > I actually had an idea to do something like this for AMD
> > > and ARM, which support nesting even for requests with PASID;
> > > having Intel benefit too would be nice.
> > Something else to consider is implementing the PRS capability.
> > 
> > In theory this could then go like this:
> > 
> > - get a page request from the device
> > - fetch the mapping from the VT-d page tables
> > - use the result to issue a page response message
> > 
> > This would match the current vhost-user model.
> 
> This requires that the IOMMU driver can forward this to VFIO, and that
> VFIO can then forward it to userspace. That looks like a lot of changes,
> and it would be even slower than what is proposed in this patch.
> 
> Thanks


It would work better for a static table, as only accessed pages would
need to be sent. It is slower for the dynamic case, but the dynamic case
needs hardware support to work properly anyway.

-- 
MST

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

On 2018-04-12 09:57, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>> queries from backend. This is helpful when using a hardware
>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>> vhost-user backend.
>>>>>>
>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>> This is potentially a large amount of data to be sent
>>>>> on a socket.
>>>> If we take the hardware accelerator out of this picture, we
>>>> will find that it's actually a question of "pre-loading" vs
>>>> "lazy-loading". I think neither of them is perfect.
>>>>
>>>> For "pre-loading", as you said, we may have a tough starting.
>>>> But for "lazy-loading", we can't have a predictable performance.
>>>> A sudden, unexpected performance drop may happen at any time,
>>>> because we may meet an unknown IOVA at any time in this case.
>>> That's how hardware behaves too though. So we can expect guests
>>> to try to optimize locality.
>> The difference is that the software implementation needs to
>> query the mappings via the socket. And it's much slower.
> If you are proposing this new feature as an optimization,
> then I'd like to see numbers showing the performance gains.
>
> It's definitely possible to optimize things out.  Pre-loading isn't
> where I would start optimizing though.  For example, DPDK could have its
> own VTD emulation, then it could access guest memory directly.

Having VT-d emulation in DPDK has many disadvantages:

- vendor lock-in: it can only work for Intel
- duplication of code and bugs
- a huge number of new message types would need to be invented

So I tend to go the other way: link DPDK to QEMU.

>
>
>>>> Once we meet an unknown IOVA, the backend's data path will need
>>>> to stop and query the mapping of the IOVA via the socket and
>>>> wait for the reply. And the latency is not negligible (sometimes
>>>> it's even unacceptable), especially in the high-performance
>>>> networking case. So maybe it's better to make both of them available to
>>>> the vhost backend.
>>>>
>>>>> I had an impression that a hardware accelerator was using
>>>>> VFIO anyway. Given this, can't we have QEMU program
>>>>> the shadow IOMMU tables into VFIO directly?
>>>> I think it's a good idea! Currently, my concern about it is
>>>> that the hardware device may not use the IOMMU and it may have
>>>> its builtin address translation unit. And it will be a pain
>>>> for device vendors to teach VFIO to be able to work with the
>>>> builtin address translation unit.
>>> I think such drivers would have to integrate with VFIO somehow.
>>> Otherwise, what is the plan for assigning such devices then?
>> Such devices are just for vhost data path acceleration.
> That's not true I think.  E.g. RDMA devices have an on-card MMU.
>
>> They have many available queue pairs; the switch logic
>> will be done among those queue pairs. And different queue
>> pairs will serve different VMs directly.
>>
>> Best regards,
>> Tiwei Bie
> The way I would do it is attach different PASID values to
> different queues. This way you can use the standard IOMMU
> to enforce protection.

So that's just shared virtual memory on the host, which shares the IOVA
address space between a specific queue pair and a process. I'm not sure
how hard it would be for an existing vhost-user backend to support this.

Thanks
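
(If the queue pair's PASID is bound to the backend process as described,
the software translation step essentially disappears -- a sketch, under
that assumption only:

    #include <stdint.h>

    /* The device-visible IOVA space *is* the process address space, so
     * the backend's translate step collapses to an identity cast; the
     * platform IOMMU does the real per-PASID translation in hardware. */
    static inline void *svm_translate(uint64_t iova)
    {
        return (void *)(uintptr_t)iova;
    }
)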

>
>
>
>>>
>>>> Best regards,
>>>> Tiwei Bie
>>>>
>>>>>
>>>>>> ---
>>>>>> The idea of this patch is to let QEMU push all the IOTLBs
>>>>>> to vhost-user backend without waiting for the queries from
>>>>>> the backend. Because hardware accelerator at the vhost-user
>>>>>> backend may not be able to handle unknown IOVAs.
>>>>>>
>>>>>> This is just a RFC for now. It seems that, it doesn't work
>>>>>> as expected when guest is using kernel driver (To handle
>>>>>> this case, it seems that some RAM regions' events also need
>>>>>> to be listened). Any comments would be appreciated! Thanks!
>>>>>>
>>>>>> [...]


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Michael S. Tsirkin 6 years ago
On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote:
> 
> 
> > On 2018-04-12 09:57, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
> > > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
> > > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > > feature for vhost-user. By default, vhost-user backend needs
> > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs.
> > > > > > > With this protocol feature negotiated, QEMU will provide all
> > > > > > > the IOTLBs to vhost-user backend without waiting for the
> > > > > > > queries from backend. This is helpful when using a hardware
> > > > > > > accelerator which is not able to handle unknown IOVAs at the
> > > > > > > vhost-user backend.
> > > > > > > 
> > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > This is potentially a large amount of data to be sent
> > > > > > on a socket.
> > > > > If we take the hardware accelerator out of this picture, we
> > > > > will find that it's actually a question of "pre-loading" vs
> > > > > "lazy-loading". I think neither of them is perfect.
> > > > > 
> > > > > For "pre-loading", as you said, we may have a tough starting.
> > > > > But for "lazy-loading", we can't have a predictable performance.
> > > > > A sudden, unexpected performance drop may happen at any time,
> > > > > because we may meet an unknown IOVA at any time in this case.
> > > > That's how hardware behaves too though. So we can expect guests
> > > > to try to optimize locality.
> > > The difference is that the software implementation needs to
> > > query the mappings via the socket. And it's much slower...
> > If you are proposing this new feature as an optimization,
> > then I'd like to see numbers showing the performance gains.
> > 
> > It's definitely possible to optimize things out.  Pre-loading isn't
> > where I would start optimizing though.  For example, DPDK could have its
> > own VTD emulation, then it could access guest memory directly.
> 
> Having vtd emulation in dpdk has many disadvantages:
> 
> - vendor-locked, can only work for intel

I don't see what would prevent other vendors from doing the same.

> - duplication of code and bugs
> - a huge number of new message types needs to be invented

Oh, just the flush I'd wager.

> So I tend to go the reverse way: link dpdk to qemu.

Won't really help as people want to build software using dpdk.


> > 
> > 
> > > > > Once we meet an unknown IOVA, the backend's data path will need
> > > > > to stop and query the mapping of the IOVA via the socket and
> > > > > wait for the reply. And the latency is not negligible (sometimes
> > > > > it's even unacceptable), especially in the high-performance
> > > > > networking case. So maybe it's better to make both of them
> > > > > available to the vhost backend.
> > > > > 
> > > > > > I had an impression that a hardware accelerator was using
> > > > > > VFIO anyway. Given this, can't we have QEMU program
> > > > > > the shadow IOMMU tables into VFIO directly?
> > > > > I think it's a good idea! Currently, my concern is that the
> > > > > hardware device may not use the IOMMU and may have its own
> > > > > built-in address translation unit. And it will be a pain for
> > > > > device vendors to teach VFIO to work with that built-in
> > > > > address translation unit.
> > > > I think such drivers would have to integrate with VFIO somehow.
> > > > Otherwise, what is the plan for assigning such devices then?
> > > Such devices are just for vhost data path acceleration.
> > That's not true I think.  E.g. RDMA devices have an on-card MMU.
> > 
> > > They have many available queue pairs; the switching logic
> > > will be done among those queue pairs. And different queue
> > > pairs will serve different VMs directly.
> > > 
> > > Best regards,
> > > Tiwei Bie
> > The way I would do it is attach different PASID values to
> > different queues. This way you can use the standard IOMMU
> > to enforce protection.
> 
> So that's just shared virtual memory on the host, which can share an iova
> address space between a specific queue pair and a process. I'm not sure how
> hard it would be for existing vhost-user backends to support this.
> 
> Thanks

That would be VFIO's job, nothing to do with vhost-user besides
sharing the VFIO descriptor.
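
For reference, "programming the shadow IOMMU tables into VFIO directly"
could look roughly like the sketch below. It assumes the backend has
somehow shared its VFIO container fd with QEMU; that fd-sharing mechanism
and the helper name are hypothetical, while the VFIO_IOMMU_MAP_DMA ioctl
and its flags are the standard type1 interface.

    /* Hypothetical helper: install one IOVA -> HVA mapping directly
     * into a VFIO type1 container instead of sending it over the
     * vhost-user socket. container_fd is assumed to have been shared
     * by the backend; only the ioctl interface below is standard. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static int shadow_map_into_vfio(int container_fd, uint64_t iova,
                                    void *hva, uint64_t size, bool write)
    {
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ |
                    (write ? VFIO_DMA_MAP_FLAG_WRITE : 0);
        map.vaddr = (uint64_t)(uintptr_t)hva;   /* QEMU virtual address */
        map.iova  = iova;                       /* guest IOVA */
        map.size  = size;

        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }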


Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

> On 2018-04-12 11:41, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote:
>>
>>> On 2018-04-12 09:57, Michael S. Tsirkin wrote:
>>> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
>>>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
>>>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>>>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>>>> queries from backend. This is helpful when using a hardware
>>>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>>>> vhost-user backend.
>>>>>>>>
>>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> This is potentially a large amount of data to be sent
>>>>>>> on a socket.
>>>>>> If we take the hardware accelerator out of this picture, we
>>>>>> will find that it's actually a question of "pre-loading" vs
>>>>>> "lazy-loading". I think neither of them is perfect.
>>>>>>
>>>>>> For "pre-loading", as you said, we may have a tough starting.
>>>>>> But for "lazy-loading", we can't have a predictable performance.
>>>>>> A sudden, unexpected performance drop may happen at any time,
>>>>>> because we may meet an unknown IOVA at any time in this case.
>>>>> That's how hardware behaves too though. So we can expect guests
>>>>> to try to optimize locality.
>>>> The difference is that the software implementation needs to
>>>> query the mappings via the socket. And it's much slower...
>>> If you are proposing this new feature as an optimization,
>>> then I'd like to see numbers showing the performance gains.
>>>
>>> It's definitely possible to optimize things out.  Pre-loading isn't
>>> where I would start optimizing though.  For example, DPDK could have its
>>> own VTD emulation, then it could access guest memory directly.
>> Having vtd emulation in dpdk has many disadvantages:
>>
>> - vendor-locked, can only work for intel
> I don't see what would prevent other vendors from doing the same.

Technically it can. Two questions here:

- Shouldn't we keep vhost-user vendor/transport independent?
- Do we really prefer the split device model here? It means implementing
the datapath in at least two places. Personally I prefer to keep all the
virt stuff inside qemu.

>
>> - duplication of code and bugs
>> - a huge number of new message types needs to be invented
> Oh, just the flush I'd wager.

Not only flush, but also error reporting, context entry programming, and
even PRS in the future. And we would need feature negotiation between
them, like vhost has, to keep compatibility for future features. This
doesn't sound good.

>
>> So I tend to go the reverse way: link dpdk to qemu.
> Won't really help as people want to build software using dpdk.

Well, I believe the main use case is vDPA, which is hardware virtio
offload. Building software using dpdk, like ovs-dpdk, is another
interesting topic. We can seek a solution other than linking dpdk to
qemu, e.g. we can do all the virtio and packet copy work inside a qemu
IOThread and use another inter-process channel to communicate with
OVS-dpdk (or another virtio-user here). The key is to hide all
virtualization details from OVS-dpdk.

>
>
>>>
>>>>>> Once we meet an unknown IOVA, the backend's data path will need
>>>>>> to stop and query the mapping of the IOVA via the socket and
>>>>>> wait for the reply. And the latency is not negligible (sometimes
>>>>>> it's even unacceptable), especially in the high-performance
>>>>>> networking case. So maybe it's better to make both of them
>>>>>> available to the vhost backend.
>>>>>>
>>>>>>> I had an impression that a hardware accelerator was using
>>>>>>> VFIO anyway. Given this, can't we have QEMU program
>>>>>>> the shadow IOMMU tables into VFIO directly?
>>>>>> I think it's a good idea! Currently, my concern is that the
>>>>>> hardware device may not use the IOMMU and may have its own
>>>>>> built-in address translation unit. And it will be a pain for
>>>>>> device vendors to teach VFIO to work with that built-in
>>>>>> address translation unit.
>>>>> I think such drivers would have to integrate with VFIO somehow.
>>>>> Otherwise, what is the plan for assigning such devices then?
>>>> Such devices are just for vhost data path acceleration.
>>> That's not true I think.  E.g. RDMA devices have an on-card MMU.
>>>
>>>> They have many available queue pairs; the switching logic
>>>> will be done among those queue pairs. And different queue
>>>> pairs will serve different VMs directly.
>>>>
>>>> Best regards,
>>>> Tiwei Bie
>>> The way I would do it is attach different PASID values to
>>> different queues. This way you can use the standard IOMMU
>>> to enforce protection.
>> So that's just shared virtual memory on the host, which can share an
>> iova address space between a specific queue pair and a process. I'm not
>> sure how hard it would be for existing vhost-user backends to support this.
>>
>> Thanks
> That would be VFIO's job, nothing to do with vhost-user besides
> sharing the VFIO descriptor.

At least dpdk needs to offload DMA mapping setup to qemu.
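
To make that offload concrete: with the proposed feature negotiated, the
backend's job reduces to consuming the IOTLB messages QEMU pushes, along
the lines of the sketch below. The struct and message types are the ones
vhost already uses for IOTLB messages; insert_mapping()/remove_mapping()
are hypothetical backend helpers (they could program a device's
translation unit or a VFIO container).

    /* Sketch of a backend consuming pushed IOTLB messages. With all
     * mappings pushed up front, no VHOST_IOTLB_MISS round trip is
     * needed on the data path. */
    #include <stdint.h>
    #include <linux/vhost.h>   /* struct vhost_iotlb_msg, VHOST_IOTLB_* */

    /* Hypothetical helpers that program the device's translation unit. */
    int insert_mapping(uint64_t iova, uint64_t uaddr, uint64_t size,
                       uint8_t perm);
    int remove_mapping(uint64_t iova, uint64_t size);

    static int handle_pushed_iotlb(const struct vhost_iotlb_msg *msg)
    {
        switch (msg->type) {
        case VHOST_IOTLB_UPDATE:       /* [iova, iova + size) -> uaddr */
            return insert_mapping(msg->iova, msg->uaddr, msg->size,
                                  msg->perm);
        case VHOST_IOTLB_INVALIDATE:
            return remove_mapping(msg->iova, msg->size);
        default:
            return -1;                 /* unexpected in prefill mode */
        }
    }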

Thanks

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Stefan Hajnoczi 6 years ago
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB

Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".

I wasn't able to guess what "NEED_ALL_IOTLB" does.  "PREFILL_IOTLB"
communicates that memory translation will be set up in advance.

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Jason Wang 6 years ago

> On 2018-04-16 15:47, Stefan Hajnoczi wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".
>
> I wasn't able to guess what "NEED_ALL_IOTLB" does.  "PREFILL_IOTLB"
> communicates that memory translation will be set up in advance.

I agree we need a better name here, but it should not have anything like
"IOTLB" in it, since it actually tries to do page table shadowing on the
remote vhost-user backend.
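
That shadowing is essentially what the patch's map notifier computes: the
vIOMMU gives IOVA -> GPA, the memory map gives GPA -> HVA, and the backend
receives the flattened IOVA -> HVA entries. A conceptual sketch, with
illustrative helper names rather than the actual QEMU functions:

    /* Conceptual shape of the work done in vhost_iommu_map_notify():
     * flatten guest IOVA -> GPA -> HVA and push the result. */
    #include <stdint.h>

    /* Illustrative stand-ins for the vIOMMU lookup, the GPA -> HVA
     * lookup, and the vhost-user push. */
    uint64_t viommu_translate(uint64_t iova);
    int      gpa_to_hva(uint64_t gpa, uint64_t *hva, uint64_t *len);
    int      push_iotlb_update(uint64_t iova, uint64_t hva, uint64_t len,
                               int perm);

    static int shadow_one_entry(uint64_t iova, uint64_t len, int perm)
    {
        uint64_t gpa = viommu_translate(iova);   /* IOVA -> guest PA */
        uint64_t hva, region_len;

        if (gpa_to_hva(gpa, &hva, &region_len)) {
            return -1;                           /* not backed by RAM */
        }
        if (len > region_len) {
            len = region_len;                    /* clamp to the region */
        }
        return push_iotlb_update(iova, hva, len, perm);
    }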

Thanks

Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Posted by Tiwei Bie 6 years ago
On Mon, Apr 16, 2018 at 03:47:06PM +0800, Stefan Hajnoczi wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> 
> Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".
> 
> I wasn't able to guess what "NEED_ALL_IOTLB" does.  "PREFILL_IOTLB"
> communicates that memory translation will be set up in advance.

By naming it NEED_ALL_IOTLB, I want to indicate that, without
this, it's hard or impossible for the backend to get all the
IOTLBs (because the backend needs to meet IOVAs first before
getting the corresponding IOTLBs, and it can't tell whether it
has met all the IOVAs or not, especially when dynamic mapping
is being used by the guest). With this feature negotiated, the
backend can expect to get all the IOTLBs. But it seems that the
name (NEED_ALL_IOTLB) isn't good. And thanks for your
suggestion! :)
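
For illustration, the lazy-loading path being described looks roughly
like the sketch below; the miss is what stalls the data path for a full
socket round trip, and there is no point at which the backend can know
it has seen every IOVA. The cache and socket helpers are hypothetical
names, not a real API; the message struct is the one vhost uses.

    /* Sketch of the existing lazy-loading translation path: a miss
     * blocks the data path until QEMU replies with an update. */
    #include <stdint.h>
    #include <linux/vhost.h>   /* struct vhost_iotlb_msg, VHOST_IOTLB_MISS */

    /* Hypothetical stand-ins for the backend's cache and the
     * slave-channel socket I/O. */
    int  lookup_cached(uint64_t iova, uint8_t perm, uint64_t *uaddr);
    void send_iotlb_miss(const struct vhost_iotlb_msg *msg);
    void wait_for_iotlb_update(uint64_t iova, uint64_t *uaddr);

    static int translate_iova(uint64_t iova, uint8_t perm, uint64_t *uaddr)
    {
        if (lookup_cached(iova, perm, uaddr) == 0) {
            return 0;                       /* fast path: known mapping */
        }

        /* Slow path: ask QEMU and block until the update arrives. */
        struct vhost_iotlb_msg miss = {
            .iova = iova,
            .perm = perm,
            .type = VHOST_IOTLB_MISS,
        };
        send_iotlb_miss(&miss);             /* backend -> QEMU */
        wait_for_iotlb_update(iova, uaddr); /* QEMU -> backend */
        return 0;
    }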

Best regards,
Tiwei Bie