Changeset
backends/cryptodev-vhost-user.c     |  20 +++-
docs/interop/vhost-user.txt         |  33 ++++++
hw/block/vhost-user-blk.c           |  22 +++-
hw/scsi/vhost-user-scsi.c           |  20 +++-
hw/virtio/Makefile.objs             |   2 +-
hw/virtio/vhost-stub.c              |  10 ++
hw/virtio/vhost-user.c              | 206 ++++++++++++++++++++++++++++++++++--
hw/virtio/vhost.c                   |   9 +-
hw/virtio/virtio-pci.c              |  22 ++++
hw/virtio/virtio.c                  |  13 +++
include/hw/virtio/vhost-backend.h   |   4 +
include/hw/virtio/vhost-user-blk.h  |   2 +
include/hw/virtio/vhost-user-scsi.h |   2 +
include/hw/virtio/vhost-user.h      |  28 +++++
include/hw/virtio/virtio-bus.h      |   2 +
include/hw/virtio/virtio.h          |   2 +
net/vhost-user.c                    |  78 +++++++++-----
17 files changed, 433 insertions(+), 42 deletions(-)
create mode 100644 include/hw/virtio/vhost-user.h
Git apply log
Switched to a new branch '20180412151232.17506-1-tiwei.bie@intel.com'
Applying: vhost-user: add Net prefix to internal state structure
Applying: vhost-user: introduce shared vhost-user state
Applying: vhost-user: support receiving file descriptors in slave_read
Applying: virtio: support setting memory region based host notifier
Applying: vhost: allow backends to filter memory sections
Applying: vhost-user: support registering external host notifiers
To https://github.com/patchew-project/qemu
 * [new tag]         patchew/20180412151232.17506-1-tiwei.bie@intel.com -> patchew/20180412151232.17506-1-tiwei.bie@intel.com
Test passed: checkpatch

loading

Test passed: docker-mingw@fedora

loading

Test passed: docker-build@min-glib

loading

Test passed: s390x

loading

[Qemu-devel] [PATCH v3 0/6] Extend vhost-user to support registering external host notifiers
Posted by Tiwei Bie, 1 week ago
The original subject is: Extend vhost-user to support VFIO based accelerators

Update notes
============

Now, this patch set just focuses on adding the support for
registering memory region based host notifiers. With this
support, guest driver in the VM will be able to notify the
hardware device at the vhost backend directly.

It's one of the most important things in vDPA -- the host
notification offload. Because, normally, the hardware device
heavily depends on the notifications. Without this support,
there will be a lot of VM-Exit happen due to the notifications
from guest driver (it will drop the VM performance) and a
lot of CPU resources wasted to do the notification relay
(it will make the hardware offload less attractive, because
one important goal of hardware offload is to free the CPU
resources).

More backgrounds of this patch set can be found from the
cover letter of the previous versions:

RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
v2:  http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg05009.html

v2 -> v3:
- A better implementation of the shared vhost-user state (MST);
- Use bus callback to add/delete subregions for notification (MST);
- Refine APIs' names which add/delete subregions for notification (MST);
- Refine the doc of the new vhost-user types and messages (MST);
- Separate host notification offload from the guest notification offload (MST);
- Drop the guest notification offload support from this patch;
- Add memory filter for vhost backend to filter the sections they can handle;

v1 -> v2:
- Add some explanations about why extend vhost-user in commit log (Paolo);
- Bug fix in slave_read() according to Stefan's fix in DPDK;
- Remove IOMMU feature check and related commit log;
- Some minor refinements;
- Rebase to the latest QEMU;

RFC -> v1:
- Add some details about how vDPA works in cover letter (Alexey)
- Add some details about the OVS offload use-case in cover letter (Jason)
- Move PCI specific stuffs out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move VFIO group management code into vfio/common.c (Alex)
- Various refinements;
(approximately sorted by comment posting time)

Tiwei Bie (6):
  vhost-user: add Net prefix to internal state structure
  vhost-user: introduce shared vhost-user state
  vhost-user: support receiving file descriptors in slave_read
  virtio: support setting memory region based host notifier
  vhost: allow backends to filter memory sections
  vhost-user: support registering external host notifiers

 backends/cryptodev-vhost-user.c     |  20 +++-
 docs/interop/vhost-user.txt         |  33 ++++++
 hw/block/vhost-user-blk.c           |  22 +++-
 hw/scsi/vhost-user-scsi.c           |  20 +++-
 hw/virtio/Makefile.objs             |   2 +-
 hw/virtio/vhost-stub.c              |  10 ++
 hw/virtio/vhost-user.c              | 206 ++++++++++++++++++++++++++++++++++--
 hw/virtio/vhost.c                   |   9 +-
 hw/virtio/virtio-pci.c              |  22 ++++
 hw/virtio/virtio.c                  |  13 +++
 include/hw/virtio/vhost-backend.h   |   4 +
 include/hw/virtio/vhost-user-blk.h  |   2 +
 include/hw/virtio/vhost-user-scsi.h |   2 +
 include/hw/virtio/vhost-user.h      |  28 +++++
 include/hw/virtio/virtio-bus.h      |   2 +
 include/hw/virtio/virtio.h          |   2 +
 net/vhost-user.c                    |  78 +++++++++-----
 17 files changed, 433 insertions(+), 42 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

-- 
2.11.0


[Qemu-devel] [PATCH v3 1/6] vhost-user: add Net prefix to internal state structure
Posted by Tiwei Bie, 1 week ago
We are going to introduce a shared vhost user state which
will be named as 'VhostUserState'. So add 'Net' prefix to
the existing internal state structure in the vhost-user
netdev to avoid conflict.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 net/vhost-user.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index e0f16c895b..fa28aad12d 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -20,38 +20,38 @@
 #include "qemu/option.h"
 #include "trace.h"
 
-typedef struct VhostUserState {
+typedef struct NetVhostUserState {
     NetClientState nc;
     CharBackend chr; /* only queue index 0 */
     VHostNetState *vhost_net;
     guint watch;
     uint64_t acked_features;
     bool started;
-} VhostUserState;
+} NetVhostUserState;
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
     return s->vhost_net;
 }
 
 uint64_t vhost_user_get_acked_features(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_USER);
     return s->acked_features;
 }
 
 static void vhost_user_stop(int queues, NetClientState *ncs[])
 {
-    VhostUserState *s;
+    NetVhostUserState *s;
     int i;
 
     for (i = 0; i < queues; i++) {
         assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 
-        s = DO_UPCAST(VhostUserState, nc, ncs[i]);
+        s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
 
         if (s->vhost_net) {
             /* save acked features */
@@ -68,7 +68,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
 {
     VhostNetOptions options;
     struct vhost_net *net = NULL;
-    VhostUserState *s;
+    NetVhostUserState *s;
     int max_queues;
     int i;
 
@@ -77,7 +77,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
     for (i = 0; i < queues; i++) {
         assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
 
-        s = DO_UPCAST(VhostUserState, nc, ncs[i]);
+        s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
 
         options.net_backend = ncs[i];
         options.opaque      = be;
@@ -123,7 +123,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
        without GUEST_ANNOUNCE capability.
      */
     if (size == 60) {
-        VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+        NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
         int r;
         static int display_rarp_failure = 1;
         char mac_addr[6];
@@ -146,7 +146,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
 
 static void vhost_user_cleanup(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
@@ -180,7 +180,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
 
 static NetClientInfo net_vhost_user_info = {
         .type = NET_CLIENT_DRIVER_VHOST_USER,
-        .size = sizeof(VhostUserState),
+        .size = sizeof(NetVhostUserState),
         .receive = vhost_user_receive,
         .cleanup = vhost_user_cleanup,
         .has_vnet_hdr = vhost_user_has_vnet_hdr,
@@ -190,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
 static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
                                            void *opaque)
 {
-    VhostUserState *s = opaque;
+    NetVhostUserState *s = opaque;
 
     qemu_chr_fe_disconnect(&s->chr);
 
@@ -203,7 +203,7 @@ static void chr_closed_bh(void *opaque)
 {
     const char *name = opaque;
     NetClientState *ncs[MAX_QUEUE_NUM];
-    VhostUserState *s;
+    NetVhostUserState *s;
     Error *err = NULL;
     int queues;
 
@@ -212,7 +212,7 @@ static void chr_closed_bh(void *opaque)
                                           MAX_QUEUE_NUM);
     assert(queues < MAX_QUEUE_NUM);
 
-    s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+    s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
 
     qmp_set_link(name, false, &err);
     vhost_user_stop(queues, ncs);
@@ -229,7 +229,7 @@ static void net_vhost_user_event(void *opaque, int event)
 {
     const char *name = opaque;
     NetClientState *ncs[MAX_QUEUE_NUM];
-    VhostUserState *s;
+    NetVhostUserState *s;
     Chardev *chr;
     Error *err = NULL;
     int queues;
@@ -239,7 +239,7 @@ static void net_vhost_user_event(void *opaque, int event)
                                           MAX_QUEUE_NUM);
     assert(queues < MAX_QUEUE_NUM);
 
-    s = DO_UPCAST(VhostUserState, nc, ncs[0]);
+    s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
     chr = qemu_chr_fe_get_driver(&s->chr);
     trace_vhost_user_event(chr->label, event);
     switch (event) {
@@ -283,7 +283,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 {
     Error *err = NULL;
     NetClientState *nc, *nc0 = NULL;
-    VhostUserState *s;
+    NetVhostUserState *s;
     int i;
 
     assert(name);
@@ -296,7 +296,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
         nc->queue_index = i;
         if (!nc0) {
             nc0 = nc;
-            s = DO_UPCAST(VhostUserState, nc, nc);
+            s = DO_UPCAST(NetVhostUserState, nc, nc);
             if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
                 error_report_err(err);
                 return -1;
@@ -305,7 +305,7 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 
     }
 
-    s = DO_UPCAST(VhostUserState, nc, nc0);
+    s = DO_UPCAST(NetVhostUserState, nc, nc0);
     do {
         if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
             error_report_err(err);
-- 
2.11.0


[Qemu-devel] [PATCH v3 2/6] vhost-user: introduce shared vhost-user state
Posted by Tiwei Bie, 1 week ago
When multi queue is enabled e.g. for a virtio-net device,
each queue pair will have a vhost_dev, and the only thing
shared between vhost devs currently is the chardev. This
patch introduces a vhost-user state structure which will
be shared by all vhost devs of the same virtio device.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 backends/cryptodev-vhost-user.c     | 20 ++++++++++++++++++-
 hw/block/vhost-user-blk.c           | 22 +++++++++++++++++++-
 hw/scsi/vhost-user-scsi.c           | 20 ++++++++++++++++++-
 hw/virtio/Makefile.objs             |  2 +-
 hw/virtio/vhost-stub.c              | 10 ++++++++++
 hw/virtio/vhost-user.c              | 31 +++++++++++++++++++---------
 include/hw/virtio/vhost-user-blk.h  |  2 ++
 include/hw/virtio/vhost-user-scsi.h |  2 ++
 include/hw/virtio/vhost-user.h      | 20 +++++++++++++++++++
 net/vhost-user.c                    | 40 ++++++++++++++++++++++++++++++-------
 10 files changed, 149 insertions(+), 20 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
index 862d4f2580..d52daccfcd 100644
--- a/backends/cryptodev-vhost-user.c
+++ b/backends/cryptodev-vhost-user.c
@@ -26,6 +26,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/vhost-user.h"
 #include "standard-headers/linux/virtio_crypto.h"
 #include "sysemu/cryptodev-vhost.h"
 #include "chardev/char-fe.h"
@@ -46,6 +47,7 @@
 typedef struct CryptoDevBackendVhostUser {
     CryptoDevBackend parent_obj;
 
+    VhostUserState *vhost_user;
     CharBackend chr;
     char *chr_name;
     bool opened;
@@ -102,7 +104,7 @@ cryptodev_vhost_user_start(int queues,
             continue;
         }
 
-        options.opaque = &s->chr;
+        options.opaque = s->vhost_user;
         options.backend_type = VHOST_BACKEND_TYPE_USER;
         options.cc = b->conf.peers.ccs[i];
         s->vhost_crypto[i] = cryptodev_vhost_init(&options);
@@ -185,6 +187,7 @@ static void cryptodev_vhost_user_init(
     size_t i;
     Error *local_err = NULL;
     Chardev *chr;
+    VhostUserState *user;
     CryptoDevBackendClient *cc;
     CryptoDevBackendVhostUser *s =
                       CRYPTODEV_BACKEND_VHOST_USER(backend);
@@ -215,6 +218,15 @@ static void cryptodev_vhost_user_init(
         }
     }
 
+    user = vhost_user_init();
+    if (!user) {
+        error_setg(errp, "Failed to init vhost_user");
+        return;
+    }
+
+    user->chr = &s->chr;
+    s->vhost_user = user;
+
     qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
                      cryptodev_vhost_user_event, NULL, s, NULL, true);
 
@@ -299,6 +311,12 @@ static void cryptodev_vhost_user_cleanup(
             backend->conf.peers.ccs[i] = NULL;
         }
     }
+
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
 }
 
 static void cryptodev_vhost_user_set_chardev(Object *obj,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 262baca432..4021d71c31 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -229,6 +229,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    VhostUserState *user;
     int i, ret;
 
     if (!s->chardev.chr) {
@@ -246,6 +247,15 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    user = vhost_user_init();
+    if (!user) {
+        error_setg(errp, "vhost-user-blk: failed to init vhost_user");
+        return;
+    }
+
+    user->chr = &s->chardev;
+    s->vhost_user = user;
+
     virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
                 sizeof(struct virtio_blk_config));
 
@@ -261,7 +271,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 
     vhost_dev_set_config_notifier(&s->dev, &blk_ops);
 
-    ret = vhost_dev_init(&s->dev, &s->chardev, VHOST_BACKEND_TYPE_USER, 0);
+    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
         error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
                    strerror(-ret));
@@ -286,6 +296,10 @@ vhost_err:
 virtio_err:
     g_free(s->dev.vqs);
     virtio_cleanup(vdev);
+
+    vhost_user_cleanup(user);
+    g_free(user);
+    s->vhost_user = NULL;
 }
 
 static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
@@ -297,6 +311,12 @@ static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
     vhost_dev_cleanup(&s->dev);
     g_free(s->dev.vqs);
     virtio_cleanup(vdev);
+
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
 }
 
 static void vhost_user_blk_instance_init(Object *obj)
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 9389ed48e0..9355cfdf07 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -69,6 +69,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
     VHostUserSCSI *s = VHOST_USER_SCSI(dev);
     VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+    VhostUserState *user;
     Error *err = NULL;
     int ret;
 
@@ -85,19 +86,30 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    user = vhost_user_init();
+    if (!user) {
+        error_setg(errp, "vhost-user-scsi: failed to init vhost_user");
+        return;
+    }
+    user->chr = &vs->conf.chardev;
+
     vsc->dev.nvqs = 2 + vs->conf.num_queues;
     vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
 
-    ret = vhost_dev_init(&vsc->dev, (void *)&vs->conf.chardev,
+    ret = vhost_dev_init(&vsc->dev, user,
                          VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
         error_setg(errp, "vhost-user-scsi: vhost initialization failed: %s",
                    strerror(-ret));
+        vhost_user_cleanup(user);
+        g_free(user);
         return;
     }
 
+    s->vhost_user = user;
+
     /* Channel and lun both are 0 for bootable vhost-user-scsi disk */
     vsc->channel = 0;
     vsc->lun = 0;
@@ -117,6 +129,12 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, Error **errp)
     g_free(vsc->dev.vqs);
 
     virtio_scsi_common_unrealize(dev, errp);
+
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
 }
 
 static uint64_t vhost_user_scsi_get_features(VirtIODevice *vdev,
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 765d363c1f..030969e28c 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -11,5 +11,5 @@ obj-y += virtio-crypto.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
 endif
 
-common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
+common-obj-$(call lnot,$(call land,$(CONFIG_VIRTIO),$(CONFIG_LINUX))) += vhost-stub.o
 common-obj-$(CONFIG_ALL) += vhost-stub.o
diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
index 2d76cdebdc..049089b5e2 100644
--- a/hw/virtio/vhost-stub.c
+++ b/hw/virtio/vhost-stub.c
@@ -1,7 +1,17 @@
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 
 bool vhost_has_free_slot(void)
 {
     return true;
 }
+
+VhostUserState *vhost_user_init(void)
+{
+    return NULL;
+}
+
+void vhost_user_cleanup(VhostUserState *user)
+{
+}
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 38da8692bb..91edd95453 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -11,6 +11,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
@@ -173,7 +174,8 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
     struct vhost_dev *dev;
-    CharBackend *chr;
+    /* Shared between vhost devs of the same virtio device */
+    VhostUserState *user;
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
@@ -199,7 +201,7 @@ static bool ioeventfd_enabled(void)
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = u->user->chr;
     uint8_t *p = (uint8_t *) msg;
     int r, size = VHOST_USER_HDR_SIZE;
 
@@ -285,7 +287,7 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
                             int *fds, int fd_num)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = u->user->chr;
     int ret, size = VHOST_USER_HDR_SIZE + msg->hdr.size;
 
     /*
@@ -1044,7 +1046,7 @@ static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
 static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
 {
     struct vhost_user *u = dev->opaque;
-    CharBackend *chr = u->chr;
+    CharBackend *chr = u->user->chr;
     int ufd;
     VhostUserMsg msg = {
         .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
@@ -1182,7 +1184,7 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
     return 0;
 }
 
-static int vhost_user_init(struct vhost_dev *dev, void *opaque)
+static int vhost_user_backend_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features, protocol_features;
     struct vhost_user *u;
@@ -1191,7 +1193,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = g_new0(struct vhost_user, 1);
-    u->chr = opaque;
+    u->user = opaque;
     u->slave_fd = -1;
     u->dev = dev;
     dev->opaque = u;
@@ -1267,7 +1269,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     return 0;
 }
 
-static int vhost_user_cleanup(struct vhost_dev *dev)
+static int vhost_user_backend_cleanup(struct vhost_dev *dev)
 {
     struct vhost_user *u;
 
@@ -1581,10 +1583,21 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
     return 0;
 }
 
+VhostUserState *vhost_user_init(void)
+{
+    VhostUserState *user = g_new0(struct VhostUserState, 1);
+
+    return user;
+}
+
+void vhost_user_cleanup(VhostUserState *user)
+{
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
-        .vhost_backend_init = vhost_user_init,
-        .vhost_backend_cleanup = vhost_user_cleanup,
+        .vhost_backend_init = vhost_user_backend_init,
+        .vhost_backend_cleanup = vhost_user_backend_cleanup,
         .vhost_backend_memslots_limit = vhost_user_memslots_limit,
         .vhost_set_log_base = vhost_user_set_log_base,
         .vhost_set_mem_table = vhost_user_set_mem_table,
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index 5804cc904a..f1258ae545 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -21,6 +21,7 @@
 #include "hw/block/block.h"
 #include "chardev/char-fe.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 
 #define TYPE_VHOST_USER_BLK "vhost-user-blk"
 #define VHOST_USER_BLK(obj) \
@@ -36,6 +37,7 @@ typedef struct VHostUserBlk {
     uint32_t config_wce;
     uint32_t config_ro;
     struct vhost_dev dev;
+    VhostUserState *vhost_user;
 } VHostUserBlk;
 
 #endif
diff --git a/include/hw/virtio/vhost-user-scsi.h b/include/hw/virtio/vhost-user-scsi.h
index 01861f78d0..3ec34ae867 100644
--- a/include/hw/virtio/vhost-user-scsi.h
+++ b/include/hw/virtio/vhost-user-scsi.h
@@ -21,6 +21,7 @@
 #include "hw/qdev.h"
 #include "hw/virtio/virtio-scsi.h"
 #include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-scsi-common.h"
 
 #define TYPE_VHOST_USER_SCSI "vhost-user-scsi"
@@ -30,6 +31,7 @@
 typedef struct VHostUserSCSI {
     VHostSCSICommon parent_obj;
     uint64_t host_features;
+    VhostUserState *vhost_user;
 } VHostUserSCSI;
 
 #endif /* VHOST_USER_SCSI_H */
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
new file mode 100644
index 0000000000..eb8bc0d90d
--- /dev/null
+++ b/include/hw/virtio/vhost-user.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (c) 2017-2018 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VIRTIO_VHOST_USER_H
+#define HW_VIRTIO_VHOST_USER_H
+
+#include "chardev/char-fe.h"
+
+typedef struct VhostUserState {
+    CharBackend *chr;
+} VhostUserState;
+
+VhostUserState *vhost_user_init(void);
+void vhost_user_cleanup(VhostUserState *user);
+
+#endif
diff --git a/net/vhost-user.c b/net/vhost-user.c
index fa28aad12d..525a061acf 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
 #include "clients.h"
 #include "net/vhost_net.h"
 #include "net/vhost-user.h"
+#include "hw/virtio/vhost-user.h"
 #include "chardev/char-fe.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-net.h"
@@ -23,6 +24,7 @@
 typedef struct NetVhostUserState {
     NetClientState nc;
     CharBackend chr; /* only queue index 0 */
+    VhostUserState *vhost_user;
     VHostNetState *vhost_net;
     guint watch;
     uint64_t acked_features;
@@ -64,7 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
     }
 }
 
-static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
+static int vhost_user_start(int queues, NetClientState *ncs[],
+                            VhostUserState *be)
 {
     VhostNetOptions options;
     struct vhost_net *net = NULL;
@@ -144,7 +147,7 @@ static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
     return size;
 }
 
-static void vhost_user_cleanup(NetClientState *nc)
+static void net_vhost_user_cleanup(NetClientState *nc)
 {
     NetVhostUserState *s = DO_UPCAST(NetVhostUserState, nc, nc);
 
@@ -153,6 +156,11 @@ static void vhost_user_cleanup(NetClientState *nc)
         g_free(s->vhost_net);
         s->vhost_net = NULL;
     }
+    if (s->vhost_user) {
+        vhost_user_cleanup(s->vhost_user);
+        g_free(s->vhost_user);
+        s->vhost_user = NULL;
+    }
     if (nc->queue_index == 0) {
         if (s->watch) {
             g_source_remove(s->watch);
@@ -182,7 +190,7 @@ static NetClientInfo net_vhost_user_info = {
         .type = NET_CLIENT_DRIVER_VHOST_USER,
         .size = sizeof(NetVhostUserState),
         .receive = vhost_user_receive,
-        .cleanup = vhost_user_cleanup,
+        .cleanup = net_vhost_user_cleanup,
         .has_vnet_hdr = vhost_user_has_vnet_hdr,
         .has_ufo = vhost_user_has_ufo,
 };
@@ -244,7 +252,7 @@ static void net_vhost_user_event(void *opaque, int event)
     trace_vhost_user_event(chr->label, event);
     switch (event) {
     case CHR_EVENT_OPENED:
-        if (vhost_user_start(queues, ncs, &s->chr) < 0) {
+        if (vhost_user_start(queues, ncs, s->vhost_user) < 0) {
             qemu_chr_fe_disconnect(&s->chr);
             return;
         }
@@ -283,12 +291,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
 {
     Error *err = NULL;
     NetClientState *nc, *nc0 = NULL;
+    VhostUserState *user = NULL;
     NetVhostUserState *s;
     int i;
 
     assert(name);
     assert(queues > 0);
 
+    user = vhost_user_init();
+    if (!user) {
+        error_report("failed to init vhost_user");
+        goto err;
+    }
+
     for (i = 0; i < queues; i++) {
         nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
         snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
@@ -299,17 +314,19 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
             s = DO_UPCAST(NetVhostUserState, nc, nc);
             if (!qemu_chr_fe_init(&s->chr, chr, &err)) {
                 error_report_err(err);
-                return -1;
+                goto err;
             }
+            user->chr = &s->chr;
         }
-
+        s = DO_UPCAST(NetVhostUserState, nc, nc);
+        s->vhost_user = user;
     }
 
     s = DO_UPCAST(NetVhostUserState, nc, nc0);
     do {
         if (qemu_chr_fe_wait_connected(&s->chr, &err) < 0) {
             error_report_err(err);
-            return -1;
+            goto err;
         }
         qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
                                  net_vhost_user_event, NULL, nc0->name, NULL,
@@ -319,6 +336,15 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
     assert(s->vhost_net);
 
     return 0;
+
+err:
+    if (user) {
+        vhost_user_cleanup(user);
+        g_free(user);
+        s->vhost_user = NULL;
+    }
+
+    return -1;
 }
 
 static Chardev *net_vhost_claim_chardev(
-- 
2.11.0


[Qemu-devel] [PATCH v3 3/6] vhost-user: support receiving file descriptors in slave_read
Posted by Tiwei Bie, 1 week ago
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/vhost-user.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 91edd95453..9cea2c8c51 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -854,14 +854,44 @@ static void slave_read(void *opaque)
     VhostUserHeader hdr = { 0, };
     VhostUserPayload payload = { 0, };
     int size, ret = 0;
+    struct iovec iov;
+    struct msghdr msgh;
+    int fd = -1;
+    char control[CMSG_SPACE(sizeof(fd))];
+    struct cmsghdr *cmsg;
+    size_t fdsize;
+
+    memset(&msgh, 0, sizeof(msgh));
+    msgh.msg_iov = &iov;
+    msgh.msg_iovlen = 1;
+    msgh.msg_control = control;
+    msgh.msg_controllen = sizeof(control);
 
     /* Read header */
-    size = read(u->slave_fd, &hdr, VHOST_USER_HDR_SIZE);
+    iov.iov_base = &hdr;
+    iov.iov_len = VHOST_USER_HDR_SIZE;
+
+    size = recvmsg(u->slave_fd, &msgh, 0);
     if (size != VHOST_USER_HDR_SIZE) {
         error_report("Failed to read from slave.");
         goto err;
     }
 
+    if (msgh.msg_flags & MSG_CTRUNC) {
+        error_report("Truncated message.");
+        goto err;
+    }
+
+    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+         cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+            if (cmsg->cmsg_level == SOL_SOCKET &&
+                cmsg->cmsg_type == SCM_RIGHTS) {
+                    fdsize = cmsg->cmsg_len - CMSG_LEN(0);
+                    memcpy(&fd, CMSG_DATA(cmsg), fdsize);
+                    break;
+            }
+    }
+
     if (hdr.size > VHOST_USER_PAYLOAD_SIZE) {
         error_report("Failed to read msg header."
                 " Size %d exceeds the maximum %zu.", hdr.size,
@@ -885,9 +915,15 @@ static void slave_read(void *opaque)
         break;
     default:
         error_report("Received unexpected msg type.");
+        if (fd != -1) {
+            close(fd);
+        }
         ret = -EINVAL;
     }
 
+    /* Message handlers need to make sure that fd will be consumed. */
+    fd = -1;
+
     /*
      * REPLY_ACK feature handling. Other reply types has to be managed
      * directly in their request handlers.
@@ -920,6 +956,9 @@ err:
     qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
     close(u->slave_fd);
     u->slave_fd = -1;
+    if (fd != -1) {
+        close(fd);
+    }
     return;
 }
 
-- 
2.11.0


[Qemu-devel] [PATCH v3 4/6] virtio: support setting memory region based host notifier
Posted by Tiwei Bie, 1 week ago
This patch introduces the support for setting memory region
based host notifiers for virtio device. This is helpful when
using a hardware accelerator for a virtio device, because
hardware heavily depends on the notification, this will allow
the guest driver in the VM to notify the hardware directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/virtio-pci.c         | 22 ++++++++++++++++++++++
 hw/virtio/virtio.c             | 13 +++++++++++++
 include/hw/virtio/virtio-bus.h |  2 ++
 include/hw/virtio/virtio.h     |  2 ++
 4 files changed, 39 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1e8ab7bbc5..5eb0c323ca 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1037,6 +1037,27 @@ assign_error:
     return r;
 }
 
+static int virtio_pci_set_host_notifier_mr(DeviceState *d, int n,
+                                           MemoryRegion *mr, bool assign)
+{
+    VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+    int offset;
+
+    if (n >= VIRTIO_QUEUE_MAX || !virtio_pci_modern(proxy) ||
+        virtio_pci_queue_mem_mult(proxy) != memory_region_size(mr)) {
+        return -1;
+    }
+
+    if (assign) {
+        offset = virtio_pci_queue_mem_mult(proxy) * n;
+        memory_region_add_subregion_overlap(&proxy->notify.mr, offset, mr, 1);
+    } else {
+        memory_region_del_subregion(&proxy->notify.mr, mr);
+    }
+
+    return 0;
+}
+
 static void virtio_pci_vmstate_change(DeviceState *d, bool running)
 {
     VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
@@ -2652,6 +2673,7 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
     k->has_extra_state = virtio_pci_has_extra_state;
     k->query_guest_notifiers = virtio_pci_query_guest_notifiers;
     k->set_guest_notifiers = virtio_pci_set_guest_notifiers;
+    k->set_host_notifier_mr = virtio_pci_set_host_notifier_mr;
     k->vmstate_change = virtio_pci_vmstate_change;
     k->pre_plugged = virtio_pci_pre_plugged;
     k->device_plugged = virtio_pci_device_plugged;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 006d3d1148..1debb0147b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2454,6 +2454,19 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq)
     return &vq->host_notifier;
 }
 
+int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
+                                      MemoryRegion *mr, bool assign)
+{
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    if (k->set_host_notifier_mr) {
+        return k->set_host_notifier_mr(qbus->parent, n, mr, assign);
+    }
+
+    return -1;
+}
+
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name)
 {
     g_free(vdev->bus_name);
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index ced3d2d2b0..7fec9dc929 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -52,6 +52,8 @@ typedef struct VirtioBusClass {
     bool (*has_extra_state)(DeviceState *d);
     bool (*query_guest_notifiers)(DeviceState *d);
     int (*set_guest_notifiers)(DeviceState *d, int nvqs, bool assign);
+    int (*set_host_notifier_mr)(DeviceState *d, int n,
+                                MemoryRegion *mr, bool assign);
     void (*vmstate_change)(DeviceState *d, bool running);
     /*
      * Expose the features the transport layer supports before
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 098bdaaea3..9c1fa07d6d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -239,6 +239,8 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
+int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
+                                      MemoryRegion *mr, bool assign);
 int virtio_set_status(VirtIODevice *vdev, uint8_t val);
 void virtio_reset(void *opaque);
 void virtio_update_irq(VirtIODevice *vdev);
-- 
2.11.0


[Qemu-devel] [PATCH v3 5/6] vhost: allow backends to filter memory sections
Posted by Tiwei Bie, 1 week ago
This patch introduces a vhost op for vhost backends to allow
them to filter the memory sections that they can handle.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 hw/virtio/vhost-user.c            | 11 +++++++++++
 hw/virtio/vhost.c                 |  9 +++++++--
 include/hw/virtio/vhost-backend.h |  4 ++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 9cea2c8c51..791e0a4763 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1622,6 +1622,16 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
     return 0;
 }
 
+static bool vhost_user_mem_section_filter(struct vhost_dev *dev,
+                                          MemoryRegionSection *section)
+{
+    bool result;
+
+    result = memory_region_get_fd(section->mr) >= 0;
+
+    return result;
+}
+
 VhostUserState *vhost_user_init(void)
 {
     VhostUserState *user = g_new0(struct VhostUserState, 1);
@@ -1663,4 +1673,5 @@ const VhostOps user_ops = {
         .vhost_set_config = vhost_user_set_config,
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
+        .vhost_backend_mem_section_filter = vhost_user_mem_section_filter,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf573d5..a939a38d97 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -382,7 +382,7 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
     return r;
 }
 
-static bool vhost_section(MemoryRegionSection *section)
+static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
 {
     bool result;
     bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
@@ -395,6 +395,11 @@ static bool vhost_section(MemoryRegionSection *section)
      */
     result &= !log_dirty;
 
+    if (result && dev->vhost_ops->vhost_backend_mem_section_filter) {
+        result &=
+            dev->vhost_ops->vhost_backend_mem_section_filter(dev, section);
+    }
+
     trace_vhost_section(section->mr->name, result);
     return result;
 }
@@ -628,7 +633,7 @@ static void vhost_region_addnop(MemoryListener *listener,
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                          memory_listener);
 
-    if (!vhost_section(section)) {
+    if (!vhost_section(dev, section)) {
         return;
     }
     vhost_region_add_section(dev, section);
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 5dac61f9ea..81283ec50f 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -101,6 +101,9 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
 typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
                                              uint64_t session_id);
 
+typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev,
+                                                MemoryRegionSection *section);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -138,6 +141,7 @@ typedef struct VhostOps {
     vhost_set_config_op vhost_set_config;
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
+    vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter;
 } VhostOps;
 
 extern const VhostOps user_ops;
-- 
2.11.0


[Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Tiwei Bie, 1 week ago
This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
With this feature negotiated, vhost-user backend can register
memory region based host notifiers. And it will allow the guest
driver in the VM to notify the hardware accelerator at the
vhost-user backend directly.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 docs/interop/vhost-user.txt    |  33 +++++++++++
 hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |   8 +++
 3 files changed, 164 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 534caab18a..9e57b36b20 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -132,6 +132,16 @@ Depending on the request type, payload can be:
    Payload: Size bytes array holding the contents of the virtio
        device's configuration space
 
+ * Vring area description
+   -----------------------
+   | u64 | size | offset |
+   -----------------------
+
+   u64: a 64-bit integer contains vring index and flags
+   Size: a 64-bit size of this area
+   Offset: a 64-bit offset of this area from the start of the
+       supplied file descriptor
+
 In QEMU the vhost-user message is implemented with the following struct:
 
 typedef struct VhostUserMsg {
@@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
         VhostUserLog log;
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
+        VhostUserVringArea area;
     };
 } QEMU_PACKED VhostUserMsg;
 
@@ -380,6 +391,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
+#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
 
 Master message types
 --------------------
@@ -777,6 +789,27 @@ Slave message types
      the VHOST_USER_NEED_REPLY flag, master must respond with zero when
      operation is successfully completed, or non-zero otherwise.
 
+ * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
+
+      Id: 3
+      Equivalent ioctl: N/A
+      Slave payload: vring area description
+      Master payload: N/A
+
+      Sets host notifier for a specified queue. The queue index is contained
+      in the u64 field of the vring area description. The host notifier is
+      described by the file descriptor (typically it's a VFIO device fd) which
+      is passed as ancillary data and the size (which is mmap size and should
+      be the same as host page size) and offset (which is mmap offset) carried
+      in the vring area description. QEMU can mmap the file descriptor based
+      on the size and offset to get a memory range. Registering a host notifier
+      means mapping this memory range to the VM as the specified queue's notify
+      MMIO region. Slave sends this request to tell QEMU to de-register the
+      existing notifier if any and register the new notifier if the request is
+      sent with a file descriptor.
+      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
+      protocol feature has been successfully negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 -------------------------------
 The original vhost-user specification only demands replies for certain
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 791e0a4763..1cd9c7276b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -13,6 +13,7 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
 #include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-net.h"
 #include "chardev/char-fe.h"
 #include "sysemu/kvm.h"
@@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
     VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
+    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
     VHOST_USER_SLAVE_IOTLB_MSG = 1,
     VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
+    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
     VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
                                    + sizeof(c.size) \
                                    + sizeof(c.flags))
 
+typedef struct VhostUserVringArea {
+    uint64_t u64;
+    uint64_t size;
+    uint64_t offset;
+} VhostUserVringArea;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -157,6 +166,7 @@ typedef union {
         struct vhost_iotlb_msg iotlb;
         VhostUserConfig config;
         VhostUserCryptoSession session;
+        VhostUserVringArea area;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
 }
 
+static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
+                                             int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (n->addr && !n->set) {
+        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
+        n->set = true;
+    }
+}
+
+static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
+                                            int queue_idx)
+{
+    struct vhost_user *u = dev->opaque;
+    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
+    VirtIODevice *vdev = dev->vdev;
+
+    if (n->addr && n->set) {
+        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
+        n->set = false;
+    }
+}
+
 static int vhost_user_set_vring_base(struct vhost_dev *dev,
                                      struct vhost_vring_state *ring)
 {
+    vhost_user_host_notifier_restore(dev, ring->index);
+
     return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
 }
 
@@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .hdr.size = sizeof(msg.payload.state),
     };
 
+    vhost_user_host_notifier_remove(dev, ring->index);
+
     if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
         return -1;
     }
@@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
     return ret;
 }
 
+static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
+                                                       VhostUserVringArea *area,
+                                                       int fd)
+{
+    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
+    size_t page_size = qemu_real_host_page_size;
+    struct vhost_user *u = dev->opaque;
+    VhostUserState *user = u->user;
+    VirtIODevice *vdev = dev->vdev;
+    VhostUserHostNotifier *n;
+    int ret = 0;
+    void *addr;
+    char *name;
+
+    if (!virtio_has_feature(dev->protocol_features,
+                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
+        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
+        ret = -1;
+        goto out;
+    }
+
+    n = &user->notifier[queue_idx];
+
+    if (n->addr) {
+        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
+        object_unparent(OBJECT(&n->mr));
+        munmap(n->addr, page_size);
+        n->addr = NULL;
+    }
+
+    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
+        goto out;
+    }
+
+    /* Sanity check. */
+    if (area->size != page_size) {
+        ret = -1;
+        goto out;
+    }
+
+    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                fd, area->offset);
+    if (addr == MAP_FAILED) {
+        ret = -1;
+        goto out;
+    }
+
+    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
+                           user, queue_idx);
+    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
+                                      page_size, addr);
+    g_free(name);
+
+    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
+        munmap(addr, page_size);
+        ret = -1;
+        goto out;
+    }
+
+    n->addr = addr;
+    n->set = true;
+
+out:
+    /* Always close the fd. */
+    if (fd != -1) {
+        close(fd);
+    }
+    return ret;
+}
+
 static void slave_read(void *opaque)
 {
     struct vhost_dev *dev = opaque;
@@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
     case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
         ret = vhost_user_slave_handle_config_change(dev);
         break;
+    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
+        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
+                                                          fd);
+        break;
     default:
         error_report("Received unexpected msg type.");
         if (fd != -1) {
@@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
 
 void vhost_user_cleanup(VhostUserState *user)
 {
+    int i;
+
+    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+        if (user->notifier[i].addr) {
+            object_unparent(OBJECT(&user->notifier[i].mr));
+            munmap(user->notifier[i].addr, qemu_real_host_page_size);
+            user->notifier[i].addr = NULL;
+        }
+    }
 }
 
 const VhostOps user_ops = {
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index eb8bc0d90d..fd660393a0 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -9,9 +9,17 @@
 #define HW_VIRTIO_VHOST_USER_H
 
 #include "chardev/char-fe.h"
+#include "hw/virtio/virtio.h"
+
+typedef struct VhostUserHostNotifier {
+    MemoryRegion mr;
+    void *addr;
+    bool set;
+} VhostUserHostNotifier;
 
 typedef struct VhostUserState {
     CharBackend *chr;
+    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostUserState;
 
 VhostUserState *vhost_user_init(void);
-- 
2.11.0


Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 1 day ago
On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> With this feature negotiated, vhost-user backend can register
> memory region based host notifiers. And it will allow the guest
> driver in the VM to notify the hardware accelerator at the
> vhost-user backend directly.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>

Overall I think we can merge this approach, but I have two main concerns
about this:

1. Testing. Most people do not have the virtio hardware
   so how to make sure this does not bit rot?

   I have an idea: add an option like this to libvhost-user.
   Naturally libvhost-user can not get notified about a write
   to an mmapped area, but it can write a special value there
   (e.g. all-ones?) and then poll it to detect VQ # writes.

   Then include a vhost user bridge test with an option like this.

   I'd like to see a patch doing this.

2. Memory barriers. Right now after updating the avail idx,
   virtio does smp_wmb() and then the MMIO write.
   Normal hardware drivers do wmb() which is an sfence.
   Can a PCI device read bypass index write and see a stale
   index value?
   To make virtio pci do wmb() we would need a new feature bit.
   Alternatively I guess we could maybe look at subsystem vendor/device id.

   I'd like to see a patch doing one of these things.

Thanks!

> ---
>  docs/interop/vhost-user.txt    |  33 +++++++++++
>  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |   8 +++
>  3 files changed, 164 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index 534caab18a..9e57b36b20 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
>     Payload: Size bytes array holding the contents of the virtio
>         device's configuration space
>  
> + * Vring area description
> +   -----------------------
> +   | u64 | size | offset |
> +   -----------------------
> +
> +   u64: a 64-bit integer contains vring index and flags
> +   Size: a 64-bit size of this area
> +   Offset: a 64-bit offset of this area from the start of the
> +       supplied file descriptor
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
>          VhostUserLog log;
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
> +        VhostUserVringArea area;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -380,6 +391,7 @@ Protocol features
>  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
>  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
>  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
>  
>  Master message types
>  --------------------
> @@ -777,6 +789,27 @@ Slave message types
>       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
>       operation is successfully completed, or non-zero otherwise.
>  
> + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> +
> +      Id: 3
> +      Equivalent ioctl: N/A
> +      Slave payload: vring area description
> +      Master payload: N/A
> +
> +      Sets host notifier for a specified queue. The queue index is contained
> +      in the u64 field of the vring area description. The host notifier is
> +      described by the file descriptor (typically it's a VFIO device fd) which
> +      is passed as ancillary data and the size (which is mmap size and should
> +      be the same as host page size) and offset (which is mmap offset) carried
> +      in the vring area description. QEMU can mmap the file descriptor based
> +      on the size and offset to get a memory range. Registering a host notifier
> +      means mapping this memory range to the VM as the specified queue's notify
> +      MMIO region. Slave sends this request to tell QEMU to de-register the
> +      existing notifier if any and register the new notifier if the request is
> +      sent with a file descriptor.
> +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> +      protocol feature has been successfully negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 791e0a4763..1cd9c7276b 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -13,6 +13,7 @@
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
> +#include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "chardev/char-fe.h"
>  #include "sysemu/kvm.h"
> @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
>      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
>      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
>      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
>      VHOST_USER_PROTOCOL_F_MAX
>  };
>  
> @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
>      VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
>                                     + sizeof(c.size) \
>                                     + sizeof(c.flags))
>  
> +typedef struct VhostUserVringArea {
> +    uint64_t u64;
> +    uint64_t size;
> +    uint64_t offset;
> +} VhostUserVringArea;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -157,6 +166,7 @@ typedef union {
>          struct vhost_iotlb_msg iotlb;
>          VhostUserConfig config;
>          VhostUserCryptoSession session;
> +        VhostUserVringArea area;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
>  }
>  
> +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> +                                             int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (n->addr && !n->set) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> +        n->set = true;
> +    }
> +}
> +
> +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> +                                            int queue_idx)
> +{
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> +    VirtIODevice *vdev = dev->vdev;
> +
> +    if (n->addr && n->set) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +        n->set = false;
> +    }
> +}
> +
>  static int vhost_user_set_vring_base(struct vhost_dev *dev,
>                                       struct vhost_vring_state *ring)
>  {
> +    vhost_user_host_notifier_restore(dev, ring->index);
> +
>      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
>  }
>  
> @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .hdr.size = sizeof(msg.payload.state),
>      };
>  
> +    vhost_user_host_notifier_remove(dev, ring->index);
> +
>      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
>          return -1;
>      }
> @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
>      return ret;
>  }
>  
> +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> +                                                       VhostUserVringArea *area,
> +                                                       int fd)
> +{
> +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> +    size_t page_size = qemu_real_host_page_size;
> +    struct vhost_user *u = dev->opaque;
> +    VhostUserState *user = u->user;
> +    VirtIODevice *vdev = dev->vdev;
> +    VhostUserHostNotifier *n;
> +    int ret = 0;
> +    void *addr;
> +    char *name;
> +
> +    if (!virtio_has_feature(dev->protocol_features,
> +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    n = &user->notifier[queue_idx];
> +
> +    if (n->addr) {
> +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> +        object_unparent(OBJECT(&n->mr));
> +        munmap(n->addr, page_size);
> +        n->addr = NULL;
> +    }
> +
> +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> +        goto out;
> +    }
> +
> +    /* Sanity check. */
> +    if (area->size != page_size) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                fd, area->offset);
> +    if (addr == MAP_FAILED) {
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> +                           user, queue_idx);
> +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> +                                      page_size, addr);
> +    g_free(name);
> +
> +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> +        munmap(addr, page_size);
> +        ret = -1;
> +        goto out;
> +    }
> +
> +    n->addr = addr;
> +    n->set = true;
> +
> +out:
> +    /* Always close the fd. */
> +    if (fd != -1) {
> +        close(fd);
> +    }
> +    return ret;
> +}
> +
>  static void slave_read(void *opaque)
>  {
>      struct vhost_dev *dev = opaque;
> @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
>      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
>          ret = vhost_user_slave_handle_config_change(dev);
>          break;
> +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> +                                                          fd);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          if (fd != -1) {
> @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
>  
>  void vhost_user_cleanup(VhostUserState *user)
>  {
> +    int i;
> +
> +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> +        if (user->notifier[i].addr) {
> +            object_unparent(OBJECT(&user->notifier[i].mr));
> +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> +            user->notifier[i].addr = NULL;
> +        }
> +    }
>  }
>  
>  const VhostOps user_ops = {
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index eb8bc0d90d..fd660393a0 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -9,9 +9,17 @@
>  #define HW_VIRTIO_VHOST_USER_H
>  
>  #include "chardev/char-fe.h"
> +#include "hw/virtio/virtio.h"
> +
> +typedef struct VhostUserHostNotifier {
> +    MemoryRegion mr;
> +    void *addr;
> +    bool set;
> +} VhostUserHostNotifier;
>  
>  typedef struct VhostUserState {
>      CharBackend *chr;
> +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  } VhostUserState;
>  
>  VhostUserState *vhost_user_init(void);
> -- 
> 2.11.0

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Tiwei Bie, 16 hours ago
On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > With this feature negotiated, vhost-user backend can register
> > memory region based host notifiers. And it will allow the guest
> > driver in the VM to notify the hardware accelerator at the
> > vhost-user backend directly.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> 
> Overall I think we can merge this approach, but I have two main concerns
> about this:
> 
> 1. Testing. Most people do not have the virtio hardware
>    so how to make sure this does not bit rot?
> 
>    I have an idea: add an option like this to libvhost-user.
>    Naturally libvhost-user can not get notified about a write
>    to an mmapped area, but it can write a special value there
>    (e.g. all-ones?) and then poll it to detect VQ # writes.
> 
>    Then include a vhost user bridge test with an option like this.
> 
>    I'd like to see a patch doing this.

Sure, I'll do it. Thanks for the suggestion!

> 
> 2. Memory barriers. Right now after updating the avail idx,
>    virtio does smp_wmb() and then the MMIO write.
>    Normal hardware drivers do wmb() which is an sfence.
>    Can a PCI device read bypass index write and see a stale
>    index value?

It depends on arch's memory model. Cunming will provide
more details about this later.

>    To make virtio pci do wmb() we would need a new feature bit.
>    Alternatively I guess we could maybe look at subsystem vendor/device id.
> 
>    I'd like to see a patch doing one of these things.

We prefer to add a new feature bit as it's a more robust
way to do this. I'll send out some patches soon.

Thank you very much! :)

Best regards,
Tiwei Bie

> 
> Thanks!
> 
> > ---
> >  docs/interop/vhost-user.txt    |  33 +++++++++++
> >  hw/virtio/vhost-user.c         | 123 +++++++++++++++++++++++++++++++++++++++++
> >  include/hw/virtio/vhost-user.h |   8 +++
> >  3 files changed, 164 insertions(+)
> > 
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index 534caab18a..9e57b36b20 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
> > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> >     Payload: Size bytes array holding the contents of the virtio
> >         device's configuration space
> >  
> > + * Vring area description
> > +   -----------------------
> > +   | u64 | size | offset |
> > +   -----------------------
> > +
> > +   u64: a 64-bit integer contains vring index and flags
> > +   Size: a 64-bit size of this area
> > +   Offset: a 64-bit offset of this area from the start of the
> > +       supplied file descriptor
> > +
> >  In QEMU the vhost-user message is implemented with the following struct:
> >  
> >  typedef struct VhostUserMsg {
> > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> >          VhostUserLog log;
> >          struct vhost_iotlb_msg iotlb;
> >          VhostUserConfig config;
> > +        VhostUserVringArea area;
> >      };
> >  } QEMU_PACKED VhostUserMsg;
> >  
> > @@ -380,6 +391,7 @@ Protocol features
> >  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> >  
> >  Master message types
> >  --------------------
> > @@ -777,6 +789,27 @@ Slave message types
> >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> >       operation is successfully completed, or non-zero otherwise.
> >  
> > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > +
> > +      Id: 3
> > +      Equivalent ioctl: N/A
> > +      Slave payload: vring area description
> > +      Master payload: N/A
> > +
> > +      Sets host notifier for a specified queue. The queue index is contained
> > +      in the u64 field of the vring area description. The host notifier is
> > +      described by the file descriptor (typically it's a VFIO device fd) which
> > +      is passed as ancillary data and the size (which is mmap size and should
> > +      be the same as host page size) and offset (which is mmap offset) carried
> > +      in the vring area description. QEMU can mmap the file descriptor based
> > +      on the size and offset to get a memory range. Registering a host notifier
> > +      means mapping this memory range to the VM as the specified queue's notify
> > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > +      existing notifier if any and register the new notifier if the request is
> > +      sent with a file descriptor.
> > +      This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > +      protocol feature has been successfully negotiated.
> > +
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> >  -------------------------------
> >  The original vhost-user specification only demands replies for certain
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index 791e0a4763..1cd9c7276b 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -13,6 +13,7 @@
> >  #include "hw/virtio/vhost.h"
> >  #include "hw/virtio/vhost-user.h"
> >  #include "hw/virtio/vhost-backend.h"
> > +#include "hw/virtio/virtio.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "chardev/char-fe.h"
> >  #include "sysemu/kvm.h"
> > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> >      VHOST_USER_PROTOCOL_F_MAX
> >  };
> >  
> > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> >      VHOST_USER_SLAVE_NONE = 0,
> >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> >      VHOST_USER_SLAVE_MAX
> >  }  VhostUserSlaveRequest;
> >  
> > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> >                                     + sizeof(c.size) \
> >                                     + sizeof(c.flags))
> >  
> > +typedef struct VhostUserVringArea {
> > +    uint64_t u64;
> > +    uint64_t size;
> > +    uint64_t offset;
> > +} VhostUserVringArea;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >  
> > @@ -157,6 +166,7 @@ typedef union {
> >          struct vhost_iotlb_msg iotlb;
> >          VhostUserConfig config;
> >          VhostUserCryptoSession session;
> > +        VhostUserVringArea area;
> >  } VhostUserPayload;
> >  
> >  typedef struct VhostUserMsg {
> > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
> >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
> >  }
> >  
> > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > +                                             int queue_idx)
> > +{
> > +    struct vhost_user *u = dev->opaque;
> > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > +    VirtIODevice *vdev = dev->vdev;
> > +
> > +    if (n->addr && !n->set) {
> > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > +        n->set = true;
> > +    }
> > +}
> > +
> > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > +                                            int queue_idx)
> > +{
> > +    struct vhost_user *u = dev->opaque;
> > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > +    VirtIODevice *vdev = dev->vdev;
> > +
> > +    if (n->addr && n->set) {
> > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > +        n->set = false;
> > +    }
> > +}
> > +
> >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> >                                       struct vhost_vring_state *ring)
> >  {
> > +    vhost_user_host_notifier_restore(dev, ring->index);
> > +
> >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> >  }
> >  
> > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
> >          .hdr.size = sizeof(msg.payload.state),
> >      };
> >  
> > +    vhost_user_host_notifier_remove(dev, ring->index);
> > +
> >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> >          return -1;
> >      }
> > @@ -847,6 +887,76 @@ static int vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> >      return ret;
> >  }
> >  
> > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
> > +                                                       VhostUserVringArea *area,
> > +                                                       int fd)
> > +{
> > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > +    size_t page_size = qemu_real_host_page_size;
> > +    struct vhost_user *u = dev->opaque;
> > +    VhostUserState *user = u->user;
> > +    VirtIODevice *vdev = dev->vdev;
> > +    VhostUserHostNotifier *n;
> > +    int ret = 0;
> > +    void *addr;
> > +    char *name;
> > +
> > +    if (!virtio_has_feature(dev->protocol_features,
> > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    n = &user->notifier[queue_idx];
> > +
> > +    if (n->addr) {
> > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > +        object_unparent(OBJECT(&n->mr));
> > +        munmap(n->addr, page_size);
> > +        n->addr = NULL;
> > +    }
> > +
> > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > +        goto out;
> > +    }
> > +
> > +    /* Sanity check. */
> > +    if (area->size != page_size) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > +                fd, area->offset);
> > +    if (addr == MAP_FAILED) {
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > +                           user, queue_idx);
> > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > +                                      page_size, addr);
> > +    g_free(name);
> > +
> > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > +        munmap(addr, page_size);
> > +        ret = -1;
> > +        goto out;
> > +    }
> > +
> > +    n->addr = addr;
> > +    n->set = true;
> > +
> > +out:
> > +    /* Always close the fd. */
> > +    if (fd != -1) {
> > +        close(fd);
> > +    }
> > +    return ret;
> > +}
> > +
> >  static void slave_read(void *opaque)
> >  {
> >      struct vhost_dev *dev = opaque;
> > @@ -913,6 +1023,10 @@ static void slave_read(void *opaque)
> >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> >          ret = vhost_user_slave_handle_config_change(dev);
> >          break;
> > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > +                                                          fd);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type.");
> >          if (fd != -1) {
> > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> >  
> >  void vhost_user_cleanup(VhostUserState *user)
> >  {
> > +    int i;
> > +
> > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > +        if (user->notifier[i].addr) {
> > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > +            user->notifier[i].addr = NULL;
> > +        }
> > +    }
> >  }
> >  
> >  const VhostOps user_ops = {
> > diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> > index eb8bc0d90d..fd660393a0 100644
> > --- a/include/hw/virtio/vhost-user.h
> > +++ b/include/hw/virtio/vhost-user.h
> > @@ -9,9 +9,17 @@
> >  #define HW_VIRTIO_VHOST_USER_H
> >  
> >  #include "chardev/char-fe.h"
> > +#include "hw/virtio/virtio.h"
> > +
> > +typedef struct VhostUserHostNotifier {
> > +    MemoryRegion mr;
> > +    void *addr;
> > +    bool set;
> > +} VhostUserHostNotifier;
> >  
> >  typedef struct VhostUserState {
> >      CharBackend *chr;
> > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> >  } VhostUserState;
> >  
> >  VhostUserState *vhost_user_init(void);
> > -- 
> > 2.11.0

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Liang, Cunming, 14 hours ago

> -----Original Message-----
> From: Bie, Tiwei
> Sent: Thursday, April 19, 2018 7:15 PM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: jasowang@redhat.com; alex.williamson@redhat.com; pbonzini@redhat.com;
> stefanha@redhat.com; qemu-devel@nongnu.org; virtio-dev@lists.oasis-
> open.org; Liang, Cunming <cunming.liang@intel.com>; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > With this feature negotiated, vhost-user backend can register memory
> > > region based host notifiers. And it will allow the guest driver in
> > > the VM to notify the hardware accelerator at the vhost-user backend
> > > directly.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >
> > Overall I think we can merge this approach, but I have two main
> > concerns about this:
> >
> > 1. Testing. Most people do not have the virtio hardware
> >    so how to make sure this does not bit rot?
> >
> >    I have an idea: add an option like this to libvhost-user.
> >    Naturally libvhost-user can not get notified about a write
> >    to an mmapped area, but it can write a special value there
> >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> >
> >    Then include a vhost user bridge test with an option like this.
> >
> >    I'd like to see a patch doing this.
> 
> Sure, I'll do it. Thanks for the suggestion!
> 
> >
> > 2. Memory barriers. Right now after updating the avail idx,
> >    virtio does smp_wmb() and then the MMIO write.
> >    Normal hardware drivers do wmb() which is an sfence.
> >    Can a PCI device read bypass index write and see a stale
> >    index value?
A compiler barrier is enough on strongly-ordered memory platform. As it doesn't re-order store, PCI device won't see a stale index value. But a weakly-ordered memory needs sfence.

> 
> It depends on arch's memory model. Cunming will provide more details about
> this later.
> 
> >    To make virtio pci do wmb() we would need a new feature bit.
> >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> >
> >    I'd like to see a patch doing one of these things.
> 
> We prefer to add a new feature bit as it's a more robust way to do this. I'll send
> out some patches soon.
> 
> Thank you very much! :)
> 
> Best regards,
> Tiwei Bie
> 
> >
> > Thanks!
> >
> > > ---
> > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > >  hw/virtio/vhost-user.c         | 123
> +++++++++++++++++++++++++++++++++++++++++
> > >  include/hw/virtio/vhost-user.h |   8 +++
> > >  3 files changed, 164 insertions(+)
> > >
> > > diff --git a/docs/interop/vhost-user.txt
> > > b/docs/interop/vhost-user.txt index 534caab18a..9e57b36b20 100644
> > > --- a/docs/interop/vhost-user.txt
> > > +++ b/docs/interop/vhost-user.txt
> > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > >     Payload: Size bytes array holding the contents of the virtio
> > >         device's configuration space
> > >
> > > + * Vring area description
> > > +   -----------------------
> > > +   | u64 | size | offset |
> > > +   -----------------------
> > > +
> > > +   u64: a 64-bit integer contains vring index and flags
> > > +   Size: a 64-bit size of this area
> > > +   Offset: a 64-bit offset of this area from the start of the
> > > +       supplied file descriptor
> > > +
> > >  In QEMU the vhost-user message is implemented with the following struct:
> > >
> > >  typedef struct VhostUserMsg {
> > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > >          VhostUserLog log;
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > > +        VhostUserVringArea area;
> > >      };
> > >  } QEMU_PACKED VhostUserMsg;
> > >
> > > @@ -380,6 +391,7 @@ Protocol features  #define
> > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > >
> > >  Master message types
> > >  --------------------
> > > @@ -777,6 +789,27 @@ Slave message types
> > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> > >       operation is successfully completed, or non-zero otherwise.
> > >
> > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > +
> > > +      Id: 3
> > > +      Equivalent ioctl: N/A
> > > +      Slave payload: vring area description
> > > +      Master payload: N/A
> > > +
> > > +      Sets host notifier for a specified queue. The queue index is contained
> > > +      in the u64 field of the vring area description. The host notifier is
> > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > +      is passed as ancillary data and the size (which is mmap size and should
> > > +      be the same as host page size) and offset (which is mmap offset) carried
> > > +      in the vring area description. QEMU can mmap the file descriptor based
> > > +      on the size and offset to get a memory range. Registering a host notifier
> > > +      means mapping this memory range to the VM as the specified queue's
> notify
> > > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > > +      existing notifier if any and register the new notifier if the request is
> > > +      sent with a file descriptor.
> > > +      This request should be sent only when
> VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > +      protocol feature has been successfully negotiated.
> > > +
> > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > >  -------------------------------
> > >  The original vhost-user specification only demands replies for
> > > certain diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index 791e0a4763..1cd9c7276b 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -13,6 +13,7 @@
> > >  #include "hw/virtio/vhost.h"
> > >  #include "hw/virtio/vhost-user.h"
> > >  #include "hw/virtio/vhost-backend.h"
> > > +#include "hw/virtio/virtio.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "chardev/char-fe.h"
> > >  #include "sysemu/kvm.h"
> > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > >      VHOST_USER_PROTOCOL_F_MAX
> > >  };
> > >
> > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > >      VHOST_USER_SLAVE_NONE = 0,
> > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > >      VHOST_USER_SLAVE_MAX
> > >  }  VhostUserSlaveRequest;
> > >
> > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > >                                     + sizeof(c.size) \
> > >                                     + sizeof(c.flags))
> > >
> > > +typedef struct VhostUserVringArea {
> > > +    uint64_t u64;
> > > +    uint64_t size;
> > > +    uint64_t offset;
> > > +} VhostUserVringArea;
> > > +
> > >  typedef struct {
> > >      VhostUserRequest request;
> > >
> > > @@ -157,6 +166,7 @@ typedef union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          VhostUserConfig config;
> > >          VhostUserCryptoSession session;
> > > +        VhostUserVringArea area;
> > >  } VhostUserPayload;
> > >
> > >  typedef struct VhostUserMsg {
> > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> vhost_dev *dev,
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);  }
> > >
> > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > +                                             int queue_idx) {
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && !n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > +        n->set = true;
> > > +    }
> > > +}
> > > +
> > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > +                                            int queue_idx) {
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +
> > > +    if (n->addr && n->set) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        n->set = false;
> > > +    }
> > > +}
> > > +
> > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > >                                       struct vhost_vring_state
> > > *ring)  {
> > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > +
> > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > > }
> > >
> > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> vhost_dev *dev,
> > >          .hdr.size = sizeof(msg.payload.state),
> > >      };
> > >
> > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > +
> > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > >          return -1;
> > >      }
> > > @@ -847,6 +887,76 @@ static int
> vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > >      return ret;
> > >  }
> > >
> > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev
> *dev,
> > > +                                                       VhostUserVringArea *area,
> > > +                                                       int fd) {
> > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > +    size_t page_size = qemu_real_host_page_size;
> > > +    struct vhost_user *u = dev->opaque;
> > > +    VhostUserState *user = u->user;
> > > +    VirtIODevice *vdev = dev->vdev;
> > > +    VhostUserHostNotifier *n;
> > > +    int ret = 0;
> > > +    void *addr;
> > > +    char *name;
> > > +
> > > +    if (!virtio_has_feature(dev->protocol_features,
> > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n = &user->notifier[queue_idx];
> > > +
> > > +    if (n->addr) {
> > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > +        object_unparent(OBJECT(&n->mr));
> > > +        munmap(n->addr, page_size);
> > > +        n->addr = NULL;
> > > +    }
> > > +
> > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > +        goto out;
> > > +    }
> > > +
> > > +    /* Sanity check. */
> > > +    if (area->size != page_size) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +                fd, area->offset);
> > > +    if (addr == MAP_FAILED) {
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > +                           user, queue_idx);
> > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > +                                      page_size, addr);
> > > +    g_free(name);
> > > +
> > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > +        munmap(addr, page_size);
> > > +        ret = -1;
> > > +        goto out;
> > > +    }
> > > +
> > > +    n->addr = addr;
> > > +    n->set = true;
> > > +
> > > +out:
> > > +    /* Always close the fd. */
> > > +    if (fd != -1) {
> > > +        close(fd);
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static void slave_read(void *opaque)  {
> > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@ static
> > > void slave_read(void *opaque)
> > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > >          ret = vhost_user_slave_handle_config_change(dev);
> > >          break;
> > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > +                                                          fd);
> > > +        break;
> > >      default:
> > >          error_report("Received unexpected msg type.");
> > >          if (fd != -1) {
> > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > >
> > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > +    int i;
> > > +
> > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > +        if (user->notifier[i].addr) {
> > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > +            user->notifier[i].addr = NULL;
> > > +        }
> > > +    }
> > >  }
> > >
> > >  const VhostOps user_ops = {
> > > diff --git a/include/hw/virtio/vhost-user.h
> > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0 100644
> > > --- a/include/hw/virtio/vhost-user.h
> > > +++ b/include/hw/virtio/vhost-user.h
> > > @@ -9,9 +9,17 @@
> > >  #define HW_VIRTIO_VHOST_USER_H
> > >
> > >  #include "chardev/char-fe.h"
> > > +#include "hw/virtio/virtio.h"
> > > +
> > > +typedef struct VhostUserHostNotifier {
> > > +    MemoryRegion mr;
> > > +    void *addr;
> > > +    bool set;
> > > +} VhostUserHostNotifier;
> > >
> > >  typedef struct VhostUserState {
> > >      CharBackend *chr;
> > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > >  } VhostUserState;
> > >
> > >  VhostUserState *vhost_user_init(void);
> > > --
> > > 2.11.0
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Paolo Bonzini, 14 hours ago
On 19/04/2018 14:43, Liang, Cunming wrote:
>> 2. Memory barriers. Right now after updating the avail idx, 
>> virtio does smp_wmb() and then the MMIO write. Normal hardware
>> drivers do wmb() which is an sfence. Can a PCI device read bypass
>> index write and see a stale index value?
>
> A compiler barrier is enough on strongly-ordered memory platform. As
> it doesn't re-order store, PCI device won't see a stale index value.
> But a weakly-ordered memory needs sfence.

That is complicated then.  We need to define a feature bit and (in the
Linux driver) propagate it to vring_create_virtqueue's weak_barrier
argument.  However:

- if we make it 1 when weak barriers are needed, the device also needs
to nack feature negotiation (not allow setting the FEATURES_OK) if the
bit is not set by the driver.  However, that is not enough.  Live
migration assumes that it is okay to migrate a virtual machine from a
source that doesn't support a feature to a destination that supports it.
 In this case, it would assume that it is okay to migrate from software
virtio to hardware virtio.  This is wrong because the destination would
use weak barriers

- if we make it 1 when strong barriers are enough, software virtio
devices needs to be updated to expose the bit.  This works, including
live migration, but updated drivers will now go slower when run against
an old device that doesn't know the feature bit.

Maybe bump the PCI revision, so that only the new revision has the bit?

Thanks,

Paolo

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 11 hours ago
On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 14:43, Liang, Cunming wrote:
> >> 2. Memory barriers. Right now after updating the avail idx, 
> >> virtio does smp_wmb() and then the MMIO write. Normal hardware
> >> drivers do wmb() which is an sfence. Can a PCI device read bypass
> >> index write and see a stale index value?
> >
> > A compiler barrier is enough on strongly-ordered memory platform. As
> > it doesn't re-order store, PCI device won't see a stale index value.
> > But a weakly-ordered memory needs sfence.
> 
> That is complicated then.  We need to define a feature bit and (in the
> Linux driver) propagate it to vring_create_virtqueue's weak_barrier
> argument.  However:
> 
> - if we make it 1 when weak barriers are needed, the device also needs
> to nack feature negotiation (not allow setting the FEATURES_OK) if the
> bit is not set by the driver.
>  However, that is not enough.  Live
> migration assumes that it is okay to migrate a virtual machine from a
> source that doesn't support a feature to a destination that supports it.
>  In this case, it would assume that it is okay to migrate from software
> virtio to hardware virtio.  This is wrong because the destination would
> use weak barriers

You can't migrate between systems with different sets of device features
right now.

> - if we make it 1 when strong barriers are enough, software virtio
> devices needs to be updated to expose the bit.  This works, including
> live migration, but updated drivers will now go slower when run against
> an old device that doesn't know the feature bit.
> 
> Maybe bump the PCI revision, so that only the new revision has the bit?
> 
> Thanks,
> 
> Paolo

As a first step, if you want to migrate to a HW offloaded solution
then you need to enable the feature. It does mean it will go
a bit slower when run with software, so it's only good if
most systems in your cluster do have the HW offload.
I think we can start by getting that working and
think about ways to improve down the road.


That's the usecase we designed FEATURES_OK for though, so I do
think/hope it's enough and we don't need to play with revisions.


-- 
MST

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Paolo Bonzini, 11 hours ago
On 19/04/2018 17:19, Michael S. Tsirkin wrote:
>> - if we make it 1 when weak barriers are needed, the device also needs
>> to nack feature negotiation (not allow setting the FEATURES_OK) if the
>> bit is not set by the driver.
>>  However, that is not enough.  Live
>> migration assumes that it is okay to migrate a virtual machine from a
>> source that doesn't support a feature to a destination that supports it.
>>  In this case, it would assume that it is okay to migrate from software
>> virtio to hardware virtio.  This is wrong because the destination would
>> use weak barriers
> 
> You can't migrate between systems with different sets of device features
> right now.

Yes, you can, exactly because some features are defined not by the
machine type but rather by the host kernel.  See virtio_net_get_features
in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
QEMU's hw/virtio/virtio.c.

Thanks,

Paolo

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 11 hours ago
On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 17:19, Michael S. Tsirkin wrote:
> >> - if we make it 1 when weak barriers are needed, the device also needs
> >> to nack feature negotiation (not allow setting the FEATURES_OK) if the
> >> bit is not set by the driver.
> >>  However, that is not enough.  Live
> >> migration assumes that it is okay to migrate a virtual machine from a
> >> source that doesn't support a feature to a destination that supports it.
> >>  In this case, it would assume that it is okay to migrate from software
> >> virtio to hardware virtio.  This is wrong because the destination would
> >> use weak barriers
> > 
> > You can't migrate between systems with different sets of device features
> > right now.
> 
> Yes, you can, exactly because some features are defined not by the
> machine type but rather by the host kernel.  See virtio_net_get_features
> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
> QEMU's hw/virtio/virtio.c.
> 
> Thanks,
> 
> Paolo

Oh you are right. Well we can just special-case that one :)

-- 
MST

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Paolo Bonzini, 11 hours ago
On 19/04/2018 17:59, Michael S. Tsirkin wrote:
> On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote:
>> On 19/04/2018 17:19, Michael S. Tsirkin wrote:
>>>> - if we make it 1 when weak barriers are needed, the device also needs
>>>> to nack feature negotiation (not allow setting the FEATURES_OK) if the
>>>> bit is not set by the driver.
>>>>  However, that is not enough.  Live
>>>> migration assumes that it is okay to migrate a virtual machine from a
>>>> source that doesn't support a feature to a destination that supports it.
>>>>  In this case, it would assume that it is okay to migrate from software
>>>> virtio to hardware virtio.  This is wrong because the destination would
>>>> use weak barriers
>>>
>>> You can't migrate between systems with different sets of device features
>>> right now.
>>
>> Yes, you can, exactly because some features are defined not by the
>> machine type but rather by the host kernel.  See virtio_net_get_features
>> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
>> QEMU's hw/virtio/virtio.c.
> 
> Oh you are right. Well we can just special-case that one :)

Anything wrong with bumping the PCI revision?

Paolo

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 10 hours ago
On Thu, Apr 19, 2018 at 06:07:07PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 17:59, Michael S. Tsirkin wrote:
> > On Thu, Apr 19, 2018 at 05:51:51PM +0200, Paolo Bonzini wrote:
> >> On 19/04/2018 17:19, Michael S. Tsirkin wrote:
> >>>> - if we make it 1 when weak barriers are needed, the device also needs
> >>>> to nack feature negotiation (not allow setting the FEATURES_OK) if the
> >>>> bit is not set by the driver.
> >>>>  However, that is not enough.  Live
> >>>> migration assumes that it is okay to migrate a virtual machine from a
> >>>> source that doesn't support a feature to a destination that supports it.
> >>>>  In this case, it would assume that it is okay to migrate from software
> >>>> virtio to hardware virtio.  This is wrong because the destination would
> >>>> use weak barriers
> >>>
> >>> You can't migrate between systems with different sets of device features
> >>> right now.
> >>
> >> Yes, you can, exactly because some features are defined not by the
> >> machine type but rather by the host kernel.  See virtio_net_get_features
> >> in QEMU's hw/virtio/virtio-net.c, and virtio_set_features_nocheck in
> >> QEMU's hw/virtio/virtio.c.
> > 
> > Oh you are right. Well we can just special-case that one :)
> 
> Anything wrong with bumping the PCI revision?
> 
> Paolo

Nothing except in virtio 1 bumping the revision does not prevent
loading the old driver.

-- 
MST

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Liang, Cunming, 10 hours ago

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, April 19, 2018 11:19 PM
> To: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering
> external host notifiers
> 
> On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> > On 19/04/2018 14:43, Liang, Cunming wrote:
> > >> 2. Memory barriers. Right now after updating the avail idx, virtio
> > >> does smp_wmb() and then the MMIO write. Normal hardware drivers do
> > >> wmb() which is an sfence. Can a PCI device read bypass index write
> > >> and see a stale index value?
> > >
> > > A compiler barrier is enough on strongly-ordered memory platform. As
> > > it doesn't re-order store, PCI device won't see a stale index value.
> > > But a weakly-ordered memory needs sfence.
> >
> > That is complicated then.  We need to define a feature bit and (in the
> > Linux driver) propagate it to vring_create_virtqueue's weak_barrier
> > argument.  However:
> >
> > - if we make it 1 when weak barriers are needed, the device also needs
> > to nack feature negotiation (not allow setting the FEATURES_OK) if the
> > bit is not set by the driver.
> >  However, that is not enough.  Live
> > migration assumes that it is okay to migrate a virtual machine from a
> > source that doesn't support a feature to a destination that supports it.
> >  In this case, it would assume that it is okay to migrate from
> > software virtio to hardware virtio.  This is wrong because the
> > destination would use weak barriers
> 
> You can't migrate between systems with different sets of device features right
> now.
> 
> > - if we make it 1 when strong barriers are enough, software virtio
> > devices needs to be updated to expose the bit.  This works, including
> > live migration, but updated drivers will now go slower when run
> > against an old device that doesn't know the feature bit.
> >
> > Maybe bump the PCI revision, so that only the new revision has the bit?
> >
> > Thanks,
> >
> > Paolo
> 
> As a first step, if you want to migrate to a HW offloaded solution then you need
> to enable the feature.

> It does mean it will go a bit slower when run with software,
> so it's only good if most systems in your cluster do have the HW offload.
To clarify a bit more, it's suboptimal to always use mandatory barriers for MMIO. Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO. The tradeoff doesn't always happen, software and HW offload can align on the same page.

> I think we can start by getting that working and think about ways to improve
> down the road.
> 
> 
> That's the usecase we designed FEATURES_OK for though, so I do think/hope it's
> enough and we don't need to play with revisions.
> 
> 
> --
> MST

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 10 hours ago
On Thu, Apr 19, 2018 at 04:24:29PM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Thursday, April 19, 2018 11:19 PM
> > To: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> > jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> > qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> > <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering
> > external host notifiers
> > 
> > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> > > On 19/04/2018 14:43, Liang, Cunming wrote:
> > > >> 2. Memory barriers. Right now after updating the avail idx, virtio
> > > >> does smp_wmb() and then the MMIO write. Normal hardware drivers do
> > > >> wmb() which is an sfence. Can a PCI device read bypass index write
> > > >> and see a stale index value?
> > > >
> > > > A compiler barrier is enough on strongly-ordered memory platform. As
> > > > it doesn't re-order store, PCI device won't see a stale index value.
> > > > But a weakly-ordered memory needs sfence.
> > >
> > > That is complicated then.  We need to define a feature bit and (in the
> > > Linux driver) propagate it to vring_create_virtqueue's weak_barrier
> > > argument.  However:
> > >
> > > - if we make it 1 when weak barriers are needed, the device also needs
> > > to nack feature negotiation (not allow setting the FEATURES_OK) if the
> > > bit is not set by the driver.
> > >  However, that is not enough.  Live
> > > migration assumes that it is okay to migrate a virtual machine from a
> > > source that doesn't support a feature to a destination that supports it.
> > >  In this case, it would assume that it is okay to migrate from
> > > software virtio to hardware virtio.  This is wrong because the
> > > destination would use weak barriers
> > 
> > You can't migrate between systems with different sets of device features right
> > now.
> > 
> > > - if we make it 1 when strong barriers are enough, software virtio
> > > devices needs to be updated to expose the bit.  This works, including
> > > live migration, but updated drivers will now go slower when run
> > > against an old device that doesn't know the feature bit.
> > >
> > > Maybe bump the PCI revision, so that only the new revision has the bit?
> > >
> > > Thanks,
> > >
> > > Paolo
> > 
> > As a first step, if you want to migrate to a HW offloaded solution then you need
> > to enable the feature.
> 
> > It does mean it will go a bit slower when run with software,
> > so it's only good if most systems in your cluster do have the HW offload.
> To clarify a bit more, it's suboptimal to always use mandatory barriers for MMIO. Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO. The tradeoff doesn't always happen, software and HW offload can align on the same page.

I agree to all of the above except where you say smp_wmb.

smp_wmb is for controlling SMP effects on Linux, and I suspect
it will not do the right thing on some non-Intel architectures.

The claim is I think correct for Intel/AMD platforms, and probably
other strongly ordered ones. I suspect it's incorrect for ARM and
power.

Replace smp_wmb with 'asm volatile ("") on Intel' and I'll agree.



> > I think we can start by getting that working and think about ways to improve
> > down the road.
> > 
> > 
> > That's the usecase we designed FEATURES_OK for though, so I do think/hope it's
> > enough and we don't need to play with revisions.
> > 
> > 
> > --
> > MST

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Liang, Cunming, 15 minutes ago

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Friday, April 20, 2018 12:56 AM
> To: Liang, Cunming <cunming.liang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering
> external host notifiers
> 
> On Thu, Apr 19, 2018 at 04:24:29PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > > Sent: Thursday, April 19, 2018 11:19 PM
> > > To: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Liang, Cunming <cunming.liang@intel.com>; Bie, Tiwei
> > > <tiwei.bie@intel.com>; jasowang@redhat.com;
> > > alex.williamson@redhat.com; stefanha@redhat.com;
> > > qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> > > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang,
> > > Zhihong <zhihong.wang@intel.com>; Wang, Xiao W
> > > <xiao.w.wang@intel.com>
> > > Subject: Re: [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support
> > > registering external host notifiers
> > >
> > > On Thu, Apr 19, 2018 at 03:02:40PM +0200, Paolo Bonzini wrote:
> > > > On 19/04/2018 14:43, Liang, Cunming wrote:
> > > > >> 2. Memory barriers. Right now after updating the avail idx,
> > > > >> virtio does smp_wmb() and then the MMIO write. Normal hardware
> > > > >> drivers do
> > > > >> wmb() which is an sfence. Can a PCI device read bypass index
> > > > >> write and see a stale index value?
> > > > >
> > > > > A compiler barrier is enough on strongly-ordered memory
> > > > > platform. As it doesn't re-order store, PCI device won't see a stale index
> value.
> > > > > But a weakly-ordered memory needs sfence.
> > > >
> > > > That is complicated then.  We need to define a feature bit and (in
> > > > the Linux driver) propagate it to vring_create_virtqueue's
> > > > weak_barrier argument.  However:
> > > >
> > > > - if we make it 1 when weak barriers are needed, the device also
> > > > needs to nack feature negotiation (not allow setting the
> > > > FEATURES_OK) if the bit is not set by the driver.
> > > >  However, that is not enough.  Live migration assumes that it is
> > > > okay to migrate a virtual machine from a source that doesn't
> > > > support a feature to a destination that supports it.
> > > >  In this case, it would assume that it is okay to migrate from
> > > > software virtio to hardware virtio.  This is wrong because the
> > > > destination would use weak barriers
> > >
> > > You can't migrate between systems with different sets of device
> > > features right now.
> > >
> > > > - if we make it 1 when strong barriers are enough, software virtio
> > > > devices needs to be updated to expose the bit.  This works,
> > > > including live migration, but updated drivers will now go slower
> > > > when run against an old device that doesn't know the feature bit.
> > > >
> > > > Maybe bump the PCI revision, so that only the new revision has the bit?
> > > >
> > > > Thanks,
> > > >
> > > > Paolo
> > >
> > > As a first step, if you want to migrate to a HW offloaded solution
> > > then you need to enable the feature.
> >
> > > It does mean it will go a bit slower when run with software, so it's
> > > only good if most systems in your cluster do have the HW offload.
> > To clarify a bit more, it's suboptimal to always use mandatory barriers for MMIO.
> Per strongly-order memory, 'weak barriers' (smp_wmb) is pretty good for MMIO.
> The tradeoff doesn't always happen, software and HW offload can align on the
> same page.
> 
> I agree to all of the above except where you say smp_wmb.
> 
> smp_wmb is for controlling SMP effects on Linux, and I suspect it will not do the
> right thing on some non-Intel architectures.
> 
> The claim is I think correct for Intel/AMD platforms, and probably other strongly
> ordered ones. I suspect it's incorrect for ARM and power.
> 
> Replace smp_wmb with 'asm volatile ("") on Intel' and I'll agree.

Yeah, that's more accurate. 

> 
> 
> 
> > > I think we can start by getting that working and think about ways to
> > > improve down the road.
> > >
> > >
> > > That's the usecase we designed FEATURES_OK for though, so I do
> > > think/hope it's enough and we don't need to play with revisions.
> > >
> > >
> > > --
> > > MST

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 11 hours ago
On Thu, Apr 19, 2018 at 12:43:42PM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Bie, Tiwei
> > Sent: Thursday, April 19, 2018 7:15 PM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: jasowang@redhat.com; alex.williamson@redhat.com; pbonzini@redhat.com;
> > stefanha@redhat.com; qemu-devel@nongnu.org; virtio-dev@lists.oasis-
> > open.org; Liang, Cunming <cunming.liang@intel.com>; Daly, Dan
> > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> > <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> > notifiers
> > 
> > On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > > With this feature negotiated, vhost-user backend can register memory
> > > > region based host notifiers. And it will allow the guest driver in
> > > > the VM to notify the hardware accelerator at the vhost-user backend
> > > > directly.
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > >
> > > Overall I think we can merge this approach, but I have two main
> > > concerns about this:
> > >
> > > 1. Testing. Most people do not have the virtio hardware
> > >    so how to make sure this does not bit rot?
> > >
> > >    I have an idea: add an option like this to libvhost-user.
> > >    Naturally libvhost-user can not get notified about a write
> > >    to an mmapped area, but it can write a special value there
> > >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> > >
> > >    Then include a vhost user bridge test with an option like this.
> > >
> > >    I'd like to see a patch doing this.
> > 
> > Sure, I'll do it. Thanks for the suggestion!
> > 
> > >
> > > 2. Memory barriers. Right now after updating the avail idx,
> > >    virtio does smp_wmb() and then the MMIO write.
> > >    Normal hardware drivers do wmb() which is an sfence.
> > >    Can a PCI device read bypass index write and see a stale
> > >    index value?
> A compiler barrier is enough on strongly-ordered memory platform. As it doesn't re-order store, PCI device won't see a stale index value. But a weakly-ordered memory needs sfence.


Oh you are right.

So it's only needed for non-intel platforms or when packets are in WC
memory then. And I don't know whether dpdk ever puts packets in WC memory.

I guess we'll cross this bridge when we get to it.


> > 
> > It depends on arch's memory model. Cunming will provide more details about
> > this later.
> > 
> > >    To make virtio pci do wmb() we would need a new feature bit.
> > >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> > >
> > >    I'd like to see a patch doing one of these things.
> > 
> > We prefer to add a new feature bit as it's a more robust way to do this. I'll send
> > out some patches soon.
> > 
> > Thank you very much! :)
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > >
> > > Thanks!
> > >
> > > > ---
> > > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > > >  hw/virtio/vhost-user.c         | 123
> > +++++++++++++++++++++++++++++++++++++++++
> > > >  include/hw/virtio/vhost-user.h |   8 +++
> > > >  3 files changed, 164 insertions(+)
> > > >
> > > > diff --git a/docs/interop/vhost-user.txt
> > > > b/docs/interop/vhost-user.txt index 534caab18a..9e57b36b20 100644
> > > > --- a/docs/interop/vhost-user.txt
> > > > +++ b/docs/interop/vhost-user.txt
> > > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > > >     Payload: Size bytes array holding the contents of the virtio
> > > >         device's configuration space
> > > >
> > > > + * Vring area description
> > > > +   -----------------------
> > > > +   | u64 | size | offset |
> > > > +   -----------------------
> > > > +
> > > > +   u64: a 64-bit integer contains vring index and flags
> > > > +   Size: a 64-bit size of this area
> > > > +   Offset: a 64-bit offset of this area from the start of the
> > > > +       supplied file descriptor
> > > > +
> > > >  In QEMU the vhost-user message is implemented with the following struct:
> > > >
> > > >  typedef struct VhostUserMsg {
> > > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > > >          VhostUserLog log;
> > > >          struct vhost_iotlb_msg iotlb;
> > > >          VhostUserConfig config;
> > > > +        VhostUserVringArea area;
> > > >      };
> > > >  } QEMU_PACKED VhostUserMsg;
> > > >
> > > > @@ -380,6 +391,7 @@ Protocol features  #define
> > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > > >
> > > >  Master message types
> > > >  --------------------
> > > > @@ -777,6 +789,27 @@ Slave message types
> > > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero when
> > > >       operation is successfully completed, or non-zero otherwise.
> > > >
> > > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > > +
> > > > +      Id: 3
> > > > +      Equivalent ioctl: N/A
> > > > +      Slave payload: vring area description
> > > > +      Master payload: N/A
> > > > +
> > > > +      Sets host notifier for a specified queue. The queue index is contained
> > > > +      in the u64 field of the vring area description. The host notifier is
> > > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > > +      is passed as ancillary data and the size (which is mmap size and should
> > > > +      be the same as host page size) and offset (which is mmap offset) carried
> > > > +      in the vring area description. QEMU can mmap the file descriptor based
> > > > +      on the size and offset to get a memory range. Registering a host notifier
> > > > +      means mapping this memory range to the VM as the specified queue's
> > notify
> > > > +      MMIO region. Slave sends this request to tell QEMU to de-register the
> > > > +      existing notifier if any and register the new notifier if the request is
> > > > +      sent with a file descriptor.
> > > > +      This request should be sent only when
> > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > > +      protocol feature has been successfully negotiated.
> > > > +
> > > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > > >  -------------------------------
> > > >  The original vhost-user specification only demands replies for
> > > > certain diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 791e0a4763..1cd9c7276b 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -13,6 +13,7 @@
> > > >  #include "hw/virtio/vhost.h"
> > > >  #include "hw/virtio/vhost-user.h"
> > > >  #include "hw/virtio/vhost-backend.h"
> > > > +#include "hw/virtio/virtio.h"
> > > >  #include "hw/virtio/virtio-net.h"
> > > >  #include "chardev/char-fe.h"
> > > >  #include "sysemu/kvm.h"
> > > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > > >      VHOST_USER_PROTOCOL_F_MAX
> > > >  };
> > > >
> > > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > > >      VHOST_USER_SLAVE_NONE = 0,
> > > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > > >      VHOST_USER_SLAVE_MAX
> > > >  }  VhostUserSlaveRequest;
> > > >
> > > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__ ((unused));
> > > >                                     + sizeof(c.size) \
> > > >                                     + sizeof(c.flags))
> > > >
> > > > +typedef struct VhostUserVringArea {
> > > > +    uint64_t u64;
> > > > +    uint64_t size;
> > > > +    uint64_t offset;
> > > > +} VhostUserVringArea;
> > > > +
> > > >  typedef struct {
> > > >      VhostUserRequest request;
> > > >
> > > > @@ -157,6 +166,7 @@ typedef union {
> > > >          struct vhost_iotlb_msg iotlb;
> > > >          VhostUserConfig config;
> > > >          VhostUserCryptoSession session;
> > > > +        VhostUserVringArea area;
> > > >  } VhostUserPayload;
> > > >
> > > >  typedef struct VhostUserMsg {
> > > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> > vhost_dev *dev,
> > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);  }
> > > >
> > > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > > +                                             int queue_idx) {
> > > > +    struct vhost_user *u = dev->opaque;
> > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > +    VirtIODevice *vdev = dev->vdev;
> > > > +
> > > > +    if (n->addr && !n->set) {
> > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > > +        n->set = true;
> > > > +    }
> > > > +}
> > > > +
> > > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > > +                                            int queue_idx) {
> > > > +    struct vhost_user *u = dev->opaque;
> > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > +    VirtIODevice *vdev = dev->vdev;
> > > > +
> > > > +    if (n->addr && n->set) {
> > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > > +        n->set = false;
> > > > +    }
> > > > +}
> > > > +
> > > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > > >                                       struct vhost_vring_state
> > > > *ring)  {
> > > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > > +
> > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> > > > }
> > > >
> > > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> > vhost_dev *dev,
> > > >          .hdr.size = sizeof(msg.payload.state),
> > > >      };
> > > >
> > > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > > +
> > > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > >          return -1;
> > > >      }
> > > > @@ -847,6 +887,76 @@ static int
> > vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > > >      return ret;
> > > >  }
> > > >
> > > > +static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev
> > *dev,
> > > > +                                                       VhostUserVringArea *area,
> > > > +                                                       int fd) {
> > > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > > +    size_t page_size = qemu_real_host_page_size;
> > > > +    struct vhost_user *u = dev->opaque;
> > > > +    VhostUserState *user = u->user;
> > > > +    VirtIODevice *vdev = dev->vdev;
> > > > +    VhostUserHostNotifier *n;
> > > > +    int ret = 0;
> > > > +    void *addr;
> > > > +    char *name;
> > > > +
> > > > +    if (!virtio_has_feature(dev->protocol_features,
> > > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    n = &user->notifier[queue_idx];
> > > > +
> > > > +    if (n->addr) {
> > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
> > > > +        object_unparent(OBJECT(&n->mr));
> > > > +        munmap(n->addr, page_size);
> > > > +        n->addr = NULL;
> > > > +    }
> > > > +
> > > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    /* Sanity check. */
> > > > +    if (area->size != page_size) {
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > > +                fd, area->offset);
> > > > +    if (addr == MAP_FAILED) {
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > > +                           user, queue_idx);
> > > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > > +                                      page_size, addr);
> > > > +    g_free(name);
> > > > +
> > > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
> > > > +        munmap(addr, page_size);
> > > > +        ret = -1;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    n->addr = addr;
> > > > +    n->set = true;
> > > > +
> > > > +out:
> > > > +    /* Always close the fd. */
> > > > +    if (fd != -1) {
> > > > +        close(fd);
> > > > +    }
> > > > +    return ret;
> > > > +}
> > > > +
> > > >  static void slave_read(void *opaque)  {
> > > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@ static
> > > > void slave_read(void *opaque)
> > > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > > >          ret = vhost_user_slave_handle_config_change(dev);
> > > >          break;
> > > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev, &payload.area,
> > > > +                                                          fd);
> > > > +        break;
> > > >      default:
> > > >          error_report("Received unexpected msg type.");
> > > >          if (fd != -1) {
> > > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > > >
> > > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > > +    int i;
> > > > +
> > > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > > +        if (user->notifier[i].addr) {
> > > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > > +            user->notifier[i].addr = NULL;
> > > > +        }
> > > > +    }
> > > >  }
> > > >
> > > >  const VhostOps user_ops = {
> > > > diff --git a/include/hw/virtio/vhost-user.h
> > > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0 100644
> > > > --- a/include/hw/virtio/vhost-user.h
> > > > +++ b/include/hw/virtio/vhost-user.h
> > > > @@ -9,9 +9,17 @@
> > > >  #define HW_VIRTIO_VHOST_USER_H
> > > >
> > > >  #include "chardev/char-fe.h"
> > > > +#include "hw/virtio/virtio.h"
> > > > +
> > > > +typedef struct VhostUserHostNotifier {
> > > > +    MemoryRegion mr;
> > > > +    void *addr;
> > > > +    bool set;
> > > > +} VhostUserHostNotifier;
> > > >
> > > >  typedef struct VhostUserState {
> > > >      CharBackend *chr;
> > > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > >  } VhostUserState;
> > > >
> > > >  VhostUserState *vhost_user_init(void);
> > > > --
> > > > 2.11.0

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Paolo Bonzini, 11 hours ago
On 19/04/2018 17:42, Michael S. Tsirkin wrote:
>> A compiler barrier is enough on strongly-ordered memory platform.
>> As it doesn't re-order store, PCI device won't see a stale index
>> value. But a weakly-ordered memory needs sfence.
> 
> Oh you are right.
> 
> So it's only needed for non-intel platforms or when packets are in
> WC memory then. And I don't know whether dpdk ever puts packets in WC
> memory.
> 
> I guess we'll cross this bridge when we get to it.

Non-TSO architectures seem important...

Paolo

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 10 hours ago
On Thu, Apr 19, 2018 at 05:52:23PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> >> A compiler barrier is enough on strongly-ordered memory platform.
> >> As it doesn't re-order store, PCI device won't see a stale index
> >> value. But a weakly-ordered memory needs sfence.
> > 
> > Oh you are right.
> > 
> > So it's only needed for non-intel platforms or when packets are in
> > WC memory then. And I don't know whether dpdk ever puts packets in WC
> > memory.
> > 
> > I guess we'll cross this bridge when we get to it.
> 
> Non-TSO architectures seem important...
> 
> Paolo

Only if there are virtio offload engines on platforms with these
architectures :) Worth working on for sure but doesn't have to block
this patchset.

-- 
MST

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Liang, Cunming, 10 hours ago

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Thursday, April 19, 2018 11:52 PM
> To: Michael S. Tsirkin <mst@redhat.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> alex.williamson@redhat.com; stefanha@redhat.com; qemu-devel@nongnu.org;
> virtio-dev@lists.oasis-open.org; Daly, Dan <dan.daly@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Wang,
> Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> >> A compiler barrier is enough on strongly-ordered memory platform.
> >> As it doesn't re-order store, PCI device won't see a stale index
> >> value. But a weakly-ordered memory needs sfence.
> >
> > Oh you are right.
> >
> > So it's only needed for non-intel platforms or when packets are in WC
> > memory then. And I don't know whether dpdk ever puts packets in WC
> > memory.
> >
> > I guess we'll cross this bridge when we get to it.
> 
> Non-TSO architectures seem important...

I'm not familiar with Non-TSO, trying to understand the difference according to the feature set. Let's say non-TSO architectures do not set 'weak_barriers'. Then mandatory barrier is used for software. HW offload on that platform would choose different feature set against software? 
If it's not, essentially we're worried about live migration from a TSO to a non-TSO architectures platform?

> 
> Paolo
Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Paolo Bonzini, 10 hours ago
On 19/04/2018 18:52, Liang, Cunming wrote:
>>> Oh you are right.
>>> 
>>> So it's only needed for non-intel platforms or when packets are
>>> in WC memory then. And I don't know whether dpdk ever puts
>>> packets in WC memory.
>>> 
>>> I guess we'll cross this bridge when we get to it.
>> Non-TSO architectures seem important...
>
> I'm not familiar with Non-TSO, trying to understand the difference
> according to the feature set. Let's say non-TSO architectures do not
> set 'weak_barriers'. Then mandatory barrier is used for software. HW
> offload on that platform would choose different feature set against
> software? If it's not, essentially we're worried about live migration
> from a TSO to a non-TSO architectures platform?

I'm worried about live migration from software virtio to hardware virtio
on non-TSO architectures.  For example, on ARM you would have a "dmb
ishst" (smp_wmb) for software virtio and a "dsb st" (wmb) or "dmb oshst"
(dma_wmb) for hardware virtio.

For this to work, you would have to set up the VM so that it uses the
heavier barriers from the beginning, even when backed by software virtio.

Paolo

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 9 hours ago
On Thu, Apr 19, 2018 at 06:59:39PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 18:52, Liang, Cunming wrote:
> >>> Oh you are right.
> >>> 
> >>> So it's only needed for non-intel platforms or when packets are
> >>> in WC memory then. And I don't know whether dpdk ever puts
> >>> packets in WC memory.
> >>> 
> >>> I guess we'll cross this bridge when we get to it.
> >> Non-TSO architectures seem important...
> >
> > I'm not familiar with Non-TSO, trying to understand the difference
> > according to the feature set. Let's say non-TSO architectures do not
> > set 'weak_barriers'. Then mandatory barrier is used for software. HW
> > offload on that platform would choose different feature set against
> > software? If it's not, essentially we're worried about live migration
> > from a TSO to a non-TSO architectures platform?
> 
> I'm worried about live migration from software virtio to hardware virtio
> on non-TSO architectures.  For example, on ARM you would have a "dmb
> ishst" (smp_wmb) for software virtio and a "dsb st" (wmb) or "dmb oshst"
> (dma_wmb) for hardware virtio.
> 
> For this to work, you would have to set up the VM so that it uses the
> heavier barriers from the beginning, even when backed by software virtio.
> 
> Paolo

Right. Or disallow hardware to software migrations.

But generally the mandatory and even dma barriers in Linux are often an overkill.

See ARM for example: 

#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb()            __arm_heavy_mb()
#define rmb()           dsb()
#define wmb()           __arm_heavy_mb(st)
#define dma_rmb()       dmb(osh)
#define dma_wmb()       dmb(oshst)
#else
#define mb()            barrier()
#define rmb()           barrier()
#define wmb()           barrier()
#define dma_rmb()       barrier()
#define dma_wmb()       barrier()
#endif

That CONFIG_SMP here is clearly wrong but I don't really know what
to set it to. Also, we probably should switch virtio_wmb to dma_XX
barriers.

That's actually easy. Will try to do.

-- 
MST

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Paolo Bonzini, 9 hours ago
On 19/04/2018 19:27, Michael S. Tsirkin wrote:
> 
> That CONFIG_SMP here is clearly wrong but I don't really know what
> to set it to. Also, we probably should switch virtio_wmb to dma_XX
> barriers.
> 
> That's actually easy. Will try to do.

Should it be dma_wmb() before updating the indices, and wmb() before
writing the "kick virtqueue" register?

Paolo

Re: [Qemu-devel] [virtio-dev] RE: [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 9 hours ago
On Thu, Apr 19, 2018 at 07:35:57PM +0200, Paolo Bonzini wrote:
> On 19/04/2018 19:27, Michael S. Tsirkin wrote:
> > 
> > That CONFIG_SMP here is clearly wrong but I don't really know what
> > to set it to. Also, we probably should switch virtio_wmb to dma_XX
> > barriers.
> > 
> > That's actually easy. Will try to do.
> 
> Should it be dma_wmb() before updating the indices, and wmb() before
> writing the "kick virtqueue" register?
> 
> Paolo

if anything kick virtqueue should do wmb internally
within virtio pci.

No one uses strong barriers with virtio pci right now
so we don't do it.

-- 
MST

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Michael S. Tsirkin, 10 hours ago
On Thu, Apr 19, 2018 at 04:52:20PM +0000, Liang, Cunming wrote:
> 
> 
> > -----Original Message-----
> > From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > Sent: Thursday, April 19, 2018 11:52 PM
> > To: Michael S. Tsirkin <mst@redhat.com>; Liang, Cunming
> > <cunming.liang@intel.com>
> > Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> > alex.williamson@redhat.com; stefanha@redhat.com; qemu-devel@nongnu.org;
> > virtio-dev@lists.oasis-open.org; Daly, Dan <dan.daly@intel.com>; Tan, Jianfeng
> > <jianfeng.tan@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Wang,
> > Xiao W <xiao.w.wang@intel.com>
> > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> > notifiers
> > 
> > On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> > >> A compiler barrier is enough on strongly-ordered memory platform.
> > >> As it doesn't re-order store, PCI device won't see a stale index
> > >> value. But a weakly-ordered memory needs sfence.
> > >
> > > Oh you are right.
> > >
> > > So it's only needed for non-intel platforms or when packets are in WC
> > > memory then. And I don't know whether dpdk ever puts packets in WC
> > > memory.
> > >
> > > I guess we'll cross this bridge when we get to it.
> > 
> > Non-TSO architectures seem important...
> 
> I'm not familiar with Non-TSO, trying to understand the difference
> according to the feature set. Let's say non-TSO architectures do not
> set 'weak_barriers'. Then mandatory barrier is used for software. HW
> offload on that platform would choose different feature set against
> software? 

Right. We'll need a flag for this feature for starters. It doesn't exist
:) Paolo also points out that we should then add code to disallow
migration between setups with and without the feature.

> If it's not, essentially we're worried about live migration from a TSO to a non-TSO architectures platform?

Probably not.

> > 
> > Paolo

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Liang, Cunming, 4 hours ago

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Friday, April 20, 2018 1:01 AM
> To: Liang, Cunming <cunming.liang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Bie, Tiwei <tiwei.bie@intel.com>;
> jasowang@redhat.com; alex.williamson@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On Thu, Apr 19, 2018 at 04:52:20PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > > Sent: Thursday, April 19, 2018 11:52 PM
> > > To: Michael S. Tsirkin <mst@redhat.com>; Liang, Cunming
> > > <cunming.liang@intel.com>
> > > Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> > > alex.williamson@redhat.com; stefanha@redhat.com;
> > > qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> > > <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang,
> > > Zhihong <zhihong.wang@intel.com>; Wang, Xiao W
> > > <xiao.w.wang@intel.com>
> > > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external
> > > host notifiers
> > >
> > > On 19/04/2018 17:42, Michael S. Tsirkin wrote:
> > > >> A compiler barrier is enough on strongly-ordered memory platform.
> > > >> As it doesn't re-order store, PCI device won't see a stale index
> > > >> value. But a weakly-ordered memory needs sfence.
> > > >
> > > > Oh you are right.
> > > >
> > > > So it's only needed for non-intel platforms or when packets are in
> > > > WC memory then. And I don't know whether dpdk ever puts packets in
> > > > WC memory.
> > > >
> > > > I guess we'll cross this bridge when we get to it.
> > >
> > > Non-TSO architectures seem important...
> >
> > I'm not familiar with Non-TSO, trying to understand the difference
> > according to the feature set. Let's say non-TSO architectures do not
> > set 'weak_barriers'. Then mandatory barrier is used for software. HW
> > offload on that platform would choose different feature set against
> > software?
> 
> Right. We'll need a flag for this feature for starters. It doesn't exist
> :) Paolo also points out that we should then add code to disallow migration
> between setups with and without the feature.
I see. Thanks.

> 
> > If it's not, essentially we're worried about live migration from a TSO to a non-
> TSO architectures platform?
> 
> Probably not.
> 
> > >
> > > Paolo

Re: [Qemu-devel] [PATCH v3 6/6] vhost-user: support registering external host notifiers
Posted by Liang, Cunming, 10 hours ago

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, April 19, 2018 11:43 PM
> To: Liang, Cunming <cunming.liang@intel.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; jasowang@redhat.com;
> alex.williamson@redhat.com; pbonzini@redhat.com; stefanha@redhat.com;
> qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org; Daly, Dan
> <dan.daly@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [PATCH v3 6/6] vhost-user: support registering external host
> notifiers
> 
> On Thu, Apr 19, 2018 at 12:43:42PM +0000, Liang, Cunming wrote:
> >
> >
> > > -----Original Message-----
> > > From: Bie, Tiwei
> > > Sent: Thursday, April 19, 2018 7:15 PM
> > > To: Michael S. Tsirkin <mst@redhat.com>
> > > Cc: jasowang@redhat.com; alex.williamson@redhat.com;
> > > pbonzini@redhat.com; stefanha@redhat.com; qemu-devel@nongnu.org;
> > > virtio-dev@lists.oasis- open.org; Liang, Cunming
> > > <cunming.liang@intel.com>; Daly, Dan <dan.daly@intel.com>; Tan,
> > > Jianfeng <jianfeng.tan@intel.com>; Wang, Zhihong
> > > <zhihong.wang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>
> > > Subject: Re: [PATCH v3 6/6] vhost-user: support registering external
> > > host notifiers
> > >
> > > On Wed, Apr 18, 2018 at 07:34:06PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Apr 12, 2018 at 11:12:32PM +0800, Tiwei Bie wrote:
> > > > > This patch introduces VHOST_USER_PROTOCOL_F_HOST_NOTIFIER.
> > > > > With this feature negotiated, vhost-user backend can register
> > > > > memory region based host notifiers. And it will allow the guest
> > > > > driver in the VM to notify the hardware accelerator at the
> > > > > vhost-user backend directly.
> > > > >
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > >
> > > > Overall I think we can merge this approach, but I have two main
> > > > concerns about this:
> > > >
> > > > 1. Testing. Most people do not have the virtio hardware
> > > >    so how to make sure this does not bit rot?
> > > >
> > > >    I have an idea: add an option like this to libvhost-user.
> > > >    Naturally libvhost-user can not get notified about a write
> > > >    to an mmapped area, but it can write a special value there
> > > >    (e.g. all-ones?) and then poll it to detect VQ # writes.
> > > >
> > > >    Then include a vhost user bridge test with an option like this.
> > > >
> > > >    I'd like to see a patch doing this.
> > >
> > > Sure, I'll do it. Thanks for the suggestion!
> > >
> > > >
> > > > 2. Memory barriers. Right now after updating the avail idx,
> > > >    virtio does smp_wmb() and then the MMIO write.
> > > >    Normal hardware drivers do wmb() which is an sfence.
> > > >    Can a PCI device read bypass index write and see a stale
> > > >    index value?
> > A compiler barrier is enough on strongly-ordered memory platform. As it
> doesn't re-order store, PCI device won't see a stale index value. But a weakly-
> ordered memory needs sfence.
> 
> 
> Oh you are right.
> 
> So it's only needed for non-intel platforms or when packets are in WC memory
> then. And I don't know whether dpdk ever puts packets in WC memory.

No, we haven't use WC memory.

> 
> I guess we'll cross this bridge when we get to it.
> 
> 
> > >
> > > It depends on arch's memory model. Cunming will provide more details
> > > about this later.
> > >
> > > >    To make virtio pci do wmb() we would need a new feature bit.
> > > >    Alternatively I guess we could maybe look at subsystem vendor/device id.
> > > >
> > > >    I'd like to see a patch doing one of these things.
> > >
> > > We prefer to add a new feature bit as it's a more robust way to do
> > > this. I'll send out some patches soon.
> > >
> > > Thank you very much! :)
> > >
> > > Best regards,
> > > Tiwei Bie
> > >
> > > >
> > > > Thanks!
> > > >
> > > > > ---
> > > > >  docs/interop/vhost-user.txt    |  33 +++++++++++
> > > > >  hw/virtio/vhost-user.c         | 123
> > > +++++++++++++++++++++++++++++++++++++++++
> > > > >  include/hw/virtio/vhost-user.h |   8 +++
> > > > >  3 files changed, 164 insertions(+)
> > > > >
> > > > > diff --git a/docs/interop/vhost-user.txt
> > > > > b/docs/interop/vhost-user.txt index 534caab18a..9e57b36b20
> > > > > 100644
> > > > > --- a/docs/interop/vhost-user.txt
> > > > > +++ b/docs/interop/vhost-user.txt
> > > > > @@ -132,6 +132,16 @@ Depending on the request type, payload can be:
> > > > >     Payload: Size bytes array holding the contents of the virtio
> > > > >         device's configuration space
> > > > >
> > > > > + * Vring area description
> > > > > +   -----------------------
> > > > > +   | u64 | size | offset |
> > > > > +   -----------------------
> > > > > +
> > > > > +   u64: a 64-bit integer contains vring index and flags
> > > > > +   Size: a 64-bit size of this area
> > > > > +   Offset: a 64-bit offset of this area from the start of the
> > > > > +       supplied file descriptor
> > > > > +
> > > > >  In QEMU the vhost-user message is implemented with the following
> struct:
> > > > >
> > > > >  typedef struct VhostUserMsg {
> > > > > @@ -146,6 +156,7 @@ typedef struct VhostUserMsg {
> > > > >          VhostUserLog log;
> > > > >          struct vhost_iotlb_msg iotlb;
> > > > >          VhostUserConfig config;
> > > > > +        VhostUserVringArea area;
> > > > >      };
> > > > >  } QEMU_PACKED VhostUserMsg;
> > > > >
> > > > > @@ -380,6 +391,7 @@ Protocol features  #define
> > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > > > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > > > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > > > +#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  10
> > > > >
> > > > >  Master message types
> > > > >  --------------------
> > > > > @@ -777,6 +789,27 @@ Slave message types
> > > > >       the VHOST_USER_NEED_REPLY flag, master must respond with zero
> when
> > > > >       operation is successfully completed, or non-zero otherwise.
> > > > >
> > > > > + * VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
> > > > > +
> > > > > +      Id: 3
> > > > > +      Equivalent ioctl: N/A
> > > > > +      Slave payload: vring area description
> > > > > +      Master payload: N/A
> > > > > +
> > > > > +      Sets host notifier for a specified queue. The queue index is contained
> > > > > +      in the u64 field of the vring area description. The host notifier is
> > > > > +      described by the file descriptor (typically it's a VFIO device fd) which
> > > > > +      is passed as ancillary data and the size (which is mmap size and should
> > > > > +      be the same as host page size) and offset (which is mmap offset)
> carried
> > > > > +      in the vring area description. QEMU can mmap the file descriptor
> based
> > > > > +      on the size and offset to get a memory range. Registering a host
> notifier
> > > > > +      means mapping this memory range to the VM as the
> > > > > + specified queue's
> > > notify
> > > > > +      MMIO region. Slave sends this request to tell QEMU to de-register
> the
> > > > > +      existing notifier if any and register the new notifier if the request is
> > > > > +      sent with a file descriptor.
> > > > > +      This request should be sent only when
> > > VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
> > > > > +      protocol feature has been successfully negotiated.
> > > > > +
> > > > >  VHOST_USER_PROTOCOL_F_REPLY_ACK:
> > > > >  -------------------------------  The original vhost-user
> > > > > specification only demands replies for certain diff --git
> > > > > a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> > > > > 791e0a4763..1cd9c7276b 100644
> > > > > --- a/hw/virtio/vhost-user.c
> > > > > +++ b/hw/virtio/vhost-user.c
> > > > > @@ -13,6 +13,7 @@
> > > > >  #include "hw/virtio/vhost.h"
> > > > >  #include "hw/virtio/vhost-user.h"
> > > > >  #include "hw/virtio/vhost-backend.h"
> > > > > +#include "hw/virtio/virtio.h"
> > > > >  #include "hw/virtio/virtio-net.h"
> > > > >  #include "chardev/char-fe.h"
> > > > >  #include "sysemu/kvm.h"
> > > > > @@ -48,6 +49,7 @@ enum VhostUserProtocolFeature {
> > > > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > > > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > > > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > > > +    VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 10,
> > > > >      VHOST_USER_PROTOCOL_F_MAX
> > > > >  };
> > > > >
> > > > > @@ -92,6 +94,7 @@ typedef enum VhostUserSlaveRequest {
> > > > >      VHOST_USER_SLAVE_NONE = 0,
> > > > >      VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > > > >      VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> > > > > +    VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG = 3,
> > > > >      VHOST_USER_SLAVE_MAX
> > > > >  }  VhostUserSlaveRequest;
> > > > >
> > > > > @@ -136,6 +139,12 @@ static VhostUserConfig c __attribute__
> ((unused));
> > > > >                                     + sizeof(c.size) \
> > > > >                                     + sizeof(c.flags))
> > > > >
> > > > > +typedef struct VhostUserVringArea {
> > > > > +    uint64_t u64;
> > > > > +    uint64_t size;
> > > > > +    uint64_t offset;
> > > > > +} VhostUserVringArea;
> > > > > +
> > > > >  typedef struct {
> > > > >      VhostUserRequest request;
> > > > >
> > > > > @@ -157,6 +166,7 @@ typedef union {
> > > > >          struct vhost_iotlb_msg iotlb;
> > > > >          VhostUserConfig config;
> > > > >          VhostUserCryptoSession session;
> > > > > +        VhostUserVringArea area;
> > > > >  } VhostUserPayload;
> > > > >
> > > > >  typedef struct VhostUserMsg {
> > > > > @@ -638,9 +648,37 @@ static int vhost_user_set_vring_num(struct
> > > vhost_dev *dev,
> > > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM,
> > > > > ring);  }
> > > > >
> > > > > +static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
> > > > > +                                             int queue_idx) {
> > > > > +    struct vhost_user *u = dev->opaque;
> > > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +
> > > > > +    if (n->addr && !n->set) {
> > > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
> > > > > +        n->set = true;
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
> > > > > +                                            int queue_idx) {
> > > > > +    struct vhost_user *u = dev->opaque;
> > > > > +    VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +
> > > > > +    if (n->addr && n->set) {
> > > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr,
> false);
> > > > > +        n->set = false;
> > > > > +    }
> > > > > +}
> > > > > +
> > > > >  static int vhost_user_set_vring_base(struct vhost_dev *dev,
> > > > >                                       struct vhost_vring_state
> > > > > *ring)  {
> > > > > +    vhost_user_host_notifier_restore(dev, ring->index);
> > > > > +
> > > > >      return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE,
> > > > > ring); }
> > > > >
> > > > > @@ -674,6 +712,8 @@ static int vhost_user_get_vring_base(struct
> > > vhost_dev *dev,
> > > > >          .hdr.size = sizeof(msg.payload.state),
> > > > >      };
> > > > >
> > > > > +    vhost_user_host_notifier_remove(dev, ring->index);
> > > > > +
> > > > >      if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > > >          return -1;
> > > > >      }
> > > > > @@ -847,6 +887,76 @@ static int
> > > vhost_user_slave_handle_config_change(struct vhost_dev *dev)
> > > > >      return ret;
> > > > >  }
> > > > >
> > > > > +static int vhost_user_slave_handle_vring_host_notifier(struct
> > > > > +vhost_dev
> > > *dev,
> > > > > +                                                       VhostUserVringArea *area,
> > > > > +                                                       int fd) {
> > > > > +    int queue_idx = area->u64 & VHOST_USER_VRING_IDX_MASK;
> > > > > +    size_t page_size = qemu_real_host_page_size;
> > > > > +    struct vhost_user *u = dev->opaque;
> > > > > +    VhostUserState *user = u->user;
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +    VhostUserHostNotifier *n;
> > > > > +    int ret = 0;
> > > > > +    void *addr;
> > > > > +    char *name;
> > > > > +
> > > > > +    if (!virtio_has_feature(dev->protocol_features,
> > > > > +                            VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) ||
> > > > > +        vdev == NULL || queue_idx >= virtio_get_num_queues(vdev)) {
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    n = &user->notifier[queue_idx];
> > > > > +
> > > > > +    if (n->addr) {
> > > > > +        virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr,
> false);
> > > > > +        object_unparent(OBJECT(&n->mr));
> > > > > +        munmap(n->addr, page_size);
> > > > > +        n->addr = NULL;
> > > > > +    }
> > > > > +
> > > > > +    if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    /* Sanity check. */
> > > > > +    if (area->size != page_size) {
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
> MAP_SHARED,
> > > > > +                fd, area->offset);
> > > > > +    if (addr == MAP_FAILED) {
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
> > > > > +                           user, queue_idx);
> > > > > +    memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
> > > > > +                                      page_size, addr);
> > > > > +    g_free(name);
> > > > > +
> > > > > +    if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr,
> true)) {
> > > > > +        munmap(addr, page_size);
> > > > > +        ret = -1;
> > > > > +        goto out;
> > > > > +    }
> > > > > +
> > > > > +    n->addr = addr;
> > > > > +    n->set = true;
> > > > > +
> > > > > +out:
> > > > > +    /* Always close the fd. */
> > > > > +    if (fd != -1) {
> > > > > +        close(fd);
> > > > > +    }
> > > > > +    return ret;
> > > > > +}
> > > > > +
> > > > >  static void slave_read(void *opaque)  {
> > > > >      struct vhost_dev *dev = opaque; @@ -913,6 +1023,10 @@
> > > > > static void slave_read(void *opaque)
> > > > >      case VHOST_USER_SLAVE_CONFIG_CHANGE_MSG :
> > > > >          ret = vhost_user_slave_handle_config_change(dev);
> > > > >          break;
> > > > > +    case VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG:
> > > > > +        ret = vhost_user_slave_handle_vring_host_notifier(dev,
> &payload.area,
> > > > > +                                                          fd);
> > > > > +        break;
> > > > >      default:
> > > > >          error_report("Received unexpected msg type.");
> > > > >          if (fd != -1) {
> > > > > @@ -1641,6 +1755,15 @@ VhostUserState *vhost_user_init(void)
> > > > >
> > > > >  void vhost_user_cleanup(VhostUserState *user)  {
> > > > > +    int i;
> > > > > +
> > > > > +    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > > > > +        if (user->notifier[i].addr) {
> > > > > +            object_unparent(OBJECT(&user->notifier[i].mr));
> > > > > +            munmap(user->notifier[i].addr, qemu_real_host_page_size);
> > > > > +            user->notifier[i].addr = NULL;
> > > > > +        }
> > > > > +    }
> > > > >  }
> > > > >
> > > > >  const VhostOps user_ops = {
> > > > > diff --git a/include/hw/virtio/vhost-user.h
> > > > > b/include/hw/virtio/vhost-user.h index eb8bc0d90d..fd660393a0
> > > > > 100644
> > > > > --- a/include/hw/virtio/vhost-user.h
> > > > > +++ b/include/hw/virtio/vhost-user.h
> > > > > @@ -9,9 +9,17 @@
> > > > >  #define HW_VIRTIO_VHOST_USER_H
> > > > >
> > > > >  #include "chardev/char-fe.h"
> > > > > +#include "hw/virtio/virtio.h"
> > > > > +
> > > > > +typedef struct VhostUserHostNotifier {
> > > > > +    MemoryRegion mr;
> > > > > +    void *addr;
> > > > > +    bool set;
> > > > > +} VhostUserHostNotifier;
> > > > >
> > > > >  typedef struct VhostUserState {
> > > > >      CharBackend *chr;
> > > > > +    VhostUserHostNotifier notifier[VIRTIO_QUEUE_MAX];
> > > > >  } VhostUserState;
> > > > >
> > > > >  VhostUserState *vhost_user_init(void);
> > > > > --
> > > > > 2.11.0