From nobody Mon Feb 9 05:41:12 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1520540114021533.039911480966; Thu, 8 Mar 2018 12:15:14 -0800 (PST) Received: from localhost ([::1]:41617 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eu1wQ-00015q-3E for importer@patchew.org; Thu, 08 Mar 2018 15:15:06 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47399) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eu1gT-0003JI-QF for qemu-devel@nongnu.org; Thu, 08 Mar 2018 14:58:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eu1gS-0006Ct-EF for qemu-devel@nongnu.org; Thu, 08 Mar 2018 14:58:37 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:41462 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eu1gS-0006CS-8U for qemu-devel@nongnu.org; Thu, 08 Mar 2018 14:58:36 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B87E58182D24 for ; Thu, 8 Mar 2018 19:58:35 +0000 (UTC) Received: from dgilbert-t530.redhat.com (ovpn-117-232.ams2.redhat.com [10.36.117.232]) by smtp.corp.redhat.com (Postfix) with ESMTP id 82D26202322A; Thu, 8 Mar 2018 19:58:34 +0000 (UTC) From: "Dr. David Alan Gilbert (git)" To: qemu-devel@nongnu.org, mst@redhat.com, maxime.coquelin@redhat.com, marcandre.lureau@redhat.com, peterx@redhat.com, quintela@redhat.com Date: Thu, 8 Mar 2018 19:57:57 +0000 Message-Id: <20180308195811.24894-16-dgilbert@redhat.com> In-Reply-To: <20180308195811.24894-1-dgilbert@redhat.com> References: <20180308195811.24894-1-dgilbert@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Mar 2018 19:58:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 08 Mar 2018 19:58:35 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'dgilbert@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v4 15/29] vhost+postcopy: Send address back to qemu X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: aarcange@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: "Dr. David Alan Gilbert" We need a better way, but at the moment we need the address of the mappings sent back to qemu so it can interpret the messages on the userfaultfd it reads. This is done as a 3 stage set: QEMU -> client set_mem_table mmap stuff, get addresses client -> qemu here are the addresses qemu -> client OK - now you can use them That ensures that qemu has registered the new addresses in it's userfault code before the client starts accessing them. Note: We don't ask for the default 'ack' reply since we've got our own. Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Marc-Andr=C3=A9 Lureau --- contrib/libvhost-user/libvhost-user.c | 24 ++++++++++++- docs/interop/vhost-user.txt | 9 +++++ hw/virtio/trace-events | 1 + hw/virtio/vhost-user.c | 67 +++++++++++++++++++++++++++++++= ++-- 4 files changed, 98 insertions(+), 3 deletions(-) diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/= libvhost-user.c index a18bc74a7c..e02e5d6f46 100644 --- a/contrib/libvhost-user/libvhost-user.c +++ b/contrib/libvhost-user/libvhost-user.c @@ -491,10 +491,32 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserM= sg *vmsg) dev_region->mmap_addr); } =20 + /* Return the address to QEMU so that it can translate the ufd + * fault addresses back. + */ + msg_region->userspace_addr =3D (uintptr_t)(mmap_addr + + dev_region->mmap_offset); close(vmsg->fds[i]); } =20 - /* TODO: Get address back to QEMU */ + /* Send the message back to qemu with the addresses filled in */ + vmsg->fd_num =3D 0; + if (!vu_message_write(dev, dev->sock, vmsg)) { + vu_panic(dev, "failed to respond to set-mem-table for postcopy"); + return false; + } + + /* Wait for QEMU to confirm that it's registered the handler for the + * faults. + */ + if (!vu_message_read(dev, dev->sock, vmsg) || + vmsg->size !=3D sizeof(vmsg->payload.u64) || + vmsg->payload.u64 !=3D 0) { + vu_panic(dev, "failed to receive valid ack for postcopy set-mem-ta= ble"); + return false; + } + + /* OK, now we can go and register the memory and generate faults */ for (i =3D 0; i < dev->nregions; i++) { VuDevRegion *dev_region =3D &dev->regions[i]; #ifdef UFFDIO_REGISTER diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt index 7cc7006ef3..cc049196c9 100644 --- a/docs/interop/vhost-user.txt +++ b/docs/interop/vhost-user.txt @@ -455,12 +455,21 @@ Master message types Id: 5 Equivalent ioctl: VHOST_SET_MEM_TABLE Master payload: memory regions description + Slave payload: (postcopy only) memory regions description =20 Sets the memory map regions on the slave so it can translate the vri= ng addresses. In the ancillary data there is an array of file descripto= rs for each memory mapped region. The size and ordering of the fds matc= hes the number and ordering of memory regions. =20 + When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE rep= lies with + the bases of the memory mapped regions to the master. It must have = mmap'd + the regions but not yet accessed them and should not yet generate a = userfault + event. Note NEED_REPLY_MASK is not set in this case. + QEMU will then reply back to the list of mappings with an empty + VHOST_USER_SET_MEM_TABLE as an acknolwedgment; only upon reception o= f this + message may the guest start accessing the memory and generating faul= ts. + * VHOST_USER_SET_LOG_BASE =20 Id: 6 diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index 06ec03d6e7..05d18ada77 100644 --- a/hw/virtio/trace-events +++ b/hw/virtio/trace-events @@ -8,6 +8,7 @@ vhost_section(const char *name, int r) "%s:%d" =20 # hw/virtio/vhost-user.c vhost_user_postcopy_listen(void) "" +vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int= reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d r= egion %d" =20 # hw/virtio/virtio.c virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned o= ut_num) "elem %p size %zd in_num %u out_num %u" diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 311addc33b..6875f729e8 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -174,6 +174,7 @@ struct vhost_user { int slave_fd; NotifierWithReturn postcopy_notifier; struct PostCopyFD postcopy_fd; + uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS]; /* True once we've entered postcopy_listen */ bool postcopy_listen; }; @@ -343,12 +344,15 @@ static int vhost_user_set_log_base(struct vhost_dev *= dev, uint64_t base, static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev, struct vhost_memory *mem) { + struct vhost_user *u =3D dev->opaque; int fds[VHOST_MEMORY_MAX_NREGIONS]; int i, fd; size_t fd_num =3D 0; bool reply_supported =3D virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_REPLY_= ACK); - /* TODO: Add actual postcopy differences */ + VhostUserMsg msg_reply; + int region_i, msg_i; + VhostUserMsg msg =3D { .hdr.request =3D VHOST_USER_SET_MEM_TABLE, .hdr.flags =3D VHOST_USER_VERSION, @@ -395,6 +399,64 @@ static int vhost_user_set_mem_table_postcopy(struct vh= ost_dev *dev, return -1; } =20 + if (vhost_user_read(dev, &msg_reply) < 0) { + return -1; + } + + if (msg_reply.hdr.request !=3D VHOST_USER_SET_MEM_TABLE) { + error_report("%s: Received unexpected msg type." + "Expected %d received %d", __func__, + VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request); + return -1; + } + /* We're using the same structure, just reusing one of the + * fields, so it should be the same size. + */ + if (msg_reply.hdr.size !=3D msg.hdr.size) { + error_report("%s: Unexpected size for postcopy reply " + "%d vs %d", __func__, msg_reply.hdr.size, msg.hdr.siz= e); + return -1; + } + + memset(u->postcopy_client_bases, 0, + sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS); + + /* They're in the same order as the regions that were sent + * but some of the regions were skipped (above) if they + * didn't have fd's + */ + for (msg_i =3D 0, region_i =3D 0; + region_i < dev->mem->nregions; + region_i++) { + if (msg_i < fd_num && + msg_reply.payload.memory.regions[msg_i].guest_phys_addr =3D=3D + dev->mem->regions[region_i].guest_phys_addr) { + u->postcopy_client_bases[region_i] =3D + msg_reply.payload.memory.regions[msg_i].userspace_addr; + trace_vhost_user_set_mem_table_postcopy( + msg_reply.payload.memory.regions[msg_i].userspace_addr, + msg.payload.memory.regions[msg_i].userspace_addr, + msg_i, region_i); + msg_i++; + } + } + if (msg_i !=3D fd_num) { + error_report("%s: postcopy reply not fully consumed " + "%d vs %zd", + __func__, msg_i, fd_num); + return -1; + } + /* Now we've registered this with the postcopy code, we ack to the cli= ent, + * because now we're in the position to be able to deal with any faults + * it generates. + */ + /* TODO: Use this for failure cases as well with a bad value */ + msg.hdr.size =3D sizeof(msg.payload.u64); + msg.payload.u64 =3D 0; /* OK */ + if (vhost_user_write(dev, &msg, NULL, 0) < 0) { + return -1; + } + if (reply_supported) { return process_message_reply(dev, &msg); } @@ -411,7 +473,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *d= ev, size_t fd_num =3D 0; bool do_postcopy =3D u->postcopy_listen && u->postcopy_fd.handler; bool reply_supported =3D virtio_has_feature(dev->protocol_features, - VHOST_USER_PROTOCOL_F_REPLY_= ACK); + VHOST_USER_PROTOCOL_F_REPLY_ACK)= && + !do_postcopy; =20 if (do_postcopy) { /* Postcopy has enough differences that it's best done in it's own --=20 2.14.3