From: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
To: qemu-devel@nongnu.org, maxime.coquelin@redhat.com, a.perevalov@samsung.com, mst@redhat.com, marcandre.lureau@redhat.com
Cc: lvivier@redhat.com, aarcange@redhat.com, felipe@nutanix.com, peterx@redhat.com, quintela@redhat.com
Date: Thu, 24 Aug 2017 20:27:10 +0100
Message-Id: <20170824192730.8440-13-dgilbert@redhat.com>
In-Reply-To: <20170824192730.8440-1-dgilbert@redhat.com>
References: <20170824192730.8440-1-dgilbert@redhat.com>
Subject: [Qemu-devel] [RFC v2 12/32] postcopy: Allow registering of fd handler

David Alan Gilbert" Allow other userfaultfd's to be registered into the fault thread so that handlers for shared memory can get responses. Signed-off-by: Dr. David Alan Gilbert --- migration/migration.c | 3 + migration/migration.h | 2 + migration/postcopy-ram.c | 212 +++++++++++++++++++++++++++++++++++--------= ---- migration/postcopy-ram.h | 21 +++++ migration/trace-events | 2 + 5 files changed, 186 insertions(+), 54 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index c3fe0ed9ca..2c43d730e2 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -144,6 +144,8 @@ MigrationIncomingState *migration_incoming_get_current(= void) if (!once) { mis_current.state =3D MIGRATION_STATUS_NONE; memset(&mis_current, 0, sizeof(MigrationIncomingState)); + mis_current.postcopy_remote_fds =3D g_array_new(FALSE, TRUE, + sizeof(struct PostCopyF= D)); qemu_mutex_init(&mis_current.rp_mutex); qemu_event_init(&mis_current.main_thread_load_event, false); once =3D true; @@ -166,6 +168,7 @@ void migration_incoming_state_destroy(void) qemu_fclose(mis->from_src_file); mis->from_src_file =3D NULL; } + g_array_free(mis->postcopy_remote_fds, TRUE); =20 qemu_event_destroy(&mis->main_thread_load_event); } diff --git a/migration/migration.h b/migration/migration.h index 148c9facbc..9fcea6bb25 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -48,6 +48,8 @@ struct MigrationIncomingState { QemuMutex rp_mutex; /* We send replies from multiple threads */ void *postcopy_tmp_page; void *postcopy_tmp_zero_page; + /* PostCopyFD's for external userfaultfds & handlers of shared memory = */ + GArray *postcopy_remote_fds; =20 QEMUBH *bh; =20 diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 95007c00ef..faee7708ff 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -466,29 +466,43 @@ static void *postcopy_ram_fault_thread(void *opaque) MigrationIncomingState *mis =3D opaque; struct uffd_msg msg; int ret; + size_t index; RAMBlock *rb =3D NULL; RAMBlock *last_rb =3D NULL; /* last RAMBlock we sent part of */ =20 trace_postcopy_ram_fault_thread_entry(); qemu_sem_post(&mis->fault_thread_sem); =20 + struct pollfd *pfd; + size_t pfd_len =3D 2 + mis->postcopy_remote_fds->len; + + pfd =3D g_new0(struct pollfd, pfd_len); + + pfd[0].fd =3D mis->userfault_fd; + pfd[0].events =3D POLLIN; + pfd[1].fd =3D mis->userfault_quit_fd; + pfd[1].events =3D POLLIN; /* Waiting for eventfd to go positive */ + trace_postcopy_ram_fault_thread_fds_core(pfd[0].fd, pfd[1].fd); + for (index =3D 0; index < mis->postcopy_remote_fds->len; index++) { + struct PostCopyFD *pcfd =3D &g_array_index(mis->postcopy_remote_fd= s, + struct PostCopyFD, index); + pfd[2 + index].fd =3D pcfd->fd; + pfd[2 + index].events =3D POLLIN; + trace_postcopy_ram_fault_thread_fds_extra(2 + index, pcfd->idstr, + pcfd->fd); + } + while (true) { ram_addr_t rb_offset; - struct pollfd pfd[2]; + int poll_result; =20 /* * We're mainly waiting for the kernel to give us a faulting HVA, * however we can be told to quit via userfault_quit_fd which is * an eventfd */ - pfd[0].fd =3D mis->userfault_fd; - pfd[0].events =3D POLLIN; - pfd[0].revents =3D 0; - pfd[1].fd =3D mis->userfault_quit_fd; - pfd[1].events =3D POLLIN; /* Waiting for eventfd to go positive */ - pfd[1].revents =3D 0; - - if (poll(pfd, 2, -1 /* Wait forever */) =3D=3D -1) { + poll_result =3D poll(pfd, pfd_len, -1 /* Wait forever */); + if (poll_result =3D=3D -1) { error_report("%s: userfault poll: %s", __func__, strerror(errn= o)); break; } @@ -498,57 
 migration/migration.c    |   3 +
 migration/migration.h    |   2 +
 migration/postcopy-ram.c | 212 +++++++++++++++++++++++++++++++++++------------
 migration/postcopy-ram.h |  21 +++++
 migration/trace-events   |   2 +
 5 files changed, 186 insertions(+), 54 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c3fe0ed9ca..2c43d730e2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -144,6 +144,8 @@ MigrationIncomingState *migration_incoming_get_current(void)
     if (!once) {
         mis_current.state = MIGRATION_STATUS_NONE;
         memset(&mis_current, 0, sizeof(MigrationIncomingState));
+        mis_current.postcopy_remote_fds = g_array_new(FALSE, TRUE,
+                                                      sizeof(struct PostCopyFD));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         once = true;
@@ -166,6 +168,7 @@ void migration_incoming_state_destroy(void)
         qemu_fclose(mis->from_src_file);
         mis->from_src_file = NULL;
     }
+    g_array_free(mis->postcopy_remote_fds, TRUE);
 
     qemu_event_destroy(&mis->main_thread_load_event);
 }
diff --git a/migration/migration.h b/migration/migration.h
index 148c9facbc..9fcea6bb25 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -48,6 +48,8 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
     void     *postcopy_tmp_zero_page;
+    /* PostCopyFD's for external userfaultfds & handlers of shared memory */
+    GArray   *postcopy_remote_fds;
 
     QEMUBH *bh;
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 95007c00ef..faee7708ff 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -466,29 +466,43 @@ static void *postcopy_ram_fault_thread(void *opaque)
     MigrationIncomingState *mis = opaque;
     struct uffd_msg msg;
     int ret;
+    size_t index;
     RAMBlock *rb = NULL;
     RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
     trace_postcopy_ram_fault_thread_entry();
     qemu_sem_post(&mis->fault_thread_sem);
 
+    struct pollfd *pfd;
+    size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
+
+    pfd = g_new0(struct pollfd, pfd_len);
+
+    pfd[0].fd = mis->userfault_fd;
+    pfd[0].events = POLLIN;
+    pfd[1].fd = mis->userfault_quit_fd;
+    pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+    trace_postcopy_ram_fault_thread_fds_core(pfd[0].fd, pfd[1].fd);
+    for (index = 0; index < mis->postcopy_remote_fds->len; index++) {
+        struct PostCopyFD *pcfd = &g_array_index(mis->postcopy_remote_fds,
+                                                 struct PostCopyFD, index);
+        pfd[2 + index].fd = pcfd->fd;
+        pfd[2 + index].events = POLLIN;
+        trace_postcopy_ram_fault_thread_fds_extra(2 + index, pcfd->idstr,
+                                                  pcfd->fd);
+    }
+
     while (true) {
         ram_addr_t rb_offset;
-        struct pollfd pfd[2];
+        int poll_result;
 
         /*
          * We're mainly waiting for the kernel to give us a faulting HVA,
          * however we can be told to quit via userfault_quit_fd which is
          * an eventfd
          */
-        pfd[0].fd = mis->userfault_fd;
-        pfd[0].events = POLLIN;
-        pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_quit_fd;
-        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
-        pfd[1].revents = 0;
-
-        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+        poll_result = poll(pfd, pfd_len, -1 /* Wait forever */);
+        if (poll_result == -1) {
             error_report("%s: userfault poll: %s", __func__, strerror(errno));
             break;
         }
@@ -498,57 +512,118 @@ static void *postcopy_ram_fault_thread(void *opaque)
             break;
         }
 
-        ret = read(mis->userfault_fd, &msg, sizeof(msg));
-        if (ret != sizeof(msg)) {
-            if (errno == EAGAIN) {
-                /*
-                 * if a wake up happens on the other thread just after
-                 * the poll, there is nothing to read.
-                 */
-                continue;
+        if (pfd[0].revents) {
+            poll_result--;
+            ret = read(mis->userfault_fd, &msg, sizeof(msg));
+            if (ret != sizeof(msg)) {
+                if (errno == EAGAIN) {
+                    /*
+                     * if a wake up happens on the other thread just after
+                     * the poll, there is nothing to read.
+                     */
+                    continue;
+                }
+                if (ret < 0) {
+                    error_report("%s: Failed to read full userfault "
+                                 "message: %s",
+                                 __func__, strerror(errno));
+                    break;
+                } else {
+                    error_report("%s: Read %d bytes from userfaultfd "
+                                 "expected %zd",
+                                 __func__, ret, sizeof(msg));
+                    break; /* Lost alignment, don't know what we'd read next */
+                }
+            }
+            if (msg.event != UFFD_EVENT_PAGEFAULT) {
+                error_report("%s: Read unexpected event %ud from userfaultfd",
+                             __func__, msg.event);
+                continue; /* It's not a page fault, shouldn't happen */
             }
-            if (ret < 0) {
-                error_report("%s: Failed to read full userfault message: %s",
-                             __func__, strerror(errno));
+
+            rb = qemu_ram_block_from_host(
+                     (void *)(uintptr_t)msg.arg.pagefault.address,
+                     true, &rb_offset);
+            if (!rb) {
+                error_report("postcopy_ram_fault_thread: Fault outside guest: %"
+                             PRIx64, (uint64_t)msg.arg.pagefault.address);
                 break;
-            } else {
-                error_report("%s: Read %d bytes from userfaultfd expected %zd",
-                             __func__, ret, sizeof(msg));
-                break; /* Lost alignment, don't know what we'd read next */
             }
-        }
-        if (msg.event != UFFD_EVENT_PAGEFAULT) {
-            error_report("%s: Read unexpected event %ud from userfaultfd",
-                         __func__, msg.event);
-            continue; /* It's not a page fault, shouldn't happen */
-        }
 
-        rb = qemu_ram_block_from_host(
-                 (void *)(uintptr_t)msg.arg.pagefault.address,
-                 true, &rb_offset);
-        if (!rb) {
-            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
-                         PRIx64, (uint64_t)msg.arg.pagefault.address);
-            break;
-        }
+            rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
+            trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
+                                                    qemu_ram_get_idstr(rb),
+                                                    rb_offset);
 
-        rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
-        trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
-                                                qemu_ram_get_idstr(rb),
-                                                rb_offset);
+            /*
+             * Send the request to the source - we want to request one
+             * of our host page sizes (which is >= TPS)
+             */
+            if (rb != last_rb) {
+                last_rb = rb;
+                migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                          rb_offset, qemu_ram_pagesize(rb));
+            } else {
+                /* Save some space */
+                migrate_send_rp_req_pages(mis, NULL,
+                                          rb_offset, qemu_ram_pagesize(rb));
+            }
+        }
 
-        /*
-         * Send the request to the source - we want to request one
-         * of our host page sizes (which is >= TPS)
-         */
-        if (rb != last_rb) {
-            last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                      rb_offset, qemu_ram_pagesize(rb));
-        } else {
-            /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                      rb_offset, qemu_ram_pagesize(rb));
+        /* Now handle any requests from external processes on shared memory */
+        /* TODO: May need to handle devices deregistering during postcopy */
+        for (index = 2; index < pfd_len && poll_result; index++) {
+            if (pfd[index].revents) {
+                struct PostCopyFD *pcfd =
+                    &g_array_index(mis->postcopy_remote_fds,
+                                   struct PostCopyFD, index - 2);
+
+                poll_result--;
+                if (pfd[index].revents & POLLERR) {
POLLERR on poll %zd fd=3D%d", + __func__, index, pcfd->fd); + pfd[index].events =3D 0; + continue; + } + + ret =3D read(pcfd->fd, &msg, sizeof(msg)); + if (ret !=3D sizeof(msg)) { + if (errno =3D=3D EAGAIN) { + /* + * if a wake up happens on the other thread just a= fter + * the poll, there is nothing to read. + */ + continue; + } + if (ret < 0) { + error_report("%s: Failed to read full userfault " + "message: %s (shared) revents=3D%d", + __func__, strerror(errno), + pfd[index].revents); + /*TODO: Could just disable this sharer */ + break; + } else { + error_report("%s: Read %d bytes from userfaultfd " + "expected %zd (shared)", + __func__, ret, sizeof(msg)); + /*TODO: Could just disable this sharer */ + break; /*Lost alignment,don't know what we'd read = next*/ + } + } + if (msg.event !=3D UFFD_EVENT_PAGEFAULT) { + error_report("%s: Read unexpected event %ud " + "from userfaultfd (shared)", + __func__, msg.event); + continue; /* It's not a page fault, shouldn't happen */ + } + /* Call the device handler registered with us */ + ret =3D pcfd->handler(pcfd, &msg); + if (ret) { + error_report("%s: Failed to resolve shared fault on %z= d/%s", + __func__, index, pcfd->idstr); + /* TODO: Fail? Disable this sharer? */ + } + } } } trace_postcopy_ram_fault_thread_exit(); @@ -878,3 +953,32 @@ PostcopyState postcopy_state_set(PostcopyState new_sta= te) { return atomic_xchg(&incoming_postcopy_state, new_state); } + +/* Register a handler for external shared memory postcopy + * called on the destination. + */ +void postcopy_register_shared_ufd(struct PostCopyFD *pcfd) +{ + MigrationIncomingState *mis =3D migration_incoming_get_current(); + + mis->postcopy_remote_fds =3D g_array_append_val(mis->postcopy_remote_f= ds, + *pcfd); +} + +/* Unregister a handler for external shared memory postcopy + */ +void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd) +{ + guint i; + MigrationIncomingState *mis =3D migration_incoming_get_current(); + GArray *pcrfds =3D mis->postcopy_remote_fds; + + for (i =3D 0; i < pcrfds->len; i++) { + struct PostCopyFD *cur =3D &g_array_index(pcrfds, struct PostCopyF= D, i); + if (cur->fd =3D=3D pcfd->fd) { + mis->postcopy_remote_fds =3D g_array_remove_index(pcrfds, i); + return; + } + } +} + diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h index 70d4b09659..ba8a8ffec5 100644 --- a/migration/postcopy-ram.h +++ b/migration/postcopy-ram.h @@ -141,4 +141,25 @@ void postcopy_remove_notifier(NotifierWithReturn *n); /* Call the notifier list set by postcopy_add_start_notifier */ int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp); =20 +struct PostCopyFD; + +/* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */ +typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd); + +struct PostCopyFD { + int fd; + /* Data to pass to handler */ + void *data; + /* Handler to be called whenever we get a poll event */ + pcfdhandler handler; + /* A string to use in error messages */ + char *idstr; +}; + +/* Register a userfaultfd owned by an external process for + * shared memory. 
+ */
+void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
+void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index 7a3b5144ff..23f4e5339b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -189,6 +189,8 @@ postcopy_place_page_zero(void *host_addr) "host=%p"
 postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
+postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
+postcopy_ram_fault_thread_fds_extra(size_t index, const char *name, int fd) "%zd/%s: %d"
 postcopy_ram_fault_thread_quit(void) ""
 postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
-- 
2.13.5