From nobody Mon Feb 9 02:34:55 2026
From: Juan Quintela
To: qemu-devel@nongnu.org
Date: Wed, 14 Aug 2019 04:02:13 +0200
Message-Id: <20190814020218.1868-2-quintela@redhat.com>
In-Reply-To: <20190814020218.1868-1-quintela@redhat.com>
References: <20190814020218.1868-1-quintela@redhat.com>
Subject: [Qemu-devel] [PATCH 1/6] migration: Add traces for multifd terminate threads
Cc: "Dr. David Alan Gilbert", Juan Quintela

Signed-off-by: Juan Quintela
Reviewed-by: Dr. David Alan Gilbert
Reviewed-by: Philippe Mathieu-Daudé
---
 migration/ram.c        | 4 ++++
 migration/trace-events | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 889148dd84..ca11d43e30 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -996,6 +996,8 @@ static void multifd_send_terminate_threads(Error *err)
 {
     int i;
 
+    trace_multifd_send_terminate_threads(err != NULL);
+
     if (err) {
         MigrationState *s = migrate_get_current();
         migrate_set_error(s, err);
@@ -1254,6 +1256,8 @@ static void multifd_recv_terminate_threads(Error *err)
 {
     int i;
 
+    trace_multifd_recv_terminate_threads(err != NULL);
+
     if (err) {
         MigrationState *s = migrate_get_current();
         migrate_set_error(s, err);
diff --git a/migration/trace-events b/migration/trace-events
index d8e54c367a..886ce70ca0 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -85,12 +85,14 @@ multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uin
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %d"
 multifd_recv_sync_main_wait(uint8_t id) "channel %d"
+multifd_recv_terminate_threads(bool error) "error %d"
 multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%d"
 multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %d"
 multifd_send_sync_main_wait(uint8_t id) "channel %d"
+multifd_send_terminate_threads(bool error) "error %d"
 multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %" PRIu64
 multifd_send_thread_start(uint8_t id) "%d"
 ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: start: %" PRIx64 " %zx"
-- 
2.21.0

From nobody Mon Feb 9 02:34:55 2026
From: Juan Quintela
To: qemu-devel@nongnu.org
Date: Wed, 14 Aug 2019 04:02:14 +0200
Message-Id: <20190814020218.1868-3-quintela@redhat.com>
In-Reply-To: <20190814020218.1868-1-quintela@redhat.com>
References: <20190814020218.1868-1-quintela@redhat.com>
Subject: [Qemu-devel] [PATCH 2/6] migration: Make global sem_sync semaphore by channel
Cc: "Dr. David Alan Gilbert", Juan Quintela

This makes things easier to debug: when you wait for all threads to
arrive at that semaphore, you know which one you are waiting for.

Signed-off-by: Juan Quintela
Reviewed-by: Dr. David Alan Gilbert
---
 migration/ram.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ca11d43e30..4bdd201a4e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -661,6 +661,8 @@ typedef struct {
     uint64_t num_packets;
     /* pages sent through this channel */
     uint64_t num_pages;
+    /* syncs main thread and channels */
+    QemuSemaphore sem_sync;
 } MultiFDSendParams;
 
 typedef struct {
@@ -896,8 +898,6 @@ struct {
     MultiFDSendParams *params;
     /* array of pages to sent */
     MultiFDPages_t *pages;
-    /* syncs main thread and channels */
-    QemuSemaphore sem_sync;
     /* global number of generated multifd packets */
     uint64_t packet_num;
     /* send channels ready */
@@ -1038,6 +1038,7 @@ void multifd_save_cleanup(void)
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);
         qemu_sem_destroy(&p->sem);
+        qemu_sem_destroy(&p->sem_sync);
         g_free(p->name);
         p->name = NULL;
         multifd_pages_clear(p->pages);
@@ -1047,7 +1048,6 @@ void multifd_save_cleanup(void)
         p->packet = NULL;
     }
     qemu_sem_destroy(&multifd_send_state->channels_ready);
-    qemu_sem_destroy(&multifd_send_state->sem_sync);
     g_free(multifd_send_state->params);
     multifd_send_state->params = NULL;
     multifd_pages_clear(multifd_send_state->pages);
@@ -1092,7 +1092,7 @@ static void multifd_send_sync_main(void)
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
         trace_multifd_send_sync_main_wait(p->id);
-        qemu_sem_wait(&multifd_send_state->sem_sync);
+        qemu_sem_wait(&p->sem_sync);
     }
     trace_multifd_send_sync_main(multifd_send_state->packet_num);
 }
@@ -1152,7 +1152,7 @@ static void *multifd_send_thread(void *opaque)
             qemu_mutex_unlock(&p->mutex);
 
             if (flags & MULTIFD_FLAG_SYNC) {
-                qemu_sem_post(&multifd_send_state->sem_sync);
+                qemu_sem_post(&p->sem_sync);
             }
             qemu_sem_post(&multifd_send_state->channels_ready);
         } else if (p->quit) {
@@ -1175,7 +1175,7 @@ out:
      */
     if (ret != 0) {
         if (flags & MULTIFD_FLAG_SYNC) {
-            qemu_sem_post(&multifd_send_state->sem_sync);
+            qemu_sem_post(&p->sem_sync);
         }
         qemu_sem_post(&multifd_send_state->channels_ready);
     }
@@ -1221,7 +1221,6 @@ int multifd_save_setup(void)
     multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
     multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
     multifd_send_state->pages = multifd_pages_init(page_count);
-    qemu_sem_init(&multifd_send_state->sem_sync, 0);
     qemu_sem_init(&multifd_send_state->channels_ready, 0);
 
     for (i = 0; i < thread_count; i++) {
@@ -1229,6 +1228,7 @@ int multifd_save_setup(void)
 
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem, 0);
+        qemu_sem_init(&p->sem_sync, 0);
         p->quit = false;
         p->pending_job = 0;
         p->id = i;
-- 
2.21.0

From nobody Mon Feb 9 02:34:55 2026
From: Juan Quintela
To: qemu-devel@nongnu.org
Date: Wed, 14 Aug 2019 04:02:15 +0200
Message-Id: <20190814020218.1868-4-quintela@redhat.com>
In-Reply-To: <20190814020218.1868-1-quintela@redhat.com>
References: <20190814020218.1868-1-quintela@redhat.com>
Subject: [Qemu-devel] [PATCH 3/6] migration: Make sure that all multifd channels have been created
Cc: "Dr. David Alan Gilbert", Juan Quintela

If we start the migration before all channels have been created, we have
to handle the case where a channel does not exist yet. Waiting for all
of them to be created first is easier.

Signed-off-by: Juan Quintela
---
 migration/ram.c        | 14 ++++++++++++++
 migration/trace-events |  1 +
 2 files changed, 15 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 4bdd201a4e..4a6ae677a9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -663,6 +663,8 @@ typedef struct {
     uint64_t num_pages;
     /* syncs main thread and channels */
     QemuSemaphore sem_sync;
+    /* thread has started and setup is done */
+    QemuSemaphore started;
 } MultiFDSendParams;
 
 typedef struct {
@@ -1039,6 +1041,7 @@ void multifd_save_cleanup(void)
         qemu_mutex_destroy(&p->mutex);
         qemu_sem_destroy(&p->sem);
         qemu_sem_destroy(&p->sem_sync);
+        qemu_sem_destroy(&p->started);
         g_free(p->name);
         p->name = NULL;
         multifd_pages_clear(p->pages);
@@ -1113,6 +1116,8 @@ static void *multifd_send_thread(void *opaque)
     /* initial packet */
     p->num_packets = 1;
 
+    qemu_sem_post(&p->started);
+
     while (true) {
         qemu_sem_wait(&p->sem);
         qemu_mutex_lock(&p->mutex);
@@ -1229,6 +1234,7 @@ int multifd_save_setup(void)
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem, 0);
         qemu_sem_init(&p->sem_sync, 0);
+        qemu_sem_init(&p->started, 0);
         p->quit = false;
         p->pending_job = 0;
         p->id = i;
@@ -3486,6 +3492,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     ram_control_before_iterate(f, RAM_CONTROL_SETUP);
     ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
+    /* We want to wait for all threads to have started before doing
+     * anything else */
+    for (int i = 0; i < migrate_multifd_channels(); i++) {
+        MultiFDSendParams *p = &multifd_send_state->params[i];
+
+        qemu_sem_wait(&p->started);
+        trace_multifd_send_thread_started(p->id);
+    }
     multifd_send_sync_main();
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     qemu_fflush(f);
diff --git a/migration/trace-events b/migration/trace-events
index 886ce70ca0..dd13a5c4b1 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -95,6 +95,7 @@ multifd_send_sync_main_wait(uint8_t id) "channel %d"
 multifd_send_terminate_threads(bool error) "error %d"
multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "cha= nnel %d packets %" PRIu64 " pages %" PRIu64 multifd_send_thread_start(uint8_t id) "%d" +multifd_send_thread_started(uint8_t id) "channel %d" ram_discard_range(const char *rbname, uint64_t start, size_t len) "%s: sta= rt: %" PRIx64 " %zx" ram_load_loop(const char *rbname, uint64_t addr, int flags, void *host) "%= s: addr: 0x%" PRIx64 " flags: 0x%x host: %p" ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x" --=20 2.21.0 From nobody Mon Feb 9 02:34:55 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1565748380; cv=none; d=zoho.com; s=zohoarc; b=MX6bgmKICS5cHgNChtVsEkJLwlWCUEKF3j2YTqi+njvgIEwtjxXAsrXWCMAXg9KHNDEfCX6lwEw59t13m/g8ZXHsgm5hytzlFHGtZ+xu7WfZF2P7uI0w07xABrFLmW15ceBBIvaR33iZG8TP4buYXXTUkd8uyRMy+px4yP2AEc4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1565748380; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=QvulvnFuoxKV/VZLhIbGZ+9i8pYFICsSIzp3CpTQ89s=; b=TYsty6v6PhqjJ+Dg/amyJTc4HLXp46HWYlgpQUrcK3Qp0aMGWcUPXnrfYzSxdl8cIDm4q+jPVlWVS3MEdWshr2A5Jiu3Wt52Q+Tg7RErSQanvoevBzsMFyi9CwE2TweIryp0ybCm8AglmO8H4c5DXH5/bJn71fIsSMvZhijLDSI= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) 
smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail header.from= (p=none dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1565748380641361.2655659621912; Tue, 13 Aug 2019 19:06:20 -0700 (PDT) Received: from localhost ([::1]:56614 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1hxig7-0005Ya-5X for importer@patchew.org; Tue, 13 Aug 2019 22:06:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51008) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1hxicV-0001K0-Nw for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hxicU-0005J3-ML for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:35 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54430) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hxicU-0005Id-Ec for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:34 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BF9ED3082E0F for ; Wed, 14 Aug 2019 02:02:33 +0000 (UTC) Received: from localhost.localdomain (ovpn-117-78.ams2.redhat.com [10.36.117.78]) by smtp.corp.redhat.com (Postfix) with ESMTP id 188FD7AB44; Wed, 14 Aug 2019 02:02:30 +0000 (UTC) From: Juan Quintela To: qemu-devel@nongnu.org Date: Wed, 14 Aug 2019 04:02:16 +0200 Message-Id: <20190814020218.1868-5-quintela@redhat.com> In-Reply-To: <20190814020218.1868-1-quintela@redhat.com> References: <20190814020218.1868-1-quintela@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Wed, 14 Aug 2019 02:02:33 
+0000 (UTC) Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH 4/6] migration: Make multifd threads wait until all have been created X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Dr. David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Type: text/plain; charset="utf-8" This makes it clear that no thread handles any incoming message until all threads have been created. Signed-off-by: Juan Quintela --- migration/ram.c | 24 ++++++++++++++++++++++-- migration/trace-events | 1 + 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 4a6ae677a9..f1aec95f83 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -702,6 +702,8 @@ typedef struct { uint64_t num_pages; /* syncs main thread and channels */ QemuSemaphore sem_sync; + /* thread can continue */ + QemuSemaphore can_start; } MultiFDRecvParams; =20 static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp) @@ -1313,6 +1315,7 @@ int multifd_load_cleanup(Error **errp) p->c =3D NULL; qemu_mutex_destroy(&p->mutex); qemu_sem_destroy(&p->sem_sync); + qemu_sem_destroy(&p->can_start); g_free(p->name); p->name =3D NULL; multifd_pages_clear(p->pages); @@ -1366,6 +1369,9 @@ static void *multifd_recv_thread(void *opaque) trace_multifd_recv_thread_start(p->id); rcu_register_thread(); =20 + qemu_sem_wait(&p->can_start); + trace_multifd_recv_thread_can_start(p->id); + while (true) { uint32_t used; uint32_t flags; @@ -1445,6 +1451,7 @@ int multifd_load_setup(void) =20 qemu_mutex_init(&p->mutex); qemu_sem_init(&p->sem_sync, 0); + qemu_sem_init(&p->can_start, 0); p->quit =3D false; p->id =3D i; p->pages =3D multifd_pages_init(page_count); @@ -1477,6 
+1484,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error = **errp) { MultiFDRecvParams *p; Error *local_err =3D NULL; + bool last_one; int id; =20 id =3D multifd_recv_initial_packet(ioc, &local_err); @@ -1506,8 +1514,20 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error= **errp) qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, QEMU_THREAD_JOINABLE); atomic_inc(&multifd_recv_state->count); - return atomic_read(&multifd_recv_state->count) =3D=3D - migrate_multifd_channels(); + + last_one =3D atomic_read(&multifd_recv_state->count) + =3D=3D migrate_multifd_channels(); + + if (last_one) { + int i; + + for (i =3D 0; i < migrate_multifd_channels(); i++) { + MultiFDRecvParams *p =3D &multifd_recv_state->params[i]; + + qemu_sem_post(&p->can_start); + } + } + return last_one; } =20 /** diff --git a/migration/trace-events b/migration/trace-events index dd13a5c4b1..9fbef614ab 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -86,6 +86,7 @@ multifd_recv_sync_main(long packet_num) "packet num %ld" multifd_recv_sync_main_signal(uint8_t id) "channel %d" multifd_recv_sync_main_wait(uint8_t id) "channel %d" multifd_recv_terminate_threads(bool error) "error %d" +multifd_recv_thread_can_start(uint8_t id) "channel %d" multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "cha= nnel %d packets %" PRIu64 " pages %" PRIu64 multifd_recv_thread_start(uint8_t id) "%d" multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flag= s, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d f= lags 0x%x next packet size %d" --=20 2.21.0 From nobody Mon Feb 9 02:34:55 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 
209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1565748463; cv=none; d=zoho.com; s=zohoarc; b=XB8bfWAUgWl1ip/7vHrkNHpg8UijrulkK8rIaIQyYsMz6pdZwUAQDhzX4Xpe2LkRa9pvylWx+f67PuQMZML2HOTHW6GmU5rpOwqvmwP2UkZ99e0F9suH+L6monpYYI2r7hAwOMiCKHnc+jtxfuZCOnnrIUBXFaO0/4bdS7+8C54= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1565748463; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=q7Z9SEJRHOqgJhvwgevmKO9Jv7jQRm7fNjyDIbnaN8g=; b=JFTaNTJxNuDXaYPBxyUTSjW1Y7/dSlEOohth40TGE42+MkXA5O/Z1PLidRVOTq0X0LeaQJyVwdLbagHK9QALpqGSTfZOYAVqiVloExCFX48efPWu3MgrooGfyRCmLf9e6c21PvyjZJH8okEfsAIMZ4kJ/ZFX+wDFeUQ9fcIhFeQ= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail header.from= (p=none dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 156574846343587.74773290396263; Tue, 13 Aug 2019 19:07:43 -0700 (PDT) Received: from localhost ([::1]:56648 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1hxihS-0007oV-Dc for importer@patchew.org; Tue, 13 Aug 2019 22:07:42 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51027) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1hxicZ-0001L3-NS for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:41 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hxicY-0005K1-3J for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35806) by eggs.gnu.org with 
esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hxicX-0005Jo-QK for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:38 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 921463175292 for ; Wed, 14 Aug 2019 02:02:36 +0000 (UTC) Received: from localhost.localdomain (ovpn-117-78.ams2.redhat.com [10.36.117.78]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1ECD17AB44; Wed, 14 Aug 2019 02:02:33 +0000 (UTC) From: Juan Quintela To: qemu-devel@nongnu.org Date: Wed, 14 Aug 2019 04:02:17 +0200 Message-Id: <20190814020218.1868-6-quintela@redhat.com> In-Reply-To: <20190814020218.1868-1-quintela@redhat.com> References: <20190814020218.1868-1-quintela@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Wed, 14 Aug 2019 02:02:36 +0000 (UTC) Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH 5/6] migration: add some multifd traces X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Dr. David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Type: text/plain; charset="utf-8" Signed-off-by: Juan Quintela Reviewed-by: Dr. 
David Alan Gilbert Reviewed-by: Philippe Mathieu-Daud=C3=A9 --- migration/ram.c | 3 +++ migration/trace-events | 4 ++++ 2 files changed, 7 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index f1aec95f83..25a211c3fb 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1173,6 +1173,7 @@ static void *multifd_send_thread(void *opaque) =20 out: if (local_err) { + trace_multifd_send_error(p->id); multifd_send_terminate_threads(local_err); } =20 @@ -1203,6 +1204,7 @@ static void multifd_new_send_channel_async(QIOTask *t= ask, gpointer opaque) QIOChannel *sioc =3D QIO_CHANNEL(qio_task_get_source(task)); Error *local_err =3D NULL; =20 + trace_multifd_new_send_channel_async(p->id); if (qio_task_propagate_error(task, &local_err)) { migrate_set_error(migrate_get_current(), local_err); multifd_save_cleanup(); @@ -1496,6 +1498,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error = **errp) atomic_read(&multifd_recv_state->count)); return false; } + trace_multifd_recv_new_channel(id); =20 p =3D &multifd_recv_state->params[id]; if (p->c !=3D NULL) { diff --git a/migration/trace-events b/migration/trace-events index 9fbef614ab..5d85f8bf83 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -81,7 +81,9 @@ migration_bitmap_sync_start(void) "" migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64 migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, uns= igned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx" migration_throttle(void) "" +multifd_new_send_channel_async(uint8_t id) "channel %d" multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flag= s, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d f= lags 0x%x next packet size %d" +multifd_recv_new_channel(uint8_t id) "channel %d" multifd_recv_sync_main(long packet_num) "packet num %ld" multifd_recv_sync_main_signal(uint8_t id) "channel %d" multifd_recv_sync_main_wait(uint8_t id) "channel %d" @@ -89,7 +91,9 
@@ multifd_recv_terminate_threads(bool error) "error %d"
 multifd_recv_thread_can_start(uint8_t id) "channel %d"
 multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %d packets %" PRIu64 " pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%d"
+multifd_save_setup_wait(uint8_t id) "%d"
 multifd_send(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
+multifd_send_error(uint8_t id) "channel %d"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %d"
 multifd_send_sync_main_wait(uint8_t id) "channel %d"
-- 
2.21.0

From nobody Mon Feb 9 02:34:55 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1565748382; cv=none; d=zoho.com; s=zohoarc; b=izPM9lFrHG9J6Y0wGynV60BwJcWBE+LF0cgm5cTGJa6hhw4ybiazFZQslyfebXq/d3xdCm1wC964W51F9vY3xOuO5YUrd3G//A1eQIGJPQRhIEW6J8HAsQko5s1uZClisb1AFDA9/pDg2HelMArAnRBQsWTK73944ExEIMPUB9Q= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1565748382; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=/slJu2ECTDeLSNbcpxCjRYFv681CLKIcdOrGN5uz4go=; 
b=XLG9WZuzP8j0OEIMGSYCtLO5gEWFL5KjCni8xl8OqzGX5pxyo4ngRyskfKLhB+ywz0X838IzZIEVQYFhugbNV/tc9tRTjd5fiCM8++NApTeoVyedfXLqDIFPJ0+Kk44u2x2frTZeYi69b0COz8m2xU2ptWyOdXZgfR35V2DbyDM= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail header.from= (p=none dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1565748382002571.7916740912599; Tue, 13 Aug 2019 19:06:22 -0700 (PDT) Received: from localhost ([::1]:56611 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1hxig8-0005Xy-F8 for importer@patchew.org; Tue, 13 Aug 2019 22:06:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51038) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1hxicd-0001MV-HJ for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hxicb-0005Ko-LA for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:43 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52632) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hxicb-0005KE-7X for qemu-devel@nongnu.org; Tue, 13 Aug 2019 22:02:41 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 89A0085360 for ; Wed, 14 Aug 2019 02:02:39 +0000 (UTC) Received: from localhost.localdomain (ovpn-117-78.ams2.redhat.com [10.36.117.78]) by smtp.corp.redhat.com (Postfix) with ESMTP id E5A1C7AB49; Wed, 14 Aug 2019 02:02:36 +0000 (UTC) From: Juan Quintela To: qemu-devel@nongnu.org Date: Wed, 14 Aug 2019 04:02:18 +0200 Message-Id: <20190814020218.1868-7-quintela@redhat.com> 
In-Reply-To: <20190814020218.1868-1-quintela@redhat.com> References: <20190814020218.1868-1-quintela@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Wed, 14 Aug 2019 02:02:39 +0000 (UTC) Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH 6/6] RFH: We lost "connect" events X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , "Dr. David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Type: text/plain; charset="utf-8"

When we have lots of channels, sometimes multifd migration fails
with the following error:

(qemu) migrate -d tcp:0:4444
(qemu) qemu-system-x86_64: multifd_send_pages: channel 17 has already quit!
qemu-system-x86_64: multifd_send_pages: channel 17 has already quit!
qemu-system-x86_64: multifd_send_sync_main: multifd_send_pages fail
qemu-system-x86_64: Unable to write to socket: Connection reset by peer
info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: failed (Unable to write to socket: Connection reset by peer)
total time: 0 milliseconds

On this particular example I am using 100 channels.
The more channels there are, the easier it is to reproduce. That
doesn't mean that it is a good idea to use so many channels.

With the previous patches in this series, I can run "reliably" on my
hardware with up to 10 channels. Most of the time. Until it fails.
With 100 channels, it fails almost always.

I thought that the problem was on the send side, so I tried to debug
there. As you can see from the delay, if you add any
printf()/error_report/trace, the error goes away; it is very timing
sensitive. With a delay of 10000 microseconds, it only works sometimes.

What I have discovered so far:

- The send side calls qemu_socket() on all the channels, so it appears
  that they get created correctly.

- On the destination side, it appears that "somehow" some of the
  connections are lost by the listener. This error happens when the
  destination side socket hasn't been "accepted", and it is not
  properly created. As far as I can see, we have several options:

  1- I don't know how to properly use qio asynchronously
     (this is one big possibility).

  2- glib has an error in this case? Or the problem is in how the qio
     listener is implemented on top of glib. I put in lots of printf()
     and other instrumentation, and it appears that the listener
     io_func is not called at all for the connections that are missing.

  3- It is always possible that we are missing some g_main_loop_run()
     somewhere. Notice how test/test-io-channel-socket.c calls it
     "creatively".

  4- It is entirely possible that I should be using the sockets as
     blocking instead of non-blocking. But I am not sure about that
     one yet.

- On the sending side, what happens is:

  Eventually it calls socket_connect() after all the async dance with
  thread creation, etc, etc. The source side creates all the channels;
  it is the destination side which is missing some of them.

  The sending side sends the first packet through that channel; it
  "succeeds" and doesn't give any error.
  After some time, the sending side decides to send another packet
  through that channel, and that is when we get the above error.

Any good ideas?

Later, Juan.

PS: The command line used is attached.

Important bits:
- multifd is set
- multifd_channels is set to 100

/scratch/qemu/fail/x64/x86_64-softmmu/qemu-system-x86_64
    -M pc-i440fx-3.1,accel=kvm,usb=off,vmport=off,nvdimm
    -L /mnt/code/qemu/check/pc-bios/
    -smp 2
    -name t1,debug-threads=on
    -m 3G
    -uuid 113100f9-6c99-4a7a-9b78-eb1c088d1087
    -monitor stdio
    -boot strict=on
    -drive file=/mnt/images/test.img,format=qcow2,if=none,id=disk0
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=disk0,id=virtio-disk0,bootindex=1
    -netdev tap,id=hostnet0,script=/etc/kvm-ifup,downscript=
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9d:10:51,bus=pci.0,addr=0x3
    -serial pty
    -parallel none
    -usb
    -device usb-tablet
    -k es
    -vga cirrus
    --global migration.x-multifd=on
    --global migration.multifd-channels=100
    -trace events=/home/quintela/tmp/events

CC: Daniel P. Berrangé
Signed-off-by: Juan Quintela
---
 migration/ram.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/ram.c b/migration/ram.c
index 25a211c3fb..50586304a0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1248,6 +1248,7 @@ int multifd_save_setup(void)
         p->packet = g_malloc0(p->packet_len);
         p->name = g_strdup_printf("multifdsend_%d", i);
         socket_send_channel_create(multifd_new_send_channel_async, p);
+        usleep(100000);
     }
     return 0;
 }
-- 
2.21.0
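
The observation in the message above, that the sending side's first packet "succeeds" on a channel the destination never accepted, matches ordinary TCP behaviour and can be seen outside QEMU. The following is only a minimal sketch with plain POSIX sockets on loopback; it is not QEMU code, the helper name first_send_before_accept() is hypothetical, and it does not reproduce the lost connections or the later reset, nor does it claim to identify the root cause. It only shows that connect() and a first send() can both return success while the listener never calls accept().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU.
 * Returns 1 if connect() and a first send() both succeed although the
 * listener never calls accept(), 0 if either of them fails, and -1 if
 * the sockets could not be set up.
 */
int first_send_before_accept(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int result = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0 || cfd < 0) {
        goto out;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        goto out;
    }

    /* The kernel completes the handshake; accept() is never called. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        result = 0;
        goto out;
    }

    /* The first write only has to reach the kernel's socket buffers. */
    result = (send(cfd, "x", 1, 0) == 1);

out:
    if (cfd >= 0) {
        close(cfd);
    }
    if (lfd >= 0) {
        close(lfd);
    }
    return result;
}
```

Calling first_send_before_accept() from a small main() and printing the result is enough to try it. A successful return from the first write therefore says nothing about whether the peer has accepted the connection, which is consistent with the error only showing up on a later packet.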