From nobody Thu May 16 00:12:38 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=suse.de ARC-Seal: i=1; a=rsa-sha256; t=1695058200; cv=none; d=zohomail.com; s=zohoarc; b=QGMKxcnPHHv+zGp03ktKsSIJ0hgiUz+CldeyKQJaglQCX2GI4aABX1IhLfRlp9lAlT+9X0qnsmj66iZk9hOeL9qs3l6/k/i1gyi56M+AJa1kP+449XiV6AMsF5A93m73PGBgrgnOpz9U3zXDiFj3XutucDFdpC5aUkowhHfHW6M= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1695058200; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=CtCKoalPOk2/4GyZh0hIOszBczbb3C2TFerg0lLfubU=; b=dpbIzx6dldDT4MW1DpbPNfYEMaDTi8MH85YC1C/r6cWLmNcwhDFKdkKFYm1V/Q70lYEplp8o34sAEh3iW/uzu96/oL510Q/7/Me8vzo8QEEfq+Mfv1Fb2j7c/rkKf+Xabu4Vj/a7Yo9R9Uo+EA5YlioHyXPjI0Rx/38vqGndVvE= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1695058200619880.7851932471209; Mon, 18 Sep 2023 10:30:00 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qiI35-0006rZ-Gj; Mon, 18 Sep 2023 13:28:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qiI2x-0006l4-VM for qemu-devel@nongnu.org; Mon, 18 Sep 2023 13:28:32 -0400 Received: from smtp-out1.suse.de ([2001:67c:2178:6::1c]) by eggs.gnu.org with 
esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qiI2v-0004SC-FM for qemu-devel@nongnu.org; Mon, 18 Sep 2023 13:28:31 -0400 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E396F21F40; Mon, 18 Sep 2023 17:28:27 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5DE251358A; Mon, 18 Sep 2023 17:28:26 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id OERCCrqICGUoGAAAMHmgww (envelope-from ); Mon, 18 Sep 2023 17:28:26 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1695058107; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CtCKoalPOk2/4GyZh0hIOszBczbb3C2TFerg0lLfubU=; b=krFBeEc92AIC8qKt88d5fAKVWZQa0Hqw4V9dIGEqVLgWYMzNBiE23lIh6e6Q94VJWlPlZa MN2KZJl2Eu5ux1dOlxj/kkw4mWUnieCB7F+gnltDpSp9lG8TsSeYLLJNkCsVIOLkwisLai Ez/HxS0acP/D1RWNaGBAj6F2QcCkfL8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1695058107; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CtCKoalPOk2/4GyZh0hIOszBczbb3C2TFerg0lLfubU=; b=YT4CCFHvfF/VVda2A/xa0CovhjRwX/s7uuRWA0CsVqzNLu0oIRqeT/WnwmJHW94tLTk3H9 Z3i6CQ+JyZ/HqkBA== From: Fabiano 
Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Stefan Hajnoczi, Leonardo Bras
Subject: [PATCH 1/8] migration: Fix race that dest preempt thread close too early
Date: Mon, 18 Sep 2023 14:28:15 -0300
Message-Id: <20230918172822.19052-2-farosas@suse.de>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20230918172822.19052-1-farosas@suse.de>
References: <20230918172822.19052-1-farosas@suse.de>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org;
Received-SPF: pass client-ip=2001:67c:2178:6::1c; envelope-from=farosas@suse.de; helo=smtp-out1.suse.de
X-Spam_score_int: -43
X-Spam_score: -4.4
X-Spam_bar: ----
X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org
X-ZohoMail-DKIM: pass (identity @suse.de)
X-ZM-MESSAGEID: 1695058202503100003
Content-Type: text/plain; charset="utf-8"

From: Peter Xu

We hit an intermittent CI failure in migration-test, in the
preempt/plain unit test:

  qemu-system-x86_64: Unable to read from socket: Connection reset by peer
  Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
  **
  ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
  (test program exited with status code -6)

Fabiano debugged it and found that the preempt thread can quit before
it has received all the pages, which leaves the guest without some of
its pages and corrupts guest memory.

To make sure the preempt thread has finished receiving all the pages,
we can rely on page_requested_count being zero, because the preempt
channel only ever receives pages that were requested by page faults.

Note that not every faulted page is required to be sent via the
preempt channel/thread: if a requested page has already been queued
into the background main channel for migration, the src qemu will
still send it via the background channel.

Instead of spinning on the count, add a condvar so the main thread can
wait on it in that unusual case without burning CPU. The wait should
be short, so spinning in this rare case would probably be fine too;
it's just better not to.

The condvar is only used when that special case is triggered. Some
memory ordering (against the preempt thread status field) is needed to
guarantee that the wakeup cannot be missed, so the main thread always
gets a kick when that case triggers.
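
The handshake described above can be sketched in isolation, outside
QEMU. This is only an illustration under stated assumptions: it uses
POSIX threads and C11 atomics in place of QEMU's QemuMutex, QemuCond,
qatomic_*() and smp_mb() helpers, and the names struct incoming,
receiver() and drain_then_quit() are hypothetical and do not appear in
the patch. Unlike the patch, the sketch waits in a while loop so that a
spurious wakeup is tolerated.

```c
/* Illustrative sketch only; none of these names exist in QEMU. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct incoming {
    pthread_mutex_t lock;  /* stands in for page_request_mutex */
    pthread_cond_t cond;   /* stands in for page_request_cond */
    atomic_int pending;    /* stands in for page_requested_count */
    atomic_bool quit;      /* stands in for preempt_thread_status == QUIT */
};

/* Receiver side: resolve pending requests one at a time. */
static void *receiver(void *opaque)
{
    struct incoming *s = opaque;
    bool done = false;

    while (!done) {
        pthread_mutex_lock(&s->lock);
        if (atomic_load(&s->pending) == 0) {
            done = true;
        } else {
            int left = atomic_fetch_sub(&s->pending, 1) - 1;

            /* Order the update of the count before the read of quit. */
            atomic_thread_fence(memory_order_seq_cst);
            if (left == 0) {
                if (atomic_load(&s->quit)) {
                    /* The main side may be waiting for the last page. */
                    pthread_cond_signal(&s->cond);
                }
                done = true;
            }
        }
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/*
 * Main side: ask the receiver to quit, then wait until every pending
 * request is resolved.  Returns the number of requests still pending
 * after the receiver has been joined, or -1 if no thread was created.
 */
int drain_then_quit(int pages)
{
    struct incoming s;
    pthread_t tid;
    int left;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    atomic_init(&s.pending, pages);
    atomic_init(&s.quit, false);

    if (pthread_create(&tid, NULL, receiver, &s) != 0) {
        left = -1;
    } else {
        atomic_store(&s.quit, true);
        /* Order the write of quit before the read of the count. */
        atomic_thread_fence(memory_order_seq_cst);

        pthread_mutex_lock(&s.lock);
        while (atomic_load(&s.pending) > 0) {
            pthread_cond_wait(&s.cond, &s.lock);
        }
        pthread_mutex_unlock(&s.lock);

        pthread_join(tid, NULL);
        left = atomic_load(&s.pending);
    }

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return left;
}
```

Calling drain_then_quit(1000) from a small test program should return
0: the main side only proceeds once the receiver has resolved every
pending request, whether or not it had to sleep on the condvar.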

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas
Signed-off-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c    |  3 ++-
 migration/migration.h    | 13 ++++++++++++-
 migration/postcopy-ram.c | 38 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d61e572742..3ee1e6b0d6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -153,6 +153,7 @@ void migration_object_init(void)
     qemu_sem_init(&current_incoming->postcopy_qemufile_dst_done, 0);
 
     qemu_mutex_init(&current_incoming->page_request_mutex);
+    qemu_cond_init(&current_incoming->page_request_cond);
     current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
     migration_object_check(current_migration, &error_fatal);
@@ -367,7 +368,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
              * things like g_tree_lookup() will return TRUE (1) when found.
              */
             g_tree_insert(mis->page_requested, aligned, (gpointer)1);
-            mis->page_requested_count++;
+            qatomic_inc(&mis->page_requested_count);
             trace_postcopy_page_req_add(aligned, mis->page_requested_count);
         }
     }
diff --git a/migration/migration.h b/migration/migration.h
index c390500604..cdaa10d515 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -196,7 +196,10 @@ struct MigrationIncomingState {
 
     /* A tree of pages that we requested to the source VM */
     GTree *page_requested;
-    /* For debugging purpose only, but would be nice to keep */
+    /*
+     * For postcopy only, count the number of requested page faults that
+     * still haven't been resolved.
+     */
     int page_requested_count;
     /*
      * The mutex helps to maintain the requested pages that we sent to the
@@ -210,6 +213,14 @@ struct MigrationIncomingState {
      * contains valid information.
      */
     QemuMutex page_request_mutex;
+    /*
+     * If postcopy preempt is enabled, there is a chance that the main
+     * thread finished loading its data before the preempt channel has
+     * finished loading the urgent pages. If that happens, the two threads
+     * will use this condvar to synchronize, so the main thread will always
+     * wait until all pages received.
+     */
+    QemuCond page_request_cond;
 
     /*
      * Number of devices that have yet to approve switchover. When this reaches
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..5408e028c6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -599,6 +599,30 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
         /* Notify the fast load thread to quit */
         mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
+        /*
+         * Update preempt_thread_status before reading count. Note: mutex
+         * lock only provide ACQUIRE semantic, and it doesn't stops this
+         * write to be reordered after reading the count.
+         */
+        smp_mb();
+        /*
+         * It's possible that the preempt thread is still handling the last
+         * pages to arrive which were requested by guest page faults.
+         * Making sure nothing is left behind by waiting on the condvar if
+         * that unlikely case happened.
+         */
+        WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
+            if (qatomic_read(&mis->page_requested_count)) {
+                /*
+                 * It is guaranteed to receive a signal later, because the
+                 * count>0 now, so it's destined to be decreased to zero
+                 * very soon by the preempt thread.
+                 */
+                qemu_cond_wait(&mis->page_request_cond,
+                               &mis->page_request_mutex);
+            }
+        }
+        /* Notify the fast load thread to quit */
         if (mis->postcopy_qemufile_dst) {
             qemu_file_shutdown(mis->postcopy_qemufile_dst);
         }
@@ -1277,8 +1301,20 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
          */
         if (g_tree_lookup(mis->page_requested, host_addr)) {
             g_tree_remove(mis->page_requested, host_addr);
-            mis->page_requested_count--;
+            int left_pages = qatomic_dec_fetch(&mis->page_requested_count);
+
             trace_postcopy_page_req_del(host_addr, mis->page_requested_count);
+            /* Order the update of count and read of preempt status */
+            smp_mb();
+            if (mis->preempt_thread_status == PREEMPT_THREAD_QUIT &&
+                left_pages == 0) {
+                /*
+                 * This probably means the main thread is waiting for us.
+                 * Notify that we've finished receiving the last requested
+                 * page.
+                 */
+                qemu_cond_signal(&mis->page_request_cond);
+            }
         }
         qemu_mutex_unlock(&mis->page_request_mutex);
         mark_postcopy_blocktime_end((uintptr_t)host_addr);
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Stefan Hajnoczi, Leonardo Bras
Subject: [PATCH 2/8] migration: Fix possible race when setting rp_state.error
Date: Mon, 18 Sep 2023 14:28:16 -0300
Message-Id: <20230918172822.19052-3-farosas@suse.de>
In-Reply-To: <20230918172822.19052-1-farosas@suse.de>
References: <20230918172822.19052-1-farosas@suse.de>

We don't need to set the rp_state.error right after a shutdown because
qemu_file_shutdown() always sets the QEMUFile error, so the return
path thread would have seen it and set the rp error itself.

Setting the error outside of the thread is also racy because the
thread could clear it after we set it.

Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3ee1e6b0d6..d426b69ada 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2074,7 +2074,6 @@ static int await_return_path_close_on_source(MigrationState *ms)
          * waiting for the destination.
          */
         qemu_file_shutdown(ms->rp_state.from_dst_file);
-        mark_source_rp_bad(ms);
     }
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Stefan Hajnoczi, Leonardo Bras
Subject: [PATCH 3/8] migration: Fix possible races when shutting down the return path
Date: Mon, 18 Sep 2023 14:28:17 -0300
Message-Id: <20230918172822.19052-4-farosas@suse.de>
In-Reply-To: <20230918172822.19052-1-farosas@suse.de>
References: <20230918172822.19052-1-farosas@suse.de>

We cannot call qemu_file_shutdown() on the return path file without
taking the file lock. The return path thread could be running its
cleanup code and have just cleared the from_dst_file pointer.

Checking ms->to_dst_file for errors could also race with
migrate_fd_cleanup(), which clears the to_dst_file pointer.

Protect both accesses by taking the file lock.

This was caught by inspection and should be rare, but the next patches
will start calling this code from other places, so let's do the
correct thing.

Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d426b69ada..15b7258bb2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2064,17 +2064,18 @@ static int open_return_path_on_source(MigrationState *ms,
 static int await_return_path_close_on_source(MigrationState *ms)
 {
     /*
-     * If this is a normal exit then the destination will send a SHUT and the
-     * rp_thread will exit, however if there's an error we need to cause
-     * it to exit.
+     * If this is a normal exit then the destination will send a SHUT
+     * and the rp_thread will exit, however if there's an error we
+     * need to cause it to exit. shutdown(2), if we have it, will
+     * cause it to unblock if it's stuck waiting for the destination.
      */
-    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
-        /*
-         * shutdown(2), if we have it, will cause it to unblock if it's stuck
-         * waiting for the destination.
-         */
-        qemu_file_shutdown(ms->rp_state.from_dst_file);
+    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+        if (ms->to_dst_file && ms->rp_state.from_dst_file &&
+            qemu_file_get_error(ms->to_dst_file)) {
+            qemu_file_shutdown(ms->rp_state.from_dst_file);
+        }
     }
+
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
with esmtp (Exim 4.90_1) (envelope-from ) id 1qiI35-0006rL-Bq; Mon, 18 Sep 2023 13:28:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qiI32-0006mb-BK for qemu-devel@nongnu.org; Mon, 18 Sep 2023 13:28:36 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qiI30-0004Su-Tg for qemu-devel@nongnu.org; Mon, 18 Sep 2023 13:28:36 -0400 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C69FE2003E; Mon, 18 Sep 2023 17:28:33 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3EA231358A; Mon, 18 Sep 2023 17:28:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gPO7AsCICGUoGAAAMHmgww (envelope-from ); Mon, 18 Sep 2023 17:28:32 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1695058113; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bYUP+MM7SbKNQ/UCCiBh+qINj5V/1DPeXsfFNDPnhCI=; b=FCKRcrY8dTGHOAla3tEMAusdNACkqocRBArbb5YIjSXuFenAleAL317JTzCSovYYKG7jbS G0yUhcfpaSLuf8jjBAljaVM6/nqlGzEV6c/WARWtclDGnENx4oh988pZhqJbdEQiIISGfX UsBa1SCkZq67BFfXrCJ8lxKLXmE5T4U= DKIM-Signature: v=1; a=ed25519-sha256; 
c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1695058113; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bYUP+MM7SbKNQ/UCCiBh+qINj5V/1DPeXsfFNDPnhCI=; b=WIVrlYDh392+YtzQqJnxvB1dfHQ23bAbv7ODqgVMGOIg4mgBUaUdlAuDEMDdgX02cnaih5 3bNBlgWoAF/QtvBA== From: Fabiano Rosas To: qemu-devel@nongnu.org Cc: Juan Quintela , Peter Xu , Stefan Hajnoczi , Leonardo Bras Subject: [PATCH 4/8] migration: Fix possible race when shutting down to_dst_file Date: Mon, 18 Sep 2023 14:28:18 -0300 Message-Id: <20230918172822.19052-5-farosas@suse.de> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230918172822.19052-1-farosas@suse.de> References: <20230918172822.19052-1-farosas@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=195.135.220.29; envelope-from=farosas@suse.de; helo=smtp-out2.suse.de X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @suse.de) X-ZM-MESSAGEID: 1695058205381100001 Content-Type: text/plain; charset="utf-8" It's not safe to call qemu_file_shutdown() on the to_dst_file without first checking for the file's presence 
under the lock. The cleanup of this file happens at postcopy_pause() and migrate_fd_cleanup() which are not necessarily running in the same thread as migrate_fd_cancel(). Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas --- migration/migration.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 15b7258bb2..6e09463466 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1246,7 +1246,7 @@ static void migrate_fd_error(MigrationState *s, const= Error *error) static void migrate_fd_cancel(MigrationState *s) { int old_state ; - QEMUFile *f =3D migrate_get_current()->to_dst_file; + trace_migrate_fd_cancel(); =20 WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) { @@ -1272,11 +1272,13 @@ static void migrate_fd_cancel(MigrationState *s) * If we're unlucky the migration code might be stuck somewhere in a * send/write while the network has failed and is waiting to timeout; * if we've got shutdown(2) available then we can force it to quit. - * The outgoing qemu file gets closed in migrate_fd_cleanup that is - * called in a bh, so there is no race against this cancel. 
      */
-    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-        qemu_file_shutdown(f);
+    if (s->state == MIGRATION_STATUS_CANCELLING) {
+        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+            if (s->to_dst_file) {
+                qemu_file_shutdown(s->to_dst_file);
+            }
+        }
     }
     if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) {
         Error *local_err = NULL;
@@ -1536,12 +1538,14 @@ void qmp_migrate_pause(Error **errp)
 {
     MigrationState *ms = migrate_get_current();
     MigrationIncomingState *mis = migration_incoming_get_current();
-    int ret;
+    int ret = 0;
 
     if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         /* Source side, during postcopy */
         qemu_mutex_lock(&ms->qemu_file_lock);
-        ret = qemu_file_shutdown(ms->to_dst_file);
+        if (ms->to_dst_file) {
+            ret = qemu_file_shutdown(ms->to_dst_file);
+        }
         qemu_mutex_unlock(&ms->qemu_file_lock);
         if (ret) {
             error_setg(errp, "Failed to pause source migration");
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Stefan Hajnoczi, Leonardo Bras
Subject: [PATCH 5/8] migration: Remove redundant cleanup of postcopy_qemufile_src
Date: Mon, 18 Sep 2023 14:28:19 -0300
Message-Id: <20230918172822.19052-6-farosas@suse.de>
In-Reply-To: <20230918172822.19052-1-farosas@suse.de>
References: <20230918172822.19052-1-farosas@suse.de>

This file is owned by the return path thread which is already doing
cleanup.

Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6e09463466..4372b0fbbf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1178,12 +1178,6 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
-    if (s->postcopy_qemufile_src) {
-        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
-        qemu_fclose(s->postcopy_qemufile_src);
-        s->postcopy_qemufile_src = NULL;
-    }
-
     assert(!migration_is_active(s));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Stefan Hajnoczi, Leonardo Bras
Subject: [PATCH 6/8] migration: Consolidate return path closing code
Date: Mon, 18 Sep 2023 14:28:20 -0300
Message-Id: <20230918172822.19052-7-farosas@suse.de>
In-Reply-To: <20230918172822.19052-1-farosas@suse.de>
References: <20230918172822.19052-1-farosas@suse.de>

We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.

Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4372b0fbbf..f6c0250d33 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2061,6 +2061,14 @@ static int open_return_path_on_source(MigrationState *ms,
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
+    int ret;
+
+    if (!ms->rp_state.rp_thread_created) {
+        return 0;
+    }
+
+    trace_migration_return_path_end_before();
+
     /*
      * If this is a normal exit then the destination will send a SHUT
      * and the rp_thread will exit, however if there's an error we
@@ -2078,7 +2086,10 @@ static int await_return_path_close_on_source(MigrationState *ms)
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
     trace_await_return_path_close_on_source_close();
-    return ms->rp_state.error;
+
+    ret = ms->rp_state.error;
+    trace_migration_return_path_end_after(ret);
+    return ret;
 }
 
 static inline void
@@ -2374,20 +2385,8 @@ static void migration_completion(MigrationState *s)
         goto fail;
     }
 
-    /*
-     * If rp was opened we must clean up the thread before
-     * cleaning everything else up (since if there are no failures
-     * it will wait for the destination to send it's status in
-     * a SHUT command).
-     */
-    if (s->rp_state.rp_thread_created) {
-        int rp_error;
-        trace_migration_return_path_end_before();
-        rp_error = await_return_path_close_on_source(s);
-        trace_migration_return_path_end_after(rp_error);
-        if (rp_error) {
-            goto fail;
-        }
+    if (await_return_path_close_on_source(s)) {
+        goto fail;
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Stefan Hajnoczi, Leonardo Bras
Subject: [PATCH 7/8] migration: Replace the return path retry logic
Date: Mon, 18 Sep 2023 14:28:21 -0300
Message-Id: <20230918172822.19052-8-farosas@suse.de>
In-Reply-To: <20230918172822.19052-1-farosas@suse.de>
References: <20230918172822.19052-1-farosas@suse.de>

Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then
keep the thread alive but have it do cleanup of the 'from_dst_file'
and wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be set up *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;
(gdb) bt
#0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
#1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
#2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
#3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
#4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
#5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
                            qmp_migrate_pause()
                             shutdown(ms->to_dst_file)
                              f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
                            qmp_migrate(resume)
                            migrate_fd_connect()
                             resume = state == PAUSED
                             open_return_path <-- TOO SOON!
                             set_state(RECOVER)
                             post(postcopy_pause_sem)
                                                          (incoming closes to_src_file)
                                                          res = qemu_file_get_error(rp)
                                                          migration_release_dst_files()
                                                          ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
                                                          postcopy_pause_return_path_thread()
                                                            wait(postcopy_pause_rp_sem)
                                                          rp = ms->rp_state.from_dst_file
                                                          goto retry
                                                          qemu_file_get_error(rp)
                                                          SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.

Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 60 ++++++++-----------------------------------
 migration/migration.h |  1 -
 2 files changed, 11 insertions(+), 50 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f6c0250d33..af78f7ee54 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1787,18 +1787,6 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
-/* Return true to retry, false to quit */
-static bool postcopy_pause_return_path_thread(MigrationState *s)
-{
-    trace_postcopy_pause_return_path();
-
-    qemu_sem_wait(&s->postcopy_pause_rp_sem);
-
-    trace_postcopy_pause_return_path_continued();
-
-    return true;
-}
-
 static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
@@ -1882,7 +1870,6 @@ static void *source_return_path_thread(void *opaque)
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
-retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -2004,26 +1991,7 @@ retry:
     }
 
 out:
-    res = qemu_file_get_error(rp);
-    if (res) {
-        if (res && migration_in_postcopy()) {
-            /*
-             * Maybe there is something we can do: it looks like a
-             * network down issue, and we pause for a recovery.
-             */
-            migration_release_dst_files(ms);
-            rp = NULL;
-            if (postcopy_pause_return_path_thread(ms)) {
-                /*
-                 * Reload rp, reset the rest. Referencing it is safe since
-                 * it's reset only by us above, or when migration completes
-                 */
-                rp = ms->rp_state.from_dst_file;
-                ms->rp_state.error = false;
-                goto retry;
-            }
-        }
-
+    if (qemu_file_get_error(rp)) {
         trace_source_return_path_thread_bad_end();
         mark_source_rp_bad(ms);
     }
@@ -2034,8 +2002,7 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms,
-                                      bool create_thread)
+static int open_return_path_on_source(MigrationState *ms)
 {
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
@@ -2044,11 +2011,6 @@ static int open_return_path_on_source(MigrationState *ms,
 
     trace_open_return_path_on_source();
 
-    if (!create_thread) {
-        /* We're done */
-        return 0;
-    }
-
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
     ms->rp_state.rp_thread_created = true;
@@ -2088,6 +2050,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
     trace_await_return_path_close_on_source_close();
 
     ret = ms->rp_state.error;
+    ms->rp_state.error = false;
     trace_migration_return_path_end_after(ret);
     return ret;
 }
@@ -2563,6 +2526,13 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
+        /*
+         * We're already pausing, so ignore any errors on the return
+         * path and just wait for the thread to finish. It will be
+         * re-created when we resume.
+         */
+        await_return_path_close_on_source(s);
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -2580,12 +2550,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
             /* Woken up by a recover procedure. Give it a shot */
 
-            /*
-             * Firstly, let's wake up the return path now, with a new
-             * return path channel.
-             */
-            qemu_sem_post(&s->postcopy_pause_rp_sem);
-
             /* Do the resume logic */
             if (postcopy_do_resume(s) == 0) {
                 /* Let's continue! */
@@ -3275,7 +3239,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_return_path()) {
-        if (open_return_path_on_source(s, !resume)) {
+        if (open_return_path_on_source(s)) {
             error_setg(&local_err, "Unable to open return-path for postcopy");
             migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_set_error(s, local_err);
@@ -3339,7 +3303,6 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rate_limit_sem);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
-    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
@@ -3359,7 +3322,6 @@ static void migration_instance_init(Object *obj)
     migrate_params_init(&ms->parameters);
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
-    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
     qemu_sem_init(&ms->rate_limit_sem, 0);
diff --git a/migration/migration.h b/migration/migration.h
index cdaa10d515..972597f4de 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -393,7 +393,6 @@ struct MigrationState {
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
-    QemuSemaphore postcopy_pause_rp_sem;
     /*
      * Whether we abort the migration if decompression errors are
      * detected at the destination. It is left at false for qemu
-- 
2.35.3

From nobody Thu May 16 00:12:38 2024
b=y4cTQNhRu/7K7wX0Uq1Z8jfgwH7pCi8ZyaDcgetzNz/f+Y55F1pPp7CciKatlz4HVg3/AY Vf3uvc1DoQbIdOCg== From: Fabiano Rosas To: qemu-devel@nongnu.org Cc: Juan Quintela , Peter Xu , Stefan Hajnoczi , Leonardo Bras Subject: [PATCH 8/8] migration: Move return path cleanup to main migration thread Date: Mon, 18 Sep 2023 14:28:22 -0300 Message-Id: <20230918172822.19052-9-farosas@suse.de> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230918172822.19052-1-farosas@suse.de> References: <20230918172822.19052-1-farosas@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2001:67c:2178:6::1d; envelope-from=farosas@suse.de; helo=smtp-out2.suse.de X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @suse.de) X-ZM-MESSAGEID: 1695058187283100007 Content-Type: text/plain; charset="utf-8" Now that the return path thread is allowed to finish during a paused migration, we can move the cleanup of the QEMUFiles to the main migration thread. 
Reviewed-by: Peter Xu
Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index af78f7ee54..e2ed85b5be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -98,6 +98,7 @@ static int migration_maybe_pause(MigrationState *s,
                                  int *current_active_state,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
+static int await_return_path_close_on_source(MigrationState *s);
 
 static bool migration_needs_multiple_sockets(void)
 {
@@ -1178,6 +1179,12 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
+    /*
+     * We already cleaned up to_dst_file, so errors from the return
+     * path might be due to that, ignore them.
+     */
+    await_return_path_close_on_source(s);
+
     assert(!migration_is_active(s));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -1997,7 +2004,6 @@ out:
     }
 
     trace_source_return_path_thread_end();
-    migration_release_dst_files(ms);
     rcu_unregister_thread();
     return NULL;
 }
@@ -2051,6 +2057,9 @@ static int await_return_path_close_on_source(MigrationState *ms)
 
     ret = ms->rp_state.error;
     ms->rp_state.error = false;
+
+    migration_release_dst_files(ms);
+
     trace_migration_return_path_end_after(ret);
     return ret;
 }
-- 
2.35.3
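
For readers who want the ownership change in isolation, here is a minimal
sketch of the pattern the patch moves to: the worker thread only uses the
shared file, and the thread that joins it releases the file afterwards. This
is not QEMU code. DemoState and every demo_* function are hypothetical
stand-ins, plain pthreads replace QEMU's thread wrappers, and a malloc'd
buffer stands in for the return path QEMUFile.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the migration state; not QEMU's MigrationState. */
typedef struct {
    pthread_t rp_thread;
    bool rp_thread_created;
    bool rp_error;
    char *from_dst_file;    /* stands in for the return path QEMUFile */
} DemoState;

/* Worker: uses the file but no longer frees it on exit. */
static void *demo_return_path_thread(void *opaque)
{
    DemoState *s = opaque;

    if (s->from_dst_file == NULL) {
        s->rp_error = true;
    }
    return NULL;
}

/* Release helper, called only from the joining (main) thread. */
static void demo_release_dst_files(DemoState *s)
{
    free(s->from_dst_file);
    s->from_dst_file = NULL;
}

/*
 * Join the worker, collect and reset its error flag, then release the
 * file. Once pthread_join() has returned the worker cannot touch the
 * file any more, so freeing it here cannot race with the worker.
 */
static int demo_await_return_path_close(DemoState *s)
{
    int ret;

    if (!s->rp_thread_created) {
        return 0;
    }

    pthread_join(s->rp_thread, NULL);
    s->rp_thread_created = false;

    ret = s->rp_error;
    s->rp_error = false;

    demo_release_dst_files(s);
    assert(s->from_dst_file == NULL);
    return ret;
}

/* Drive one create/join/cleanup cycle; returns 0 on success. */
static int demo_run(void)
{
    DemoState s = {0};

    s.from_dst_file = malloc(16);
    if (s.from_dst_file == NULL) {
        return -1;
    }
    if (pthread_create(&s.rp_thread, NULL, demo_return_path_thread, &s) != 0) {
        free(s.from_dst_file);
        return -1;
    }
    s.rp_thread_created = true;

    return demo_await_return_path_close(&s);
}
```

Calling demo_run() from a main() built with -pthread exercises the whole
cycle. A caller that has already torn down the outgoing side can ignore the
return value of demo_await_return_path_close(), which mirrors what the new
comment in migrate_fd_cleanup() describes.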