From nobody Sat May 18 11:46:48 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Leonardo Bras
Subject: [PATCH 1/3] migration: Stop marking RP bad after shutdown
Date: Fri, 28 Jul 2023 09:15:14 -0300
Message-Id: <20230728121516.16258-2-farosas@suse.de>
In-Reply-To: <20230728121516.16258-1-farosas@suse.de>
References: <20230728121516.16258-1-farosas@suse.de>

When waiting for the return path (RP) thread to finish, there is
really nothing wrong in the RP if the destination end of the migration
stops responding, leaving it stuck.

Stop returning an error at that point and leave it to other parts of
the code to catch. One such part is the very next routine run by
migration_completion() which checks 'to_dst_file' for an error and
fails the migration. Another is the RP thread itself when the
recvmsg() returns an error.

With this we stop marking RP bad from outside of the thread and can
reuse await_return_path_close_on_source() in the next patches to wait
on the thread during a paused migration.

Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 91bba630a8..051067f8c5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2049,7 +2049,6 @@ static int await_return_path_close_on_source(MigrationState *ms)
          * waiting for the destination.
          */
         qemu_file_shutdown(ms->rp_state.from_dst_file);
-        mark_source_rp_bad(ms);
     }
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
-- 
2.35.3

From nobody Sat May 18 11:46:48 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Leonardo Bras
Subject: [PATCH 2/3] migration: Simplify calling of await_return_path_close_on_source
Date: Fri, 28 Jul 2023 09:15:15 -0300
Message-Id: <20230728121516.16258-3-farosas@suse.de>
In-Reply-To: <20230728121516.16258-1-farosas@suse.de>
References: <20230728121516.16258-1-farosas@suse.de>

We're about to reuse this function so move the 'rp_thread_created'
check into it and remove the redundant tracing and comment.

Add a new tracepoint akin to what is already done at
migration_completion().

Signed-off-by: Fabiano Rosas
---
 migration/migration.c  | 21 +++++++--------------
 migration/trace-events |  3 +--
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 051067f8c5..d6f4470265 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2038,6 +2038,10 @@ static int open_return_path_on_source(MigrationState *ms,
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
+    if (!ms->rp_state.rp_thread_created) {
+        return 0;
+    }
+
     /*
      * If this is a normal exit then the destination will send a SHUT and the
      * rp_thread will exit, however if there's an error we need to cause
@@ -2350,20 +2354,9 @@ static void migration_completion(MigrationState *s)
         goto fail;
     }
 
-    /*
-     * If rp was opened we must clean up the thread before
-     * cleaning everything else up (since if there are no failures
-     * it will wait for the destination to send it's status in
-     * a SHUT command).
-     */
-    if (s->rp_state.rp_thread_created) {
-        int rp_error;
-        trace_migration_return_path_end_before();
-        rp_error = await_return_path_close_on_source(s);
-        trace_migration_return_path_end_after(rp_error);
-        if (rp_error) {
-            goto fail;
-        }
+    if (await_return_path_close_on_source(s)) {
+        trace_migration_completion_rp_err();
+        goto fail;
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
diff --git a/migration/trace-events b/migration/trace-events
index 5259c1044b..33a69064ca 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -157,13 +157,12 @@ migrate_pending_estimate(uint64_t size, uint64_t pre, uint64_t post) "estimate p
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64
 migration_completion_file_err(void) ""
+migration_completion_rp_err(void) ""
 migration_completion_vm_stop(int ret) "ret %d"
 migration_completion_postcopy_end(void) ""
 migration_completion_postcopy_end_after_complete(void) ""
 migration_rate_limit_pre(int ms) "%d ms"
 migration_rate_limit_post(int urgent) "urgent: %d"
-migration_return_path_end_before(void) ""
-migration_return_path_end_after(int rp_error) "%d"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
-- 
2.35.3

From nobody Sat May 18 11:46:48 2024
From: Fabiano Rosas
To: qemu-devel@nongnu.org
Cc: Juan Quintela, Peter Xu, Leonardo Bras
Subject: [PATCH 3/3] migration: Replace the return path retry logic
Date: Fri, 28 Jul 2023 09:15:16 -0300
Message-Id: <20230728121516.16258-4-farosas@suse.de>
In-Reply-To: <20230728121516.16258-1-farosas@suse.de>
References: <20230728121516.16258-1-farosas@suse.de>

Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then
keep the thread alive but have it do cleanup of the 'from_dst_file'
and wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be set up *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;
(gdb) bt
#0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
#1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
#2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
#3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
#4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
#5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
                            qmp_migrate_pause()
                             shutdown(ms->to_dst_file)
                              f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
                            qmp_migrate(resume)
                            migrate_fd_connect()
                             resume = state == PAUSED
                             open_return_path <-- TOO SOON!
                             set_state(RECOVER)
                             post(postcopy_pause_sem)
                                                          (incoming closes to_src_file)
                                                          res = qemu_file_get_error(rp)
                                                          migration_release_dst_files()
                                                          ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
                                                          postcopy_pause_return_path_thread()
                                                            wait(postcopy_pause_rp_sem)
                                                          rp = ms->rp_state.from_dst_file
                                                          goto retry
                                                          qemu_file_get_error(rp)
                                                          SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by having
open_return_path_on_source() wait for the thread to finish before
creating a new one with the updated 'from_dst_file'.

Signed-off-by: Fabiano Rosas
---
 migration/migration.c | 72 +++++++++++++++----------------------------
 migration/migration.h |  1 -
 2 files changed, 25 insertions(+), 48 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d6f4470265..36cdd7bda8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -97,6 +97,7 @@ static int migration_maybe_pause(MigrationState *s,
                                  int *current_active_state,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
+static int await_return_path_close_on_source(MigrationState *ms);
 
 static bool migration_needs_multiple_sockets(void)
 {
@@ -1764,18 +1765,6 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
-/* Return true to retry, false to quit */
-static bool postcopy_pause_return_path_thread(MigrationState *s)
-{
-    trace_postcopy_pause_return_path();
-
-    qemu_sem_wait(&s->postcopy_pause_rp_sem);
-
-    trace_postcopy_pause_return_path_continued();
-
-    return true;
-}
-
 static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
@@ -1859,7 +1848,6 @@ static void *source_return_path_thread(void *opaque)
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
-retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1981,28 +1969,18 @@ retry:
     }
 
 out:
-    res = qemu_file_get_error(rp);
-    if (res) {
-        if (res && migration_in_postcopy()) {
+    if (qemu_file_get_error(rp)) {
+        if (migration_in_postcopy()) {
             /*
-             * Maybe there is something we can do: it looks like a
-             * network down issue, and we pause for a recovery.
+             * This could be a network issue that would have been
+             * detected by the main migration thread and caused the
+             * migration to pause. Do cleanup and finish.
              */
-            migration_release_dst_files(ms);
-            rp = NULL;
-            if (postcopy_pause_return_path_thread(ms)) {
-                /*
-                 * Reload rp, reset the rest. Referencing it is safe since
-                 * it's reset only by us above, or when migration completes
-                 */
-                rp = ms->rp_state.from_dst_file;
-                ms->rp_state.error = false;
-                goto retry;
-            }
+            ms->rp_state.error = false;
+        } else {
+            trace_source_return_path_thread_bad_end();
+            mark_source_rp_bad(ms);
         }
-
-        trace_source_return_path_thread_bad_end();
-        mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
@@ -2012,8 +1990,21 @@ out:
 }
 
 static int open_return_path_on_source(MigrationState *ms,
-                                      bool create_thread)
+                                      bool resume)
 {
+    if (resume) {
+        assert(ms->state == MIGRATION_STATUS_POSTCOPY_PAUSED);
+
+        /*
+         * We're resuming from a paused postcopy migration. Wait for
+         * the thread to do its cleanup before re-opening the return
+         * path.
+         */
+        if (await_return_path_close_on_source(ms)) {
+            return -1;
+        }
+    }
+
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
         return -1;
@@ -2021,11 +2012,6 @@ static int open_return_path_on_source(MigrationState *ms,
 
     trace_open_return_path_on_source();
 
-    if (!create_thread) {
-        /* We're done */
-        return 0;
-    }
-
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
     ms->rp_state.rp_thread_created = true;
@@ -2550,12 +2536,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
             /* Woken up by a recover procedure. Give it a shot */
 
-            /*
-             * Firstly, let's wake up the return path now, with a new
-             * return path channel.
-             */
-            qemu_sem_post(&s->postcopy_pause_rp_sem);
-
             /* Do the resume logic */
             if (postcopy_do_resume(s) == 0) {
                 /* Let's continue! */
@@ -3243,7 +3223,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_return_path()) {
-        if (open_return_path_on_source(s, !resume)) {
+        if (open_return_path_on_source(s, resume)) {
             error_report("Unable to open return-path for postcopy");
             migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
@@ -3304,7 +3284,6 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rate_limit_sem);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
-    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
@@ -3324,7 +3303,6 @@ static void migration_instance_init(Object *obj)
     migrate_params_init(&ms->parameters);
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
-    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
     qemu_sem_init(&ms->rate_limit_sem, 0);
diff --git a/migration/migration.h b/migration/migration.h
index b7c8b67542..e78db5361c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -382,7 +382,6 @@ struct MigrationState {
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
-    QemuSemaphore postcopy_pause_rp_sem;
     /*
      * Whether we abort the migration if decompression errors are
      * detected at the destination. It is left at false for qemu
-- 
2.35.3
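
For readers who want the ordering idea of patch 3 in isolation, here is a minimal standalone sketch using plain pthreads rather than QEMU's thread and QEMUFile APIs. It is not code from the series: Channel, Source, return_path_thread(), await_return_path_close(), open_return_path() and demo_resume_cycles() are all hypothetical stand-ins. Freeing the old channel and clearing the created flag after the join are simplifications of this sketch, not something the patches do. The point it illustrates is only that the old thread is joined before a new channel is opened, so no thread is left alive that could clean up or dereference the new pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the return path QEMUFile. */
typedef struct Channel {
    int unused;
} Channel;

/* Hypothetical stand-in for the relevant parts of MigrationState. */
typedef struct Source {
    Channel  *from_dst;          /* plays the role of rp_state.from_dst_file */
    pthread_t rp_thread;
    bool      rp_thread_created;
    int       runs;              /* how many return path threads have run */
} Source;

/*
 * The thread captures the channel once and never reloads it. On error it
 * simply returns; there is no in-thread "goto retry".
 */
static void *return_path_thread(void *opaque)
{
    Source *s = opaque;
    Channel *rp = s->from_dst;

    assert(rp != NULL);
    s->runs++;                   /* only one such thread exists at a time */
    return NULL;
}

/* Join the thread if there is one, then drop the old channel. */
static int await_return_path_close(Source *s)
{
    if (!s->rp_thread_created) {
        return 0;
    }
    if (pthread_join(s->rp_thread, NULL) != 0) {
        return -1;
    }
    s->rp_thread_created = false;
    free(s->from_dst);
    s->from_dst = NULL;
    return 0;
}

/* On resume, wait for the old thread before opening a new channel. */
static int open_return_path(Source *s, bool resume)
{
    if (resume && await_return_path_close(s) != 0) {
        return -1;
    }

    s->from_dst = calloc(1, sizeof(*s->from_dst));
    if (!s->from_dst) {
        return -1;
    }

    if (pthread_create(&s->rp_thread, NULL, return_path_thread, s) != 0) {
        free(s->from_dst);
        s->from_dst = NULL;
        return -1;
    }
    s->rp_thread_created = true;
    return 0;
}

/* Open once, resume 'cycles' times, and report how many threads ran. */
static int demo_resume_cycles(int cycles)
{
    Source s = {0};

    if (open_return_path(&s, false) != 0) {
        return -1;
    }
    for (int i = 0; i < cycles; i++) {
        if (open_return_path(&s, true) != 0) {
            return -1;
        }
    }
    if (await_return_path_close(&s) != 0) {
        return -1;
    }
    return s.runs;
}
```

Calling demo_resume_cycles(3) starts one thread for the initial open and one per resume. Because every join happens before the next channel is allocated, each thread only ever sees the channel that was created for it.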