From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557414; cv=none; d=zohomail.com; s=zohoarc; b=ez8I09g9F+RM+TNjCN7TORPZuIBjn8MLVcXx27/oQTkIPtmryp/C28bkDsgkaAzaOqUKWm+UgOW5TnzV43wfRSyGluYyj8VLPRENk/VTn4ng2W7GHPz4MHeb7x+vLZUh/h4ys3HFj0pumZl5Efd0q3jNyFwdDoiGuT+5c0I9cTU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557414; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=Uvq5IlCtFKtt9HwA7RP+yrEXQL9cf5tHRMuJHvBx/MM=; b=Zd4SVPOc/AhoBTaKgFBXc3otuyrr+VlgvRZKNxOHk4A3da3qAmL+wWJHdi4LRRuWHyTRXETp3yM/8DNC8Lx/zEM/WZ3mJNpKAs/ozfZMs+a5aAvIH0h+izjVHBRXUhRCJlnzf9/LAxjQ7ggq+S70bFx3e7DQMh6r4W1CnNv5EoA= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557414819752.3271787734872; Tue, 12 Sep 2023 15:23:34 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlc-0007mj-An; Tue, 12 Sep 2023 18:21:56 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlZ-0007lj-Sw for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:53 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by 
eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlX-0003Gs-2p for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:53 -0400 Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-596-WdScp9WvMcWHqsoJgTZkQg-1; Tue, 12 Sep 2023 18:21:49 -0400 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-76f1cc68e65so159926185a.1 for ; Tue, 12 Sep 2023 15:21:49 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557310; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Uvq5IlCtFKtt9HwA7RP+yrEXQL9cf5tHRMuJHvBx/MM=; b=FZSYDuX5vaVFZu36T2I4BE9I33wzrZ3dhDX4pRncMxAmYBt0VeoQPs5g4ITpOWDCEpTQcS DDgBVnp7kf51q2p8+z3Qn/1jgCnUHnCfuZDiozjOw67+clTOjDDATpmbsL3ElDtaCBZZUY nYqf2rknteZo9y2jGN4EtTX34eQT0TE= X-MC-Unique: WdScp9WvMcWHqsoJgTZkQg-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557308; x=1695162108; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Uvq5IlCtFKtt9HwA7RP+yrEXQL9cf5tHRMuJHvBx/MM=; b=VFKsCgiL6XHoxGrXzjvRFoxvlvN5w3u2EZHFyItMRTlS/ZilkfkCNb4p3/888C+Uhv EGFuO8j7nk5Cx4zZXfuDKA13oGiVlMZIFQesmA8iAdtAeVfi0inpzsIzNciDac4xj5rs ES+8zrX0BHl2AkhzjilDR/JydWiJISRaEdJrQh9Hs3dffompD/srv+yHfKh5SYA0p7qh 
egIZPKgCHh+D2f7l/W7TV/zj/ICm3zuPT8UDUDB8NUK9TlwQ/BWvnMWID+uP5FpVAqt6 KOlGIT2/SC060/JVI8YljfsSMvlfZSsO1W2omotC/tJ7/neSjBtCQLbKgXiQXTKapWG9 zxSA== X-Gm-Message-State: AOJu0YzDVgqhaSUg9h1C55eLQAv0+fnKHVYT4ffEvg1AV7syVo+RT0D6 VCiJyrG5kQkVQ91tOHnKzGeZJhXTqJNZO/C33hZfb7zJ+Ul7VR3gvaKvPy5VmXkEYevZQFuB7ov /qLTefIkSMSboJ7Ucf2IwUZFjXWQar3D9YI37Fm3IVVQKkQ2VCx6yYst3cI3Ir7X6K58gSE29 X-Received: by 2002:a05:6214:f6e:b0:63d:2a0b:3f91 with SMTP id iy14-20020a0562140f6e00b0063d2a0b3f91mr775053qvb.2.1694557308554; Tue, 12 Sep 2023 15:21:48 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGK/W4GlaGonNgUjEHtzuFxH8DytI6COmDicp8canMW0Y03XPT+lJDLDgnLo6hawvGy6VUdxg== X-Received: by 2002:a05:6214:f6e:b0:63d:2a0b:3f91 with SMTP id iy14-20020a0562140f6e00b0063d2a0b3f91mr775039qvb.2.1694557308200; Tue, 12 Sep 2023 15:21:48 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 01/11] migration: Display error in query-migrate irrelevant of status Date: Tue, 12 Sep 2023 18:21:35 -0400 Message-ID: <20230912222145.731099-2-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -5 X-Spam_score: -0.6 X-Spam_bar: / X-Spam_report: (-0.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_SORBS_WEB=1.5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no 
X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557415561100001 Content-Type: text/plain; charset="utf-8" Display the error whenever it is set, regardless of whether the status is FAILED. E.g., it may also apply to the PAUSED stage of postcopy, to provide a hint on what has gone wrong. The error_mutex seems to have been overlooked when referencing the error; take it to be safe. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=3D2018404 Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- qapi/migration.json | 5 ++--- migration/migration.c | 8 +++++--- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index 8843e74b59..c241b6d318 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -230,9 +230,8 @@ # throttled during auto-converge. This is only present when # auto-converge has started throttling guest cpus. (Since 2.7) # -# @error-desc: the human readable error description string, when -# @status is 'failed'. Clients should not attempt to parse the -# error strings. (Since 2.7) +# @error-desc: the human readable error description string. Clients +# should not attempt to parse the error strings. (Since 2.7) # # @postcopy-blocktime: total time when all vCPU were blocked during # postcopy live migration. 
This is only present when the diff --git a/migration/migration.c b/migration/migration.c index d61e572742..61e91f61af 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1052,9 +1052,6 @@ static void fill_source_migration_info(MigrationInfo = *info) break; case MIGRATION_STATUS_FAILED: info->has_status =3D true; - if (s->error) { - info->error_desc =3D g_strdup(error_get_pretty(s->error)); - } break; case MIGRATION_STATUS_CANCELLED: info->has_status =3D true; @@ -1064,6 +1061,11 @@ static void fill_source_migration_info(MigrationInfo= *info) break; } info->status =3D state; + + QEMU_LOCK_GUARD(&s->error_mutex); + if (s->error) { + info->error_desc =3D g_strdup(error_get_pretty(s->error)); + } } =20 static void fill_destination_migration_info(MigrationInfo *info) --=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557390; cv=none; d=zohomail.com; s=zohoarc; b=gjUEZ8hglev7CEydedfGdZ6EyX9mP0XwCbXt5QZqUeiFZUVmxMy4ZNOtBqOh2isKbLTSljnfVx13sWVRlaoK/+AenJJ2yaIOaAke1GZ/ORWAN+a2qOZB4IdlBmte+lJPwAIuDzm7n4szikoX2yU3siizFlryPbADpnY/vf3fisE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557390; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=3gOUX6Fqs7kwRjqFLtp8oxY2w4IFXK1WS15a0m7KLJc=; b=KjLew4JyxTG+qQr0pZ/LwyDhcStJ3UjjAlRkpr+SHdqWZrsm/K/ngYftNFjS1cWtTzuLINPwQFOkX+8rOeVTbcsK/rrb3imm8wz3rMC+mxI9EUlhpygDhLs1kPC2Gfq2+ww4PY/pEH/Tc67dQElIHJkC4Q/dZEmbh5Rz6lXalPM= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org 
designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557390860545.8065368638152; Tue, 12 Sep 2023 15:23:10 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlc-0007oR-UA; Tue, 12 Sep 2023 18:21:56 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlb-0007lz-60 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:55 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlY-0003H2-Bx for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:54 -0400 Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-444-ykkTrRAzNCuRCRAcPzZrmA-1; Tue, 12 Sep 2023 18:21:50 -0400 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-76f025ed860so93067885a.0 for ; Tue, 12 Sep 2023 15:21:50 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557311; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3gOUX6Fqs7kwRjqFLtp8oxY2w4IFXK1WS15a0m7KLJc=; b=RhSBx92k7HwVE9FBLqg9wQOLlEZ6TniiR93UNj+8TqbAiMp2d3hMHS8IKgkFYNF0DT4ujz lynBjE8GC+BuyL5v4AvItUEQ32/5hMNWUc3p/oTj3VGVLplo6YB+RDfpVHBcibaY91wFuX vC7kWa0mJULmCAeVAddUdisDtmBreXs= X-MC-Unique: ykkTrRAzNCuRCRAcPzZrmA-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557309; x=1695162109; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3gOUX6Fqs7kwRjqFLtp8oxY2w4IFXK1WS15a0m7KLJc=; b=b4IP1R5WgBwowguo2MgppyqTdc8qeciED5XjpiRLRSbNUAhIPOo0D2w3uxsQtlTAWf MjLes6LaPQFaCegJsUv0vFjFLqDY7LOwCcaTw0L2yRcfqCtmneCm0xNEfCNCK+GWwv2b QqGKdcUeG2c734RsbyaPXYQXWoP1y92cH16UY8XTy83fLs6tjo3X6Znp6xfblhoVHyhH shVDrl+aMEhAG2A4ea9ieHpCnj/ORYlfyMPubLUrQX9F/RFcYAiIigc9OABVxmOwV0j3 t2wK8PXv5OqN/9vl1WlGcEQvRRNxiV7ZXtfAIObeUV2ITLi5teB3zrlrYEB1bke17hDP brZw== X-Gm-Message-State: AOJu0Yxe9VoP7kFO8DczDhsWM4GFhpbOrxO55bnRRNYS0faMVfd9y4Y0 nN3QOQRzeES6vwP+8A4ErS9AXroZBJA/FzFw0qtAkIWHwgS5uoBhZp7rm68FGS+rLqZDyxJiP7W iOiIoJxBW4ylmpHbe+YOPm4ZivBrbQhCUwx9K8NR5DdK12plFW/7syEEG0ItdhCCZBg4hhKGM X-Received: by 2002:a05:620a:192a:b0:76f:1846:2f6b with SMTP id bj42-20020a05620a192a00b0076f18462f6bmr774229qkb.1.1694557309465; Tue, 12 Sep 2023 15:21:49 -0700 (PDT) X-Google-Smtp-Source: AGHT+IER/cHaKCkXCQ8rNkHTCP+dXnw7bnRTAiqPiFQSmup+2C4nRuYypDmiVytFckFwDAEHOQ28xQ== X-Received: by 
2002:a05:620a:192a:b0:76f:1846:2f6b with SMTP id bj42-20020a05620a192a00b0076f18462f6bmr774212qkb.1.1694557309147; Tue, 12 Sep 2023 15:21:49 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 02/11] migration: Let migrate_set_error() take ownership Date: Tue, 12 Sep 2023 18:21:36 -0400 Message-ID: <20230912222145.731099-3-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: 0 X-Spam_score: -0.0 X-Spam_bar: / X-Spam_report: (-0.0 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_SORBS_WEB=1.5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URG_BIZ=0.573 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557393113100003 Content-Type: text/plain; charset="utf-8" migrate_set_error() called error_copy(), so it always copied the error. However, that is not the major use case: most callers want to pass the error to migrate_set_error() and not touch it afterwards. 
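The ownership convention this patch moves to can be sketched in plain C. This is only a minimal illustration under stated assumptions, not QEMU code: `DemoError`, `DemoState`, `demo_error_new()`, `demo_error_copy()`, `demo_error_free()` and `demo_set_error()` are hypothetical stand-ins for Error, MigrationState, error_setg(), error_copy(), error_free() and migrate_set_error(), and a pthread mutex stands in for error_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct DemoError {
    char *msg;
} DemoError;

/* Hypothetical stand-in for the error fields of MigrationState. */
typedef struct DemoState {
    pthread_mutex_t error_mutex;
    DemoError *error;
} DemoState;

/* Allocate an error that owns a private copy of msg. */
static DemoError *demo_error_new(const char *msg)
{
    size_t len = strlen(msg) + 1;
    DemoError *err = malloc(sizeof(*err));

    assert(err != NULL);
    err->msg = malloc(len);
    assert(err->msg != NULL);
    memcpy(err->msg, msg, len);
    return err;
}

/* Deep copy, for callers that must keep using their own error. */
static DemoError *demo_error_copy(const DemoError *err)
{
    return demo_error_new(err->msg);
}

/* Free an error; NULL is accepted and ignored. */
static void demo_error_free(DemoError *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/*
 * Takes ownership of `error`: the first error is stored, any later
 * one is freed here.  The caller must not free or reuse `error`.
 */
static void demo_set_error(DemoState *s, DemoError *error)
{
    pthread_mutex_lock(&s->error_mutex);
    if (!s->error) {
        s->error = error;
    } else {
        demo_error_free(error);
    }
    pthread_mutex_unlock(&s->error_mutex);
}
```

A caller that still needs its error afterwards would pass `demo_error_copy(err)` instead of `err`, mirroring the explicit error_copy() calls in this patch.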
That this is the major use case shows in the callers: most of them free the error explicitly right after the call. There are a few outliers (only where the caller still needs to reference the error afterwards) where we can call error_copy() explicitly. Drop the error_report_err() at three call sites where it followed migrate_set_error(); otherwise we would need an error_copy() for each of them. Since the errors are already stored in MigrationState.error, the extra report is a slight duplication anyway. Signed-off-by: Peter Xu Reviewed-by: Fabiano Rosas --- migration/migration.h | 4 ++-- migration/channel.c | 1 - migration/migration.c | 25 ++++++++++++++++--------- migration/multifd.c | 10 ++++------ migration/postcopy-ram.c | 1 - migration/ram.c | 1 - 6 files changed, 22 insertions(+), 20 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index c390500604..1eefa563c4 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -465,7 +465,7 @@ bool migration_has_all_channels(void); =20 uint64_t migrate_max_downtime(void); =20 -void migrate_set_error(MigrationState *s, const Error *error); +void migrate_set_error(MigrationState *s, Error *error); =20 void migrate_fd_connect(MigrationState *s, Error *error_in); =20 @@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, vo= id *opaque); void migration_make_urgent_request(void); void migration_consume_urgent_request(void); bool migration_rate_limit(void); -void migration_cancel(const Error *error); +void migration_cancel(Error *error); =20 void migration_populate_vfio_info(MigrationInfo *info); void migration_reset_vfio_bytes_transferred(void); diff --git a/migration/channel.c b/migration/channel.c index ca3319a309..48b3f6abd6 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s, } } migrate_fd_connect(s, error); - error_free(error); } =20 =20 diff --git a/migration/migration.c b/migration/migration.c index 61e91f61af..4b4dba5b12 100644 --- a/migration/migration.c +++ 
b/migration/migration.c @@ -162,7 +162,7 @@ void migration_object_init(void) dirty_bitmap_mig_init(); } =20 -void migration_cancel(const Error *error) +void migration_cancel(Error *error) { if (error) { migrate_set_error(current_migration, error); @@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque) object_unref(OBJECT(s)); } =20 -void migrate_set_error(MigrationState *s, const Error *error) +/* + * Set error for current migration state. The `error' ownership will be + * moved from the caller to MigrationState, so the caller doesn't need to + * free the error. + * + * If the caller still needs to reference the `error' passed in, one should + * use error_copy() explicitly. + */ +void migrate_set_error(MigrationState *s, Error *error) { QEMU_LOCK_GUARD(&s->error_mutex); if (!s->error) { - s->error =3D error_copy(error); + /* Record the first error triggered */ + s->error =3D error; + } else { + error_free(error); } } =20 @@ -1235,7 +1246,7 @@ static void migrate_error_free(MigrationState *s) } } =20 -static void migrate_fd_error(MigrationState *s, const Error *error) +static void migrate_fd_error(MigrationState *s, Error *error) { trace_migrate_fd_error(error_get_pretty(error)); assert(s->to_dst_file =3D=3D NULL); @@ -1714,7 +1725,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool = blk, if (!resume_requested) { yank_unregister_instance(MIGRATION_YANK_INSTANCE); } - migrate_fd_error(s, local_err); + migrate_fd_error(s, error_copy(local_err)); error_propagate(errp, local_err); return; } @@ -2637,7 +2648,6 @@ static MigThrError migration_detect_error(MigrationSt= ate *s) =20 if (local_error) { migrate_set_error(s, local_error); - error_free(local_error); } =20 if (state =3D=3D MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) { @@ -2789,7 +2799,6 @@ static MigIterateState migration_iteration_run(Migrat= ionState *s) qatomic_read(&s->start_postcopy)) { if (postcopy_start(s, &local_err)) { migrate_set_error(s, local_err); - error_report_err(local_err); } 
return MIG_ITERATE_SKIP; } @@ -3283,7 +3292,6 @@ void migrate_fd_connect(MigrationState *s, Error *err= or_in) error_setg(&local_err, "Unable to open return-path for postcop= y"); migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED= ); migrate_set_error(s, local_err); - error_report_err(local_err); migrate_fd_cleanup(s); return; } @@ -3308,7 +3316,6 @@ void migrate_fd_connect(MigrationState *s, Error *err= or_in) =20 if (multifd_save_setup(&local_err) !=3D 0) { migrate_set_error(s, local_err); - error_report_err(local_err); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); migrate_fd_cleanup(s); diff --git a/migration/multifd.c b/migration/multifd.c index 0f6b203877..69d56104fb 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -551,7 +551,6 @@ void multifd_save_cleanup(void) multifd_send_state->ops->send_cleanup(p, &local_err); if (local_err) { migrate_set_error(migrate_get_current(), local_err); - error_free(local_err); } } qemu_sem_destroy(&multifd_send_state->channels_ready); @@ -750,7 +749,6 @@ out: if (local_err) { trace_multifd_send_error(p->id); multifd_send_terminate_threads(local_err); - error_free(local_err); } =20 /* @@ -883,7 +881,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSen= dParams *p, */ p->quit =3D true; object_unref(OBJECT(ioc)); - error_free(err); } =20 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) @@ -1148,7 +1145,6 @@ static void *multifd_recv_thread(void *opaque) =20 if (local_err) { multifd_recv_terminate_threads(local_err); - error_free(local_err); } qemu_mutex_lock(&p->mutex); p->running =3D false; @@ -1240,7 +1236,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error = **errp) =20 id =3D multifd_recv_initial_packet(ioc, &local_err); if (id < 0) { - multifd_recv_terminate_threads(local_err); + /* Copy local error because we'll also return it to caller */ + multifd_recv_terminate_threads(error_copy(local_err)); error_propagate_prepend(errp, 
local_err, "failed to receive packet" " via multifd channel %d: ", @@ -1253,7 +1250,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error = **errp) if (p->c !=3D NULL) { error_setg(&local_err, "multifd: received id '%d' already setup'", id); - multifd_recv_terminate_threads(local_err); + /* Copy local error because we'll also return it to caller */ + multifd_recv_terminate_threads(error_copy(local_err)); error_propagate(errp, local_err); return; } diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 29aea9456d..8a93b5504d 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -1594,7 +1594,6 @@ postcopy_preempt_send_channel_done(MigrationState *s, { if (local_err) { migrate_set_error(s, local_err); - error_free(local_err); } else { migration_ioc_register_yank(ioc); s->postcopy_qemufile_src =3D qemu_file_new_output(ioc); diff --git a/migration/ram.c b/migration/ram.c index 9040d66e61..fc7fe0e6e8 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4308,7 +4308,6 @@ static void ram_mig_ram_block_resized(RAMBlockNotifie= r *n, void *host, */ error_setg(&err, "RAM block '%s' resized during precopy.", rb->ids= tr); migration_cancel(err); - error_free(err); } =20 switch (ps) { --=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557404; cv=none; d=zohomail.com; s=zohoarc; b=BDWNtlsRHqDsco5Tetd1k7dMFzjO8osPWYx2OaBNr85GbuOW1StthZ7UUf6Ij3K74WdFrHmmOKykNJjzn8YC2G74f+bYsVoNfKc5P2AI/DkbWBxRhol11op9iJwcGY4VlRd8tfENhhMLR300MHJXbFy/2IHCNiMx2OtYEVJzzmM= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557404; 
h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=QyGgPIxjfkzAdNW/5xLbFjko/zMdiooGCeQtOGm9yC0=; b=GPjfU45moBU/hibRMDeSQ3hIllvJQy49frfe7zYUSEewCZCU8RbAOm7LVER6tH0SKn4Nmn3yezZ0wKDJJoISn4qXqueO2Tk2XM334yjaq13HMP0SJHRm/NV+rSn7JCZwuatdIq3f4KRDP02iuvsZk8pqYZEU/y5mdHDSAKNpS00= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557404931906.9653679035794; Tue, 12 Sep 2023 15:23:24 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBli-0007qU-6W; Tue, 12 Sep 2023 18:22:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlb-0007ly-61 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:55 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlY-0003H6-Lo for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:54 -0400 Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-1-EbE1VlZxMBue9lq-eIAFVg-1; Tue, 12 Sep 2023 18:21:50 -0400 Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-76efdcb7be4so131975385a.1 for ; Tue, 12 Sep 2023 15:21:50 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557312; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QyGgPIxjfkzAdNW/5xLbFjko/zMdiooGCeQtOGm9yC0=; b=EqJDx3uSIAwHa24XP1GB9Rl3UPGP9uITuLWo69+zBb70K+UqGl+N5dMKRbDJNhwY1iFiAR Uv0lEaD4ClOkC+xF22Qc79/ZgKwnmUsT8ad/EXbc9jN5mVwD1o6Gu+6qNL1fqFGmAYZnhy stDAVrm99y4jpChZChxDZtTE72pQxwg= X-MC-Unique: EbE1VlZxMBue9lq-eIAFVg-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557310; x=1695162110; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QyGgPIxjfkzAdNW/5xLbFjko/zMdiooGCeQtOGm9yC0=; b=g3RzLayqRAzXIEHr7EhMOnNfWdEy/qntOe0tFsgtHhX3Q0AfUaZ7zDQ84mN4uGI+vK HbnnCHj/CcudCdXwiz4AnGSMbELaPjTKjCV9V3vjmZ36M8c/bHOZPVUA5gXCz5H+g1S3 HxVOozzHlZaoHnQPQrtx3E4PdRoqSNonk6/TFmHStwYy/z6Dd+Ai/hvUlYUNM7eCQRU9 mFTlbO9BSBQnhmjuqr3QGN4UtzlzAvG1naumMact2bt24yoqc4KTY8rUa2gKoNocoWgq ybEhpIRdfdWjZi+Nzx+JJ6fWB9Z50ZA23lCmnFsR4vrBf1jxIaOP/beGZSmg0KTJA+RK cDTQ== X-Gm-Message-State: AOJu0YwdfgvPONntQSTIxxPoU1/do2FAxNm7BkofE6ZBLJQONGL9Cqkq Ro6zmAbmERwWAMAo2//x1izhFzcIcz6KR38VAuhcR8gVv3AbCNJvun+TzgOn2ZHbzvq5SH4GqnJ 7i0laTCKpnBKk0gGekwTesyCurvjED0Ryu8QD1FrsFzHHTDAlEuY9CSgp4KblU1Qlryy7ZOIw X-Received: by 2002:a05:620a:2485:b0:76d:1339:e871 with SMTP id i5-20020a05620a248500b0076d1339e871mr715071qkn.5.1694557310181; Tue, 12 Sep 2023 15:21:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEZYOFu/o3JXPn03bKiUebImVzjKfmCFAwBucaTjkz2/r66njjy8hqNa/QBjxX194kE1xb3kA== X-Received: by 
2002:a05:620a:2485:b0:76d:1339:e871 with SMTP id i5-20020a05620a248500b0076d1339e871mr715051qkn.5.1694557309828; Tue, 12 Sep 2023 15:21:49 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 03/11] migration: Introduce migrate_has_error() Date: Tue, 12 Sep 2023 18:21:37 -0400 Message-ID: <20230912222145.731099-4-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557405116100001 Content-Type: text/plain; charset="utf-8" Introduce a helper to detect whether MigrationState.error is set for whatever reason. The error_mutex is not strictly needed here, because we neither dereference the pointer nor modify it; the helper still takes it to follow the locking rule, and a comment says so. 
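As a rough illustration of such a helper, the sketch below checks only whether an error pointer is set, without dereferencing it. It is not QEMU code: `ErrSlot`, `err_slot_set()` and `err_slot_has_error()` are hypothetical names, the error is reduced to a string pointer, and a pthread mutex stands in for error_mutex. Like the diff in this patch, it takes the lock for consistency with the writer even though a plain NULL check would not strictly require it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state object: an error pointer guarded by a mutex. */
typedef struct ErrSlot {
    pthread_mutex_t lock;
    const char *error;      /* NULL means no error has been recorded */
} ErrSlot;

/* Writer side: record the first error only, under the lock. */
static void err_slot_set(ErrSlot *slot, const char *msg)
{
    assert(slot != NULL);
    pthread_mutex_lock(&slot->lock);
    if (!slot->error) {
        slot->error = msg;
    }
    pthread_mutex_unlock(&slot->lock);
}

/*
 * Reader side: report whether any error is recorded.  Only the
 * pointer value is inspected; it is never dereferenced here.
 */
static bool err_slot_has_error(ErrSlot *slot)
{
    bool set;

    assert(slot != NULL);
    pthread_mutex_lock(&slot->lock);
    set = slot->error != NULL;
    pthread_mutex_unlock(&slot->lock);
    return set;
}
```

Any thread can then poll `err_slot_has_error()` to decide whether to stop early, without knowing which thread recorded the error.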
This is preparation work so that any thread (e.g. the source return path thread) can set errors in MigrationState in a unified way, rather than relying on its own mechanism (mark_source_rp_bad()). Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/migration.h | 1 + migration/migration.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/migration/migration.h b/migration/migration.h index 1eefa563c4..b50e97a098 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -466,6 +466,7 @@ bool migration_has_all_channels(void); uint64_t migrate_max_downtime(void); =20 void migrate_set_error(MigrationState *s, Error *error); +bool migrate_has_error(MigrationState *s); =20 void migrate_fd_connect(MigrationState *s, Error *error_in); =20 diff --git a/migration/migration.c b/migration/migration.c index 4b4dba5b12..7bd056a4b5 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1237,6 +1237,13 @@ void migrate_set_error(MigrationState *s, Error *err= or) } } =20 +bool migrate_has_error(MigrationState *s) +{ + /* The lock is not helpful here, but still follow the rule */ + QEMU_LOCK_GUARD(&s->error_mutex); + return qatomic_read(&s->error); +} + static void migrate_error_free(MigrationState *s) { QEMU_LOCK_GUARD(&s->error_mutex); --=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557433; cv=none; d=zohomail.com; s=zohoarc; b=eFnYuCYJIwkKWzLe+gNvfPMVD5ISrIzwo6n2PSE3uFlx9o/TNsFi+Gf0Mku63m3YZt3DlMkQGNjoVpeZsdIbfxHSHL9ymfq93wk7L7WVyym8kcWOnL48C9REKbCihrOmnLgfuI9FgQsF7s9AemM4uIApx4n2zFEUWV7UGmrFgmM= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557433; 
h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=kzRRhjngRmSGWR9ui7xiU8pYOZhdCeFBF/QqAeCbwnk=; b=KKmRuSvSp6Xlo+e3Bo2m6k8HywElvYI+KobP3fhemLO9V9gxwfCPwYWDNuxLgCpm8ml6YapyQgb8UXio7XrGyxRN9PlDk5podw7/WyoyHIbvdnq8qfA7r0A3Q3+LbJXyTXE4RgDmJtrN3aflOKmwfwQaEjbcKU1RF+8QsuQ2YX0= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557433319853.665009459874; Tue, 12 Sep 2023 15:23:53 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlf-0007ph-Rq; Tue, 12 Sep 2023 18:21:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlc-0007nz-O0 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:56 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlZ-0003HG-Mc for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:56 -0400 Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-606-cUEQxnMkMkiXMktxY8B-lw-1; Tue, 12 Sep 2023 18:21:51 -0400 Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-76efdcb7be4so131975585a.1 for ; Tue, 12 Sep 2023 15:21:51 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557313; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kzRRhjngRmSGWR9ui7xiU8pYOZhdCeFBF/QqAeCbwnk=; b=D6Bk5p2QHLlLeCXGH5wZgM5TQUbW6BloicBSwSP1nxNpJQGNiovI9qxDJ2wj6SMkJqpH/0 NnnZc+RNEb0wODaVRojMd5mncTjAhOlh8ZEIabfRpfD8AUXtGSfyNHfTbOnz69FVMjc6qZ K8ar5aIa56TTJXDUNL21ZzHzGLhPGc4= X-MC-Unique: cUEQxnMkMkiXMktxY8B-lw-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557311; x=1695162111; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kzRRhjngRmSGWR9ui7xiU8pYOZhdCeFBF/QqAeCbwnk=; b=HLYnngYgoJYvZBvo/kINQUe5WQBzsg124B4dYFCv+ihzxMkLSkl0lXYoIc2fUVbTtV 3Mnsu8JWAgac+nq6kb9noKOQyJ4hKIoY57gcGrX5s/oSw2DXoMK0U9kvJPfJCHr/Tpvp I5GjXjpNVvZV23NIFCk387EHuCk2JrryTnamHosU2H77u0xFTEJ1189X5nppHI0BM88D aYNnKMnk8P7Cl9l+8D6K0Pil3Gbl1W/Nc+4/ISoUtqbzD52ixxWxjk9m1QUAx8YWrY35 AcynTSaVCrDct45Z0VqU93PmGaAW2pjWNAG7abHTqDtqQpmIpq1ivTaH2sYGFIdG8K29 ZGZg== X-Gm-Message-State: AOJu0Ywd/HmPsk0R9cxJFn6by6ZbykVFFtsMrYr7t+j7PGwGS2HXJuag G0OKeAyLoFxUyMQjxYmW2xjhpQs9d8U1ZIWDsIlVxpDs0vmXMnF8wYYe5718Ng2plvLZn4HszP0 h4K3uuoxOD6xJxmx/5ISNuBmGsJwxc+WgSeJDjLI8Qr6BZXvjhClfQiJm7VQQKiWvGLIZnB5L X-Received: by 2002:a05:620a:2485:b0:76d:1339:e871 with SMTP id i5-20020a05620a248500b0076d1339e871mr715102qkn.5.1694557311004; Tue, 12 Sep 2023 15:21:51 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHCJEAxDSzPMwsK+8i6r5JIVaohOmweb/b71eB42Fc9IrcZsXZCx5Gd/2rL+XDa3VYhwKJHkw== X-Received: by 
2002:a05:620a:2485:b0:76d:1339:e871 with SMTP id i5-20020a05620a248500b0076d1339e871mr715080qkn.5.1694557310500; Tue, 12 Sep 2023 15:21:50 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 04/11] migration: Refactor error handling in source return path Date: Tue, 12 Sep 2023 18:21:38 -0400 Message-ID: <20230912222145.731099-5-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557434089100001 Content-Type: text/plain; charset="utf-8" rp_state.error was a boolean used to show error happened in return path thread. 
That's not only duplicating error reporting (migrate_set_error), but also
not good enough: we only do error_report() and set it to true, so we can
never keep a history of the exact error and show it in query-migrate.

To make this better, a few things are done:

  - Use error_setg() rather than error_report() across the whole lifecycle
    of the return path thread, keeping the error in an Error*.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when an error occurred in this thread.

  - With the above, there is no need for mark_source_rp_bad(); remove it,
    along with rp_state.error itself.

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
---
 migration/migration.h | 1 -
 migration/ram.h | 5 +-
 migration/migration.c | 122 +++++++++++++++++++++--------------------
 migration/ram.c | 41 +++++++-------
 migration/trace-events | 2 +-
 5 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index b50e97a098..48322e909e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -297,7 +297,6 @@ struct MigrationState { /* Protected by qemu_file_lock */ QEMUFile *from_dst_file; QemuThread rp_thread; - bool error; /* * We can also check non-zero of rp_thread, but there's no "official" * way to do this, so this bool makes it slightly more elegant.
diff --git a/migration/ram.h b/migration/ram.h
index 145c915ca7..14ed666d58 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -51,7 +51,8 @@ uint64_t ram_bytes_total(void); void mig_throttle_counter_reset(void); uint64_t ram_pagesize_summary(void); -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len); +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, + Error **errp); void ram_postcopy_migrated_memory_release(MigrationState *ms); /* For outgoing discard bitmap */ void ram_postcopy_send_discard_bitmap(MigrationState *ms);
@@ -71,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr); void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr); int64_t ramblock_recv_bitmap_send(QEMUFile *file, const char *block_name); -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp); bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start); void postcopy_preempt_shutdown_file(MigrationState *s); void *postcopy_preempt_thread(void *opaque);
diff --git a/migration/migration.c b/migration/migration.c
index 7bd056a4b5..825d8a71d4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1431,7 +1431,6 @@ int migrate_init(MigrationState *s, Error **errp) s->to_dst_file = NULL; s->state = MIGRATION_STATUS_NONE; s->rp_state.from_dst_file = NULL; - s->rp_state.error = false; s->mbps = 0.0; s->pages_per_second = 0.0; s->downtime = 0;
@@ -1754,14 +1753,14 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp) qemu_sem_post(&s->pause_sem); } -/* migration thread support */ -/* - * Something bad happened to the RP stream, mark an error - * The caller shall print or trace something to indicate why - */ -static void mark_source_rp_bad(MigrationState *s) +void migration_rp_wait(MigrationState *s) { - s->rp_state.error = true; +
qemu_sem_wait(&s->rp_state.rp_sem); +} + +void migration_rp_kick(MigrationState *s) +{ + qemu_sem_post(&s->rp_state.rp_sem); } static struct rp_cmd_args {
@@ -1785,7 +1784,7 @@ static struct rp_cmd_args { * and we don't need to send pages that have already been sent. */ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, - ram_addr_t start, size_t len) + ram_addr_t start, size_t len, Error **errp) { long our_host_ps = qemu_real_host_page_size();
@@ -1797,15 +1796,12 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, */ if (!QEMU_IS_ALIGNED(start, our_host_ps) || !QEMU_IS_ALIGNED(len, our_host_ps)) { - error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT - " len: %zd", __func__, start, len); - mark_source_rp_bad(ms); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start:" + RAM_ADDR_FMT " len: %zd", start, len); return; } - if (ram_save_queue_pages(rbname, start, len)) { - mark_source_rp_bad(ms); - } + ram_save_queue_pages(rbname, start, len, errp); } /* Return true to retry, false to quit */
@@ -1820,26 +1816,28 @@ static bool postcopy_pause_return_path_thread(MigrationState *s) return true; } -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name) +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name, + Error **errp) { RAMBlock *block = qemu_ram_block_by_name(block_name); if (!block) { - error_report("%s: invalid block name '%s'", __func__, block_name); + error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'", + block_name); return -EINVAL; } /* Fetch the received bitmap and refresh the dirty bitmap */ - return ram_dirty_bitmap_reload(s, block); + return ram_dirty_bitmap_reload(s, block, errp); } -static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) +static int migrate_handle_rp_resume_ack(MigrationState *s, + uint32_t value, Error **errp) {
trace_source_return_path_thread_resume_ack(value); if (value != MIGRATION_RESUME_ACK_VALUE) { - error_report("%s: illegal resume_ack value %"PRIu32, - __func__, value); + error_setg(errp, "illegal resume_ack value %"PRIu32, value); return -1; }
@@ -1898,49 +1896,47 @@ static void *source_return_path_thread(void *opaque) uint32_t tmp32, sibling_error; ram_addr_t start = 0; /* =0 to silence warning */ size_t len = 0, expected_len; + Error *err = NULL; int res; trace_source_return_path_thread_entry(); rcu_register_thread(); retry: - while (!ms->rp_state.error && !qemu_file_get_error(rp) && + while (!migrate_has_error(ms) && !qemu_file_get_error(rp) && migration_is_setup_or_active(ms->state)) { trace_source_return_path_thread_loop_top(); + header_type = qemu_get_be16(rp); header_len = qemu_get_be16(rp); if (qemu_file_get_error(rp)) { - mark_source_rp_bad(ms); goto out; } if (header_type >= MIG_RP_MSG_MAX || header_type == MIG_RP_MSG_INVALID) { - error_report("RP: Received invalid message 0x%04x length 0x%04x", - header_type, header_len); - mark_source_rp_bad(ms); + error_setg(&err, "Received invalid message 0x%04x length 0x%04x", + header_type, header_len); goto out; } if ((rp_cmd_args[header_type].len != -1 && header_len != rp_cmd_args[header_type].len) || header_len > sizeof(buf)) { - error_report("RP: Received '%s' message (0x%04x) with" - "incorrect length %d expecting %zu", - rp_cmd_args[header_type].name, header_type, header_len, - (size_t)rp_cmd_args[header_type].len); - mark_source_rp_bad(ms); + error_setg(&err, "Received '%s' message (0x%04x) with" + "incorrect length %d expecting %zu", + rp_cmd_args[header_type].name, header_type, header_len, + (size_t)rp_cmd_args[header_type].len); goto out; } /* We know we've got a valid header by this point */ res = qemu_get_buffer(rp, buf, header_len); if (res != header_len) { - error_report("RP: Failed reading data for message 0x%04x" - " read %d
expected %d", - header_type, res, header_len); - mark_source_rp_bad(ms); + error_setg(&err, "Failed reading data for message 0x%04x" + " read %d expected %d", + header_type, res, header_len); goto out; }
@@ -1950,8 +1946,7 @@ retry: sibling_error = ldl_be_p(buf); trace_source_return_path_thread_shut(sibling_error); if (sibling_error) { - error_report("RP: Sibling indicated error %d", sibling_error); - mark_source_rp_bad(ms); + error_setg(&err, "Sibling indicated error %d", sibling_error); } /* * We'll let the main thread deal with closing the RP
@@ -1969,7 +1964,10 @@ retry: case MIG_RP_MSG_REQ_PAGES: start = ldq_be_p(buf); len = ldl_be_p(buf + 8); - migrate_handle_rp_req_pages(ms, NULL, start, len); + migrate_handle_rp_req_pages(ms, NULL, start, len, &err); + if (err) { + goto out; + } break; case MIG_RP_MSG_REQ_PAGES_ID:
@@ -1984,32 +1982,32 @@ retry: expected_len += tmp32; } if (header_len != expected_len) { - error_report("RP: Req_Page_id with length %d expecting %zd", - header_len, expected_len); - mark_source_rp_bad(ms); + error_setg(&err, "Req_Page_id with length %d expecting %zd", + header_len, expected_len); + goto out; + } + migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len, + &err); + if (err) { goto out; } - migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len); break; case MIG_RP_MSG_RECV_BITMAP: if (header_len < 1) { - error_report("%s: missing block name", __func__); - mark_source_rp_bad(ms); + error_setg(&err, "MIG_RP_MSG_RECV_BITMAP missing block name"); goto out; } /* Format: len (1B) + idstr (<255B). This ends the idstr.
*/ buf[buf[0] + 1] = '\0'; - if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) { - mark_source_rp_bad(ms); + if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) { goto out; } break; case MIG_RP_MSG_RESUME_ACK: tmp32 = ldl_be_p(buf); - if (migrate_handle_rp_resume_ack(ms, tmp32)) { - mark_source_rp_bad(ms); + if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) { goto out; } break;
@@ -2025,6 +2023,19 @@ retry: } out: + if (err) { + /* + * Collect any error in return-path thread and report it to the + * migration state object. + */ + migrate_set_error(ms, err); + /* + * We lost ownership to Error*, clear it, prepared to capture the + * next error. + */ + err = NULL; + } + res = qemu_file_get_error(rp); if (res) { if (res && migration_in_postcopy()) {
@@ -2040,13 +2051,11 @@ out: * it's reset only by us above, or when migration completes */ rp = ms->rp_state.from_dst_file; - ms->rp_state.error = false; goto retry; } } trace_source_return_path_thread_bad_end(); - mark_source_rp_bad(ms); } trace_source_return_path_thread_end();
@@ -2079,8 +2088,7 @@ static int open_return_path_on_source(MigrationState *ms, return 0; } -/* Returns 0 if the RP was ok, otherwise there was an error on the RP */ -static int await_return_path_close_on_source(MigrationState *ms) +static void await_return_path_close_on_source(MigrationState *ms) { /* * If this is a normal exit then the destination will send a SHUT and the
@@ -2093,13 +2101,11 @@ static int await_return_path_close_on_source(MigrationState *ms) * waiting for the destination.
*/ qemu_file_shutdown(ms->rp_state.from_dst_file); - mark_source_rp_bad(ms); } trace_await_return_path_close_on_source_joining(); qemu_thread_join(&ms->rp_state.rp_thread); ms->rp_state.rp_thread_created = false; trace_await_return_path_close_on_source_close(); - return ms->rp_state.error; } static inline void
@@ -2402,11 +2408,11 @@ static void migration_completion(MigrationState *s) * a SHUT command). */ if (s->rp_state.rp_thread_created) { - int rp_error; trace_migration_return_path_end_before(); - rp_error = await_return_path_close_on_source(s); - trace_migration_return_path_end_after(rp_error); - if (rp_error) { + await_return_path_close_on_source(s); + trace_migration_return_path_end_after(); + /* If return path has error, should have been set here */ + if (migrate_has_error(s)) { goto fail; } }
diff --git a/migration/ram.c b/migration/ram.c
index fc7fe0e6e8..814c59c17b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1963,7 +1963,8 @@ static void migration_page_queue_free(RAMState *rs) * @start: starting address from the start of the RAMBlock * @len: length (in bytes) to send */ -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, + Error **errp) { RAMBlock *ramblock; RAMState *rs = ram_state;
@@ -1980,7 +1981,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) * Shouldn't happen, we can't reuse the last RAMBlock if * it's the 1st request.
*/ - error_report("ram_save_queue_pages no previous block"); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block"); return -1; } } else {
@@ -1988,16 +1989,17 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) if (!ramblock) { /* We shouldn't be asked for a non-existent RAMBlock */ - error_report("ram_save_queue_pages no block '%s'", rbname); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname); return -1; } rs->last_req_rb = ramblock; } trace_ram_save_queue_pages(ramblock->idstr, start, len); if (!offset_in_ramblock(ramblock, start + len - 1)) { - error_report("%s request overrun start=" RAM_ADDR_FMT " len=" - RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, - __func__, start, len, ramblock->used_length); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES request overrun, " + "start=" RAM_ADDR_FMT " len=" + RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, + start, len, ramblock->used_length); return -1; }
@@ -2029,9 +2031,9 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) assert(len % page_size == 0); while (len) { if (ram_save_host_page_urgent(pss)) { - error_report("%s: ram_save_host_page_urgent() failed: " - "ramblock=%s, start_addr=0x"RAM_ADDR_FMT, - __func__, ramblock->idstr, start); + error_setg(errp, "ram_save_host_page_urgent() failed: " + "ramblock=%s, start_addr=0x"RAM_ADDR_FMT, + ramblock->idstr, start); ret = -1; break; }
@@ -4165,7 +4167,7 @@ static void ram_dirty_bitmap_reload_notify(MigrationState *s) * This is only used when the postcopy migration is paused but wants * to resume from a middle point.
*/ -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) { int ret = -EINVAL; /* from_dst_file is always valid because we're within rp_thread */
@@ -4177,8 +4179,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) trace_ram_dirty_bitmap_reload_begin(block->idstr); if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { - error_report("%s: incorrect state %s", __func__, - MigrationStatus_str(s->state)); + error_setg(errp, "Reload bitmap in incorrect state %s", + MigrationStatus_str(s->state)); return -EINVAL; }
@@ -4195,9 +4197,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) /* The size of the bitmap should match with our ramblock */ if (size != local_size) { - error_report("%s: ramblock '%s' bitmap size mismatch " - "(0x%"PRIx64" != 0x%"PRIx64")", __func__, - block->idstr, size, local_size); + error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64 + " != 0x%"PRIx64")", block->idstr, size, local_size); ret = -EINVAL; goto out; }
@@ -4207,16 +4208,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) ret = qemu_file_get_error(file); if (ret || size != local_size) { - error_report("%s: read bitmap failed for ramblock '%s': %d" - " (size 0x%"PRIx64", got: 0x%"PRIx64")", - __func__, block->idstr, ret, local_size, size); + error_setg(errp, "read bitmap failed for ramblock '%s': %d" + " (size 0x%"PRIx64", got: 0x%"PRIx64")", + block->idstr, ret, local_size, size); ret = -EIO; goto out; } if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) { - error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64, - __func__, block->idstr, end_mark); + error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64, + block->idstr, end_mark); ret = -EINVAL; goto out; }
diff --git a/migration/trace-events b/migration/trace-events
index 4666f19325..20cd17ffe8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -164,7 +164,7 @@ migration_completion_postcopy_end_after_complete(void) "" migration_rate_limit_pre(int ms) "%d ms" migration_rate_limit_post(int urgent) "urgent: %d" migration_return_path_end_before(void) "" -migration_return_path_end_after(int rp_error) "%d" +migration_return_path_end_after(void) "" migration_thread_after_loop(void) "" migration_thread_file_err(void) "" migration_thread_setup_complete(void) ""
-- 
2.41.0

From nobody Fri May 10 20:15:25 2024
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: peterx@redhat.com, Juan Quintela, Fabiano Rosas
Subject: [PATCH v2 05/11] migration: Deliver return path file error to migrate state too
Date: Tue, 12 Sep 2023 18:21:39 -0400
Message-ID: <20230912222145.731099-6-peterx@redhat.com>
In-Reply-To: <20230912222145.731099-1-peterx@redhat.com>
References: <20230912222145.731099-1-peterx@redhat.com>

We've already done this for most of the return path thread errors, but not
yet for the IO errors that happen on the return path qemufile. Do that too.
Remember to reset "err" always, because the ownership is not us anymore, otherwise we're prone to use-after-free later after recovered. Re-export qemu_file_get_error_obj(). Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/qemu-file.h | 1 + migration/migration.c | 7 +++++++ migration/qemu-file.c | 2 +- 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 47015f5201..bc6edc5c39 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -129,6 +129,7 @@ void qemu_file_skip(QEMUFile *f, int size); void qemu_file_credit_transfer(QEMUFile *f, size_t size); int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp); void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err); +int qemu_file_get_error_obj(QEMUFile *f, Error **errp); void qemu_file_set_error(QEMUFile *f, int ret); int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); diff --git a/migration/migration.c b/migration/migration.c index 825d8a71d4..216d0e871f 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2038,6 +2038,13 @@ out: =20 res =3D qemu_file_get_error(rp); if (res) { + /* We have forwarded any error in "err" already, reuse "error" */ + assert(err =3D=3D NULL); + /* Try to deliver this file error to migration state */ + qemu_file_get_error_obj(rp, &err); + migrate_set_error(ms, err); + err =3D NULL; + if (res && migration_in_postcopy()) { /* * Maybe there is something we can do: it looks like a diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 19c33c9985..eea7171192 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -146,7 +146,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHoo= ks *hooks) * is not 0. * */ -static int qemu_file_get_error_obj(QEMUFile *f, Error **errp) +int qemu_file_get_error_obj(QEMUFile *f, Error **errp) { if (errp) { *errp =3D f->last_error_obj ? 
error_copy(f->last_error_obj) : NULL; --=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557460; cv=none; d=zohomail.com; s=zohoarc; b=GbvyVJzZ21b2OMuMd6awXTIpMRo7UjaYff7F/wmAf0pG+/rRqZ+oG5+gCg7y1myN7NEkhbrhtiLF1oCAbCGfjHxrWv4VFiqAy6LFpycW8nR7XhQ/CGVo93l/6TGk4Tcewbm9wyL82qO/D+8Co080jVJBxbQ8JIyt9xG//aQ3mLY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557460; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=p6uAjHfWVV5l02g/VMaRfP8rGJHV6qapnjAjrMo0sFg=; b=PIr6cidlT3NQSYQffo96Pg/ENrKRybWf43JdXh680x8MXIZWs4UAUXrW+jU3KseY8GIjhrupSqehsCjJQUhqMRoc2otMQHM6u6a/coYba6Q6NRtqkqkGSR1NLM5P+Q07ujT3JKhIqd3LGf+M6m1BydKbd9ew/5laP885jhsLzMI= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557460512339.6152023237296; Tue, 12 Sep 2023 15:24:20 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlg-0007q5-GK; Tue, 12 Sep 2023 18:22:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBle-0007ou-87 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:58 -0400 Received: from 
us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlb-0003Hv-SZ for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:57 -0400 Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-581-hUCaLEaUMXaNN3OiOy5lzg-1; Tue, 12 Sep 2023 18:21:53 -0400 Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-76f025ed860so93069485a.0 for ; Tue, 12 Sep 2023 15:21:53 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557315; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=p6uAjHfWVV5l02g/VMaRfP8rGJHV6qapnjAjrMo0sFg=; b=HzJ0QsCbmelewvr8zsA/kaPtvhIO+1vOPl7cXi9V2XWErE87LzoAocpSc4/gt0iKmv+7pJ 21cMi9hcLcAaeKMKqVm6rgEPDa86icmipbjk4gkSQKgVc72rNIo19r+E+Oj4h4H9+zXb0z FGI09vTurjxFiCBxEeAhuBUm0B6T5Nw= X-MC-Unique: hUCaLEaUMXaNN3OiOy5lzg-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557312; x=1695162112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=p6uAjHfWVV5l02g/VMaRfP8rGJHV6qapnjAjrMo0sFg=; b=YHCQ7jewOn0KGUp7QldxnT0CeSxzBZHwe0fgtknKnQAg2GKIOZsOxn3/HAX4p7sh9w qj803nMR/uWREqD+69ebsHqp8DD8AEEK8ydMOpWf++kM75qI9USHMoLXvTY6o3k8a5hu 
XZP5IG4nRO4+Jf+yHI8HY1P7cGI8YeK/JJ1/95ScoDO5AbNeddtAYcX8AanD0h6VwsWi 0/0ZSRa6wX5g8UuJLWdpqTWyrlBPiL1K1LJ1zobcD/TKYg0UP/rDhWy+FUXxR9e3iFxd IBDNajDMYs9ZMk8BMy3VQy/VegOoMyX668PoTiEtJUtQymhP/sEC6F8lCJQXzKSQKMGH jVbA== X-Gm-Message-State: AOJu0YwTMkVXvo7J4CQmp7yk5FwlP78+cJvmCbZmJFthD7oEFJg05WzT dpWv5uPtFQQF38wwHzix8PA6yeVlFKXW8xCPuhxQr8WsPEw/oKBc3CxXIB3fmeoGxswnK4THZiA Ok0L3+/JX50eGy7qJZVvE9654nKtARwbQCVH+F9HicGCUhB+zJo/VN0Fdf/h+8Xly5YzwCitQ X-Received: by 2002:a05:620a:3187:b0:76f:1118:9b62 with SMTP id bi7-20020a05620a318700b0076f11189b62mr779156qkb.3.1694557312610; Tue, 12 Sep 2023 15:21:52 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF3rw/aEa5EC4RbH9RAiLwbNKVg42cH/JQvU4hQQxFK9Zcb0X+QnVewbdnr7T0O/wN1yWogbA== X-Received: by 2002:a05:620a:3187:b0:76f:1118:9b62 with SMTP id bi7-20020a05620a318700b0076f11189b62mr779142qkb.3.1694557312245; Tue, 12 Sep 2023 15:21:52 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 06/11] qemufile: Always return a verbose error Date: Tue, 12 Sep 2023 18:21:40 -0400 Message-ID: <20230912222145.731099-7-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham 
autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557461030100001 Content-Type: text/plain; charset="utf-8"

There are many cases where only an errno is set in last_error, without a detailed error description. When this happens, generate an Error that carries the errno as its description.

This is helpful wherever a caller relies on the Error*. For example, the migration state only caches an Error* in MigrationState.error. With this change, query-migrate and similar interfaces display a correct error message even when the error was only set by qemu_file_set_error().

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
---
 migration/qemu-file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c index eea7171192..3e64e900c9 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -142,15 +142,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileH= ooks *hooks) * * Return negative error value if there has been an error on previous * operations, return 0 if no error happened. - * Optional, it returns Error* in errp, but it may be NULL even if return = value - * is not 0. * + * If errp is specified, a verbose error message will be copied over. */ int qemu_file_get_error_obj(QEMUFile *f, Error **errp) { + if (!f->last_error) { + return 0; + } + + /* There is an error */ if (errp) { - *errp =3D f->last_error_obj ? 
error_copy(f->last_error_obj) : NULL; + if (f->last_error_obj) { + *errp =3D error_copy(f->last_error_obj); + } else { + error_setg_errno(errp, -f->last_error, "Channel error"); + } } + return f->last_error; } =20 --=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557449; cv=none; d=zohomail.com; s=zohoarc; b=JNH3Y0VB7CH0Fmcf0mfcGM3t5KruJ9MOJSB9h/rC54vfonneeaerYPbePGWj4QUG/EMtVTJuf8Y1NTAciaVo+VW2wiBX2h28P+5E2mIOUf9TUt3e7LBUbcVJGxtgoYHfpoFa4QizIYcpQ15n8X1z11+m4hSHa0n18K8vg6ata3M= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557449; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=ndMphvI1GfCI57YnKOFypaBZKAB21uqUhgqAbjivef0=; b=cHLUShsBxFIEuox7XlMHHHHuwD5GsUk5Z5MFqjJ5YM/6Jey4Se4wC34gUIZFRlyBLhRQo35lLCChWxoiHyc8ud6/sEQNwUD02wHkDQSRlhgvR4jINhLLt0fW/KX2m9samZuh3cx8VR7Cp3VisYLSHOoQIp1y2oHXlA4QYH/cFJ8= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557449342294.3790443626806; Tue, 12 Sep 2023 15:24:09 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlg-0007q7-R2; Tue, 12 Sep 2023 18:22:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps 
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBle-0007p2-Mr for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:58 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlc-0003I3-EU for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:58 -0400 Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-232-gH0npMlGOj6V6A7Or7katQ-1; Tue, 12 Sep 2023 18:21:54 -0400 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-770544d0501so132078385a.0 for ; Tue, 12 Sep 2023 15:21:54 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557315; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ndMphvI1GfCI57YnKOFypaBZKAB21uqUhgqAbjivef0=; b=h4R8g0jRbvqcnhYc0VmWbmjMVprFWnUN7qhJgrOB6xYikFhzgxKQoX5sdorvWn6ZFvPwl3 2JE/+dq/vOzacby9MFklx/fdglCnSCCX5O2CHmSvnkp3WQUSanS2A0Jx8qNmUEm0LApJOf +y+GOlN1WF4U09T+cpdA83F1gg+UIDo= X-MC-Unique: gH0npMlGOj6V6A7Or7katQ-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557314; x=1695162114; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=ndMphvI1GfCI57YnKOFypaBZKAB21uqUhgqAbjivef0=; b=JaN8hfr7A4Ef0P2+PqRBQmAjd4ELeApdmtkBNv4aCXsBW27fSk20kHCeIym41vdPVe HSIs8LY9jjNvviArijpwwBEUp7I2LpYIWBHCGsAGDUWH4znzHgfVvm4zyCeTvo54sM5c 0Kk4Y72SnxMXkBbRDw9E1vq3e4y13T9PzyJXRFEqoFnrLmD6lGmzxLADQj8PDB91qdE7 TyC06Wr74OoUQuJz89Li8sakAUZTB3+bhjXGDM/YIBvAmGNzDpHVD/jText6Nkp/vc3/ cYvMHghs3tPB9GuTUgT8yAJyBA4pQLSqVbsTbMd15/imHpY4JMFs1pbtQCF6Clu0sGal eNXw== X-Gm-Message-State: AOJu0Ywbxvb8FfvKyZhaf67mONpKTlPQXZYzmvzVrHvQmj1eO3dDpHaA OSBMRUkAvOqsL02U29OkqN8vkkaUyapyH2Qhxt6IvD0lsqSXaY78ueysagDECFYh7Xl0yRvRNC/ Wl8+fKmhQt+8f2ifaVQv2Gn1VZ96YENnGqilTezvVZ2d48EKJZ4lxiUo5cGrBpaKxOXQVuQxz X-Received: by 2002:a05:620a:4055:b0:770:fad0:146 with SMTP id i21-20020a05620a405500b00770fad00146mr844422qko.0.1694557313873; Tue, 12 Sep 2023 15:21:53 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHfbqggF6DNxq4S86EuZcwq007fVFf3Mi0Iwp5sVd30Xv4zMME0/F0/Qa/O8lCVIDzkxb9ecg== X-Received: by 2002:a05:620a:4055:b0:770:fad0:146 with SMTP id i21-20020a05620a405500b00770fad00146mr844409qko.0.1694557313506; Tue, 12 Sep 2023 15:21:53 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 07/11] migration: Remember num of ramblocks to sync during recovery Date: Tue, 12 Sep 2023 18:21:41 -0400 Message-ID: <20230912222145.731099-8-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -5 X-Spam_score: -0.6 X-Spam_bar: / X-Spam_report: (-0.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, 
DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_SORBS_WEB=1.5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557450464100001 Content-Type: text/plain; charset="utf-8" Instead of only relying on the count of rp_sem, make the counter be part of RAMState so it can be used in both threads to synchronize on the process. rp_sem will be further reused as a way to kick the main thread, e.g., on recovery failures. Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/ram.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 814c59c17b..a9541c60b4 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -394,6 +394,14 @@ struct RAMState { /* Queue of outstanding page requests from the destination */ QemuMutex src_page_req_mutex; QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests; + + /* + * This is only used when postcopy is in recovery phase, to communicate + * between the migration thread and the return path thread on dirty + * bitmap synchronizations. This field is unused in other stages of + * RAM migration. 
+ */ + unsigned int postcopy_bmap_sync_requested; }; typedef struct RAMState RAMState; =20 @@ -4135,20 +4143,20 @@ static int ram_dirty_bitmap_sync_all(MigrationState= *s, RAMState *rs) { RAMBlock *block; QEMUFile *file =3D s->to_dst_file; - int ramblock_count =3D 0; =20 trace_ram_dirty_bitmap_sync_start(); =20 + qatomic_set(&rs->postcopy_bmap_sync_requested, 0); RAMBLOCK_FOREACH_NOT_IGNORED(block) { qemu_savevm_send_recv_bitmap(file, block->idstr); trace_ram_dirty_bitmap_request(block->idstr); - ramblock_count++; + qatomic_inc(&rs->postcopy_bmap_sync_requested); } =20 trace_ram_dirty_bitmap_sync_wait(); =20 /* Wait until all the ramblocks' dirty bitmap synced */ - while (ramblock_count--) { + while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { qemu_sem_wait(&s->rp_state.rp_sem); } =20 @@ -4175,6 +4183,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlo= ck *block, Error **errp) unsigned long *le_bitmap, nbits =3D block->used_length >> TARGET_PAGE_= BITS; uint64_t local_size =3D DIV_ROUND_UP(nbits, 8); uint64_t size, end_mark; + RAMState *rs =3D ram_state; =20 trace_ram_dirty_bitmap_reload_begin(block->idstr); =20 @@ -4240,6 +4249,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlo= ck *block, Error **errp) /* We'll recalculate migration_dirty_pages in ram_state_resume_prepare= (). */ trace_ram_dirty_bitmap_reload_complete(block->idstr); =20 + qatomic_dec(&rs->postcopy_bmap_sync_requested); + /* * We succeeded to sync bitmap for current ramblock. If this is * the last one to sync, we need to notify the main send thread. 
--=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557390; cv=none; d=zohomail.com; s=zohoarc; b=lJcent85aroPKbHwfPtV5zypRUMylFqFKeEf8POHuv5aNlPa/m0akLnl1f8xbdy710xZXKkoTbDSWYMFaewfxfjohZ/23ThnL7GrYKLxSRYakPpDvOmujfmmT5D+IJrrrA3l1/M1b3BR8rKlIHBQzmXWJQBs2D4MiAr9YqWdvC4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557390; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=xxrT383B6jVKIZe07PgDLcTMsZB82/Mqv4QueM0JWp8=; b=kY3o3HiZU8ky1CxDJJGKdldueXukAf6KYrDuhu4MfsrIFfwRvIyN2zp0F8aM88fus1TDCoQjocvQa9KXQIMCP/Hs2MjsarBrkaW9WiDMfheTYvvCNUMS+w/JZZ3oEFik/VaZSE0M5nShhG/bB6O0f0kEk4eiXlAUQczXyu6c2aI= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557390855370.9054625303687; Tue, 12 Sep 2023 15:23:10 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBli-0007qe-Eq; Tue, 12 Sep 2023 18:22:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlf-0007pr-Tj for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:59 -0400 Received: from us-smtp-delivery-124.mimecast.com 
([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBld-0003II-Hx for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:21:59 -0400 Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-661-vsjlGPAwMUKrA7kosAGhKg-1; Tue, 12 Sep 2023 18:21:55 -0400 Received: by mail-qv1-f71.google.com with SMTP id 6a1803df08f44-655bc5ee855so8773986d6.0 for ; Tue, 12 Sep 2023 15:21:55 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557316; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xxrT383B6jVKIZe07PgDLcTMsZB82/Mqv4QueM0JWp8=; b=EwmI/PAVSBMXgMB+pBYwTtvYU70IxWglWP+J5Xf6qXoxwdi3QdUV0pbEdgFcZedZa63dvO 9zoa+syZyOY5cFRmT5c64RPyMuzen2wtfj29kE4+zSCT+U2H5a68xBLFJfxcFjKUkl3d+7 MYbJeqBKUh/7rU8zr58bse0Be4jWLbk= X-MC-Unique: vsjlGPAwMUKrA7kosAGhKg-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557314; x=1695162114; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xxrT383B6jVKIZe07PgDLcTMsZB82/Mqv4QueM0JWp8=; b=iGjV/LEwVeRfHEq+vZd1OrLEC0A2t0LiXgKQ2DKXdgcbqGYfRs5okHocAuQv7ng7gl JD+Szx9Us/b297gaYOzBxLO45tLi1Bjy8JSKWwVOd6qeCCLDf5sbvTlY42OywtivhAD1 
x2ocL9wj0sp0gzWRvCVijFyw/0wDll6rE2t5NCvxV2M6bg4yi2MzS3S5P7XethRaR3CS hMoI8oXTWcnoZ/OQNx05QHMEv53Li/ZZgI3e48CJ6iUqPJHpIq+95qSbz4dZZF30WcBQ PuEcKfcWaKH51mja2v5hAggog9ia/TvdfWAaZ9h4+sZrMlunUWcS9qmD1UBvi7COzlxW DEpw== X-Gm-Message-State: AOJu0YzBoBPQMar5H877P7IaW4MRvWYNenols29oXB4vluYFT5jZsYow 2o373zBrpU78wNUC9BZjcLldznbcVcM+jlf794eJLz1AZtrdaflY+M+jsEfeQtES5Dd4JQuiSa/ 889nC40eWGm9dJYG8zFf9V9UD9/EGFBdZCA4ZHbyZaPUlV4+bwi8BqGoBJPpmD6nMZPnOk3Wa X-Received: by 2002:a05:6214:21ac:b0:655:dd3c:32a1 with SMTP id t12-20020a05621421ac00b00655dd3c32a1mr842222qvc.0.1694557314633; Tue, 12 Sep 2023 15:21:54 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFVidte10PuZmJb1BQQBommDPkj/t+74QxVdJIs3DOt1glDJofejFYdzHnBmS4X6U7Gl6La3w== X-Received: by 2002:a05:6214:21ac:b0:655:dd3c:32a1 with SMTP id t12-20020a05621421ac00b00655dd3c32a1mr842210qvc.0.1694557314239; Tue, 12 Sep 2023 15:21:54 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 08/11] migration: Add migration_rp_wait|kick() Date: Tue, 12 Sep 2023 18:21:42 -0400 Message-ID: <20230912222145.731099-9-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham 
autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557393101100002 Content-Type: text/plain; charset="utf-8"

Add simple wrappers around rp_sem for wait() and kick(), to make it clearer how the semaphore is used. The wrappers are also prepared to be reused for other purposes.

Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
---
 migration/migration.h | 15 +++++++++++++++
 migration/migration.c |  4 ++--
 migration/ram.c       | 16 +++++++---------
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h index 48322e909e..311334c701 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -304,6 +304,12 @@ struct MigrationState { * be cleared in the rp_thread! */ bool rp_thread_created; + /* + * Used to synchronize between migration main thread and return + * path thread. The migration thread can wait() on this sem, while + * other threads (e.g., return path thread) can kick it using a + * post(). + */ QemuSemaphore rp_sem; /* * We post to this when we got one PONG from dest. So far it's an @@ -516,4 +522,13 @@ void migration_populate_vfio_info(MigrationInfo *info); void migration_reset_vfio_bytes_transferred(void); void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page); =20 +/* Migration thread waiting for return path thread. */ +void migration_rp_wait(MigrationState *s); +/* + * Kick the migration thread waiting for return path messages. NOTE: the + * name can be slightly confusing (when read as "kick the rp thread"), just + * to remember the target is always the migration thread. 
+ */ +void migration_rp_kick(MigrationState *s); + #endif diff --git a/migration/migration.c b/migration/migration.c index 216d0e871f..b958ac8743 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1846,7 +1846,7 @@ static int migrate_handle_rp_resume_ack(MigrationStat= e *s, MIGRATION_STATUS_POSTCOPY_ACTIVE); =20 /* Notify send thread that time to continue send pages */ - qemu_sem_post(&s->rp_state.rp_sem); + migration_rp_kick(s); =20 return 0; } @@ -2514,7 +2514,7 @@ static int postcopy_resume_handshake(MigrationState *= s) qemu_savevm_send_postcopy_resume(s->to_dst_file); =20 while (s->state =3D=3D MIGRATION_STATUS_POSTCOPY_RECOVER) { - qemu_sem_wait(&s->rp_state.rp_sem); + migration_rp_wait(s); } =20 if (s->state =3D=3D MIGRATION_STATUS_POSTCOPY_ACTIVE) { diff --git a/migration/ram.c b/migration/ram.c index a9541c60b4..b5f6d65d84 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4157,7 +4157,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *= s, RAMState *rs) =20 /* Wait until all the ramblocks' dirty bitmap synced */ while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { - qemu_sem_wait(&s->rp_state.rp_sem); + migration_rp_wait(s); } =20 trace_ram_dirty_bitmap_sync_complete(); @@ -4165,11 +4165,6 @@ static int ram_dirty_bitmap_sync_all(MigrationState = *s, RAMState *rs) return 0; } =20 -static void ram_dirty_bitmap_reload_notify(MigrationState *s) -{ - qemu_sem_post(&s->rp_state.rp_sem); -} - /* * Read the received bitmap, revert it as the initial dirty bitmap. * This is only used when the postcopy migration is paused but wants @@ -4252,10 +4247,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMB= lock *block, Error **errp) qatomic_dec(&rs->postcopy_bmap_sync_requested); =20 /* - * We succeeded to sync bitmap for current ramblock. If this is - * the last one to sync, we need to notify the main send thread. + * We succeeded to sync bitmap for current ramblock. 
Always kick the + * migration thread to check whether all requested bitmaps are + * reloaded. NOTE: it's racy to only kick when requested=3D=3D0, beca= use + * we don't know whether the migration thread may still be increasing + * it. */ - ram_dirty_bitmap_reload_notify(s); + migration_rp_kick(s); =20 ret =3D 0; out: --=20 2.41.0 From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557344; cv=none; d=zohomail.com; s=zohoarc; b=C45bRrzl+/rVHnbSeMrSBLmSpi6iST4BTOdw40t6RfcWt/3DQoGfWa2x+W5G3U1uLXGyGm8aeVTNeGZfSofujDt6J1LgFvgbIx6c0y9L2cqNT4UuHCyKWl2USQQQ1d1qsnexLPtqiFT5qGDNBz2vDV5f5ehy3F18QFWDFhjcRtU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557344; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=IYFGIN8N+od5MDPEzg3oYlHRxXVzPj/Fvb3AIUEKcpE=; b=OghfkSrtMfJXXUcppUHrp6NwbRFh39eYxXc8u6olYLBS4pse1S6flzLO9dAEapK5xHyqU4ZZ2mfTBHSfcR6javZtrw05m/ecoW0H1JF65ET9kAHnpMO/CDr0dyFlZbtOXR3hJ29E/JW6FNLBR49CnyipcdmftGOSqLs/NsucYYw= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557344871361.3913638007904; Tue, 12 Sep 2023 15:22:24 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlp-0007rw-VR; Tue, 12 
Sep 2023 18:22:10 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlh-0007qN-A1 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:22:01 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBle-0003IQ-I7 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:22:01 -0400 Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-627-zgz1939WMRaxhZgVddpAPw-1; Tue, 12 Sep 2023 18:21:56 -0400 Received: by mail-qk1-f200.google.com with SMTP id af79cd13be357-76efdcb7be4so131976385a.1 for ; Tue, 12 Sep 2023 15:21:56 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557317; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IYFGIN8N+od5MDPEzg3oYlHRxXVzPj/Fvb3AIUEKcpE=; b=aCbxu83LvM3CguQ672ZCMxY6jtLm+eN2xyn2lsxQ4hBhQuqk0FpWXjGUBJNHl3U+pcVPja CZ754jwAzDZDV2vIez2lBDDJ2nr4ChR0UlF3a1a3sZtAXeJ69FxlS4gAmvF99TM1C1gpMI LR/FEyknzA0lH8jEW089W3KhfaVV5+Y= X-MC-Unique: zgz1939WMRaxhZgVddpAPw-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557316; x=1695162116; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IYFGIN8N+od5MDPEzg3oYlHRxXVzPj/Fvb3AIUEKcpE=; b=tFy3gnk1qVKOjrxhqOyMvEeynkDSu+hby0i40tIpkdgqvU5UjjGzPHWwTQBdTaVxeU Jp8QAauxVVxivHf3sayeo1Gsyrv+7JbwgZnsrXyC7wT2pSlY54ptZiR/wJ9vRQUeqAa3 DKKMHul0ZVUlhFaTJcueb6M7Qbb0PStWpGNY0Xxock8RD01TV9ItPIinuZRUWtBJa+eq ddpx4kXoFAeqmaaFzmf+QwwyiZ8JbiSOl1/JeWsNp6S+9CxD3qzA5S3cOKg5Br4lzK83 Pd96tGcj7SbZuHrD8SaXXmU4yQFBnEkmoYTUAbXVwOap2yTg9jKxfRJmbtIWoDHNX68D LGwg== X-Gm-Message-State: AOJu0YwAl1ZvhH5/nq9/uPzRm7x5E92qr6VoKNDdzAl1ZnWRJR+UE0Pq Vsz99sio7TWYnO5PwkoqS8CpV04EpB1VUz47CjnoywgOLd0V9ip5RIy+Yf3pStquPnhNJvdmnUx j+HkQqeijHu7gOCKdhIYJ5K/f29fiA+d1Ki2RUausZ1FpEqiPagApAdIvLdWnIOCkYS4gVXox X-Received: by 2002:a05:620a:1915:b0:76d:9234:1db4 with SMTP id bj21-20020a05620a191500b0076d92341db4mr701684qkb.7.1694557315721; Tue, 12 Sep 2023 15:21:55 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEDtNdGuECTlY2eefHxPdBKvoplc1Zn6CBHclCJSzk4Igdcx2atbMIBXg6KnU9c/woGQi1xzA== X-Received: by 2002:a05:620a:1915:b0:76d:9234:1db4 with SMTP id bj21-20020a05620a191500b0076d92341db4mr701664qkb.7.1694557315377; Tue, 12 Sep 2023 15:21:55 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas , Xiaohui Li Subject: [PATCH v2 09/11] migration: Allow network to fail even during recovery Date: Tue, 12 Sep 2023 18:21:43 -0400 Message-ID: <20230912222145.731099-10-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com 
X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557345892100003 Content-Type: text/plain; charset="utf-8"

Normally the postcopy RECOVER phase should only exist for a very short period: the time during which QEMU is trying to recover from an interrupted postcopy migration. During that time a handshake is carried out to continue the procedure, with the state changing from PAUSED -> RECOVER -> POSTCOPY_ACTIVE again.

The RECOVER phase should be very short. It happens right after the admin has specified a new, working network link for QEMU to reconnect to the dest QEMU.

However, there can still be cases where the channel breaks within this small RECOVER window.

If that happens, with the current code there is no way for the src QEMU to get kicked out of the RECOVER stage, and no way to retry the recovery on another channel once one is established.

This patch allows the RECOVER phase itself to fail too. We're mostly ready, with just some small things missing, e.g. properly kicking the main migration thread out when it is sleeping on rp_sem and we find that we're at the RECOVER stage. When this happens, the RECOVER itself fails and the state rolls back to PAUSED. The user can then retry another round of recovery.

To make it even stronger, teach the QMP command migrate-pause to explicitly kick the src/dst QEMU out when needed, so that even if for some reason the migration thread was not already kicked out by a failing return-path thread, the admin can still kick it out.

This is a very rare corner case, but still try to cover it.

One can test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

                      (a)          (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then, to make QEMU hang at the RECOVER stage, do the following:

  (1) stop the postcopy using the QMP command migrate-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into the RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) leaves the src QEMU stuck in the RECOVER stage, with no way to kick it out or to continue the postcopy again. After this patch, (4) quickly fails the recovery and the src QEMU bounces back to the PAUSED stage.

The admin can also kick QEMU from (4) into PAUSED using migrate-pause when needed.

After bouncing back to the PAUSED stage, one can recover again.

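As a rough illustration of the wait/kick behaviour described above (this is not the QEMU code itself), the sketch below uses POSIX semaphores and C11 atomics. DemoState, demo_set_error(), demo_rp_kick() and demo_rp_wait() are hypothetical names that only stand in for MigrationState, migrate_set_error(), migration_rp_kick() and migration_rp_wait(). It assumes a Linux-like platform where unnamed semaphores (sem_init) are available.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few MigrationState fields used here. */
typedef struct {
    sem_t rp_sem;          /* plays the role of rp_state.rp_sem */
    atomic_bool has_error; /* plays the role of "migration has an error" */
} DemoState;

void demo_init(DemoState *s)
{
    int rc = sem_init(&s->rp_sem, 0, 0);
    assert(rc == 0);
    (void)rc;
    atomic_init(&s->has_error, false);
}

/* Record a failure (illustrative; QEMU stores an Error* instead). */
void demo_set_error(DemoState *s)
{
    atomic_store(&s->has_error, true);
}

/* Wake up the thread sleeping in demo_rp_wait(). */
void demo_rp_kick(DemoState *s)
{
    sem_post(&s->rp_sem);
}

/*
 * Sleep until kicked.  Returns -1 if a failure was recorded either
 * before going to sleep or by the time we were woken up, 0 otherwise.
 */
int demo_rp_wait(DemoState *s)
{
    if (atomic_load(&s->has_error)) {
        return -1;          /* already failed: do not sleep at all */
    }

    while (sem_wait(&s->rp_sem) == -1 && errno == EINTR) {
        /* interrupted by a signal: retry the wait */
    }

    if (atomic_load(&s->has_error)) {
        return -1;          /* woken up because of a failure */
    }

    return 0;
}
```

In this sketch, a failing thread (or a pause request from the admin) would call demo_set_error() followed by demo_rp_kick(), so the waiter returns -1 instead of sleeping forever. The error is set before the kick so that the waiter is guaranteed to observe it once woken.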
Reported-by: Xiaohui Li
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 migration/migration.h |  8 ++++--
 migration/migration.c | 62 +++++++++++++++++++++++++++++++++++++++----
 migration/ram.c       |  4 ++-
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 311334c701..7e61e2ece7 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -482,6 +482,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
+bool migration_postcopy_is_alive(int state);
 MigrationState *migrate_get_current(void);
 
 uint64_t ram_get_total_transferred_pages(void);
@@ -522,8 +523,11 @@ void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
-/* Migration thread waiting for return path thread. */
-void migration_rp_wait(MigrationState *s);
+/*
+ * Migration thread waiting for return path thread. Return non-zero if an
+ * error is detected.
+ */
+int migration_rp_wait(MigrationState *s);
 /*
  * Kick the migration thread waiting for return path messages. NOTE: the
  * name can be slightly confusing (when read as "kick the rp thread"), just
diff --git a/migration/migration.c b/migration/migration.c
index b958ac8743..97d4b234d2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1349,6 +1349,17 @@ bool migration_in_postcopy(void)
     }
 }
 
+bool migration_postcopy_is_alive(int state)
+{
+    switch (state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
+}
+
 bool migration_in_postcopy_after_devices(MigrationState *s)
 {
     return migration_in_postcopy() && s->postcopy_after_devices;
@@ -1556,18 +1567,31 @@ void qmp_migrate_pause(Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     int ret;
 
-    if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive(ms->state)) {
         /* Source side, during postcopy */
+        Error *error = NULL;
+
+        /* Tell the core migration that we're pausing */
+        error_setg(&error, "Postcopy migration is paused by the user");
+        migrate_set_error(ms, error);
+
         qemu_mutex_lock(&ms->qemu_file_lock);
         ret = qemu_file_shutdown(ms->to_dst_file);
         qemu_mutex_unlock(&ms->qemu_file_lock);
         if (ret) {
             error_setg(errp, "Failed to pause source migration");
         }
+
+        /*
+         * Kick the migration thread out of any waiting windows (on behalf
+         * of the rp thread).
+         */
+        migration_rp_kick(ms);
+
         return;
     }
 
-    if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    if (migration_postcopy_is_alive(mis->state)) {
         ret = qemu_file_shutdown(mis->from_src_file);
         if (ret) {
             error_setg(errp, "Failed to pause destination migration");
@@ -1576,7 +1600,7 @@ void qmp_migrate_pause(Error **errp)
     }
 
     error_setg(errp, "migrate-pause is currently only supported "
-               "during postcopy-active state");
+               "during postcopy-active or postcopy-recover state");
 }
 
 bool migration_is_blocked(Error **errp)
@@ -1753,9 +1777,21 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
     qemu_sem_post(&s->pause_sem);
 }
 
-void migration_rp_wait(MigrationState *s)
+int migration_rp_wait(MigrationState *s)
 {
+    /* If migration has failure already, ignore the wait */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
     qemu_sem_wait(&s->rp_state.rp_sem);
+
+    /* After wait, double check that there's no failure */
+    if (migrate_has_error(s)) {
+        return -1;
+    }
+
+    return 0;
 }
 
 void migration_rp_kick(MigrationState *s)
@@ -1809,6 +1845,20 @@ static bool postcopy_pause_return_path_thread(MigrationState *s)
 {
     trace_postcopy_pause_return_path();
 
+    if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
+        /*
+         * this will be extremely unlikely: that we got yet another network
+         * issue during recovering of the 1st network failure.. during this
+         * period the main migration thread can be waiting on rp_sem for
+         * this thread to sync with the other side.
+         *
+         * When this happens, explicitly kick the migration thread out of
+         * RECOVER stage and back to PAUSED, so the admin can try
+         * everything again.
+         */
+        migration_rp_kick(s);
+    }
+
     qemu_sem_wait(&s->postcopy_pause_rp_sem);
 
     trace_postcopy_pause_return_path_continued();
@@ -2514,7 +2564,9 @@ static int postcopy_resume_handshake(MigrationState *s)
     qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
     while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index b5f6d65d84..199fd3e117 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4157,7 +4157,9 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 
     /* Wait until all the ramblocks' dirty bitmap synced */
     while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-        migration_rp_wait(s);
+        if (migration_rp_wait(s)) {
+            return -1;
+        }
     }
 
     trace_ram_dirty_bitmap_sync_complete();
-- 
2.41.0

From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557405; cv=none; d=zohomail.com; s=zohoarc; b=araf9S5pxch+NSVN3O411FawiWJXJHa0nFQ3HIlt1aJnSJoRAeKrRGCGNnSCmj+kFNsuuUMtwtXfDNoZQaW5Xd6MULAFTHP4y4lpVzVFFaS9vW4SFSeHCtnKDmNRQZ2iaSvnjLLhIAk1JGWUbSb6eAk/66cBDMw+qA+hgs6i/yQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557405; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=2YRmi5xS8R6Y8O27D1OcsXYIDTkrmU2x5TZce/7eTOk=;
b=CFeABbcbUpXksOvr8+j/9d+6BLxx/GkY2BSun24hcWuMG8s/pBzUvxRNrcoBX760DQefOPEC+f6y1yNmBb9hPBPvXRARBLuNDVe3ja5KEKRu9+AHU9aF3sRcUVDTeo3+OBgr0sH6hQEOCBQxc6+pzXeS212/PLaOjAeHiE7PCv8= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694557405556535.4419848398043; Tue, 12 Sep 2023 15:23:25 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlu-0007sJ-Ha; Tue, 12 Sep 2023 18:22:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlh-0007qR-Vq for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:22:02 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlf-0003Ig-R9 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:22:01 -0400 Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-631-obAh2sMvOeeWcLQekQvW8Q-1; Tue, 12 Sep 2023 18:21:56 -0400 Received: by mail-qv1-f71.google.com with SMTP id 6a1803df08f44-655bc5ee855so8774026d6.0 for ; Tue, 12 Sep 2023 15:21:56 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557319; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2YRmi5xS8R6Y8O27D1OcsXYIDTkrmU2x5TZce/7eTOk=; b=VMcpmAvG2aKWEG9eBCCBKYfFFVtEi/n2bUvuquaQrxraYb5Fl5AXtP5A5QCHNlssl4O9fE 5mrSIWK7W2Q5ULmwT6RV/dzFTxufecpncT4wTN2b6Ow/MDRqIeZho5n7iGKxZAslszpa1/ KIaypUQoxwAdUlZM2YagJGzbjbDPMz0= X-MC-Unique: obAh2sMvOeeWcLQekQvW8Q-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557316; x=1695162116; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2YRmi5xS8R6Y8O27D1OcsXYIDTkrmU2x5TZce/7eTOk=; b=T/VIZB5/vfhIHqr2BMWogpIusjwSBTJvHBZB/UOYtPRtbWVccasx81Lu353k7QGPKS 9PJsDhPQ/R3uv8vt5Jfl1RxQ0LeGUHpbJJzYaxRch2Vb+la2S+l9963TNtsUc3htZMEx 63GKApKvfkVgvJclEwes8M7vKVRs3Hbsk4htNTeuC1t7klF6IZ6x737K6jHoIdQ95l9d zzHfg7uALE195lHCLjxG3KwkqtSMZCUSwGR3jhTHcUdYpwg5X/KrQ6D0vqR+d5HvxKkX oU2fKJR1FQx4nXD/Olsb2V/KIvZu63prcZODIxkPNe/G2UNa4KyM4BnczvYQokQeRaXL cQwg== X-Gm-Message-State: AOJu0YxU4d9C6zTF/08qcfhbQJ1x+fDy/2TUi/bO33Xlrlj41/fEft6r c3R3MtkYuSlk1TKxLnqfb5JirC7zhHbo30f5rbsdtfuA/E2GWK1/OKeuCVDYCLvqNkGWAlPYyK0 eB5nrQN0gbFzR+HGmMTjvjxN6zsLXe1gpcX+syLhag+HcAK+DZjOK5E9CFKdrwx/7rjKUSa46 X-Received: by 2002:a05:6214:d0a:b0:655:ebd0:1fc2 with SMTP id 10-20020a0562140d0a00b00655ebd01fc2mr742935qvh.5.1694557316316; Tue, 12 Sep 2023 15:21:56 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFS5jw99WrLfWRjLDQ3J5iwDZ8s/D6NGh+NdAd3zKjO6a69B/oEYBJUvCU4PPWnIlv2hImaBw== X-Received: by 
2002:a05:6214:d0a:b0:655:ebd0:1fc2 with SMTP id 10-20020a0562140d0a00b00655ebd01fc2mr742921qvh.5.1694557316014; Tue, 12 Sep 2023 15:21:56 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 10/11] migration: Allow RECOVER->PAUSED convertion for dest qemu Date: Tue, 12 Sep 2023 18:21:44 -0400 Message-ID: <20230912222145.731099-11-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557407319100007 Content-Type: text/plain; charset="utf-8" There's a bug on dest that if a double fault triggered on dest qemu (a network issue during postcopy-recover), we won't set PAUSED correctly because we assumed we always came from ACTIVE. Fix that by always overwriting the state to PAUSE. 
We could also check for these two states explicitly, but that is probably
overkill. The src QEMU already switches to PAUSED unconditionally in the
same way.

Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 migration/savevm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index bb3e99194c..422406e0ee 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2723,7 +2723,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */
-- 
2.41.0

From nobody Fri May 10 20:15:25 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694557394; cv=none; d=zohomail.com; s=zohoarc; b=ELCejr3Xd2fVe/jy1we5wWvcvFz4Ttr7SmktnoA85tbMVqWaeNc3bVV7yUP4OBOivssMc7Fd0CCP1js6+jJ4+1qmEtjo8WyLkclQYlyOq1wUP2OO6wsQS9VJXf1tcagafm1aok8x6bGsaJXzTNiMevn70bOI7Snv1nCZ/1NUbaE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694557394; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=zl3/TH1FYRm4KdD1zzJyV2raVwp2h7PrZ4uuce+spTk=; b=GbXIn08a+fdBbEuHlaJ/R0+m2NfGzfY92S1KTFK59RDp1pfAlBoDbFxIzb4KDUOhhNC5KfrxSJg0E5n/ElhJumb0ou31TozykkxonEz9PkURpgrufYYUUisS+C/tQawOZJiaQReJLEkQvn4gWOJRoYEDtxI0W+JhRdijNYj+qTM= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass
(zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 169455739426237.7694189767908; Tue, 12 Sep 2023 15:23:14 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgBlu-0007sv-Pb; Tue, 12 Sep 2023 18:22:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlk-0007rA-I0 for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:22:06 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgBlg-0003Ix-BR for qemu-devel@nongnu.org; Tue, 12 Sep 2023 18:22:03 -0400 Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-475-fTttTI7CMoGa8k80zzQC1w-1; Tue, 12 Sep 2023 18:21:58 -0400 Received: by mail-qk1-f198.google.com with SMTP id af79cd13be357-7708c1ae500so77148285a.0 for ; Tue, 12 Sep 2023 15:21:58 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id d5-20020a05620a136500b0076f206cf16fsm3494272qkl.89.2023.09.12.15.21.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 15:21:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694557319; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zl3/TH1FYRm4KdD1zzJyV2raVwp2h7PrZ4uuce+spTk=; b=ZSyEq6mTSETnHXAz+u73sAYKtBRDZ9qYOubg+YXxs++6l1SqV+HgOQUrMhWgX0ypHan+MI K0zr4wuilV6wHNAHdMjavYuAm66mc0unpoR7zL5enMyW5A3kszjQuYtNukv0HUOmUvbxW6 4VJaAObsF/fjLm2XSgt/0NlPqvpXaGc= X-MC-Unique: fTttTI7CMoGa8k80zzQC1w-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694557317; x=1695162117; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zl3/TH1FYRm4KdD1zzJyV2raVwp2h7PrZ4uuce+spTk=; b=U2/AG0L5axGmIMVORKmKMz5ZI6Ucwf5WJhCEQATjTPtvE34S0iXNW8w8Px11NoWdhM KM4kLMEsMLcYGEMDX8fW/Ry0j90RSfFvJAw0MXrgaWZL4kGaWz0Ot5vLqH1SKdUGZ1MK N9uJPzazSUzSWcq5lXI6mGoIQkvY58GsHip11F3RlrpSROTkNIIR+9q12Qwq4Sqkw7s1 FUA/U8Df3B4ubjgO7dDj4inecy8j+KYUJBCzXDlzLcLMkjlgUdLXL7j+11osx0cwHmcB hJIgf0MfGOUbtskrH1cdn/5+F57QE0s1aW1O7Ai54E1c/GMi6g4N89l4mpEfw5+3T5Rj koFw== X-Gm-Message-State: AOJu0Yy2WcnlDQwdkt55bKYQDp83Pnlb7g0Q3KGJwWf3YQvFfvx/s4m0 mvl9FIa/Du7l0SqFWKGzgcvDu6nD06j7CbY+adNFqI9zKZ0omdlfxHT/nYzAhuYl8CQc2RyTlgT 3Q7ZbcZwOWUCcYrhHrVmxL+PcGCZOVqv5GamBB/nGNX0ziSj8NgzGX50S8kva8EU7ei552tiE X-Received: by 2002:a05:620a:1912:b0:76f:167a:cc5d with SMTP id bj18-20020a05620a191200b0076f167acc5dmr659375qkb.7.1694557317638; Tue, 12 Sep 2023 15:21:57 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG1L7kKXTwFC4vvmpvIAgo/LYjAfI4+CItZFjg11+YL1mCE49oTU/rRafD7EmSlf/XshILaGA== X-Received: by 
2002:a05:620a:1912:b0:76f:167a:cc5d with SMTP id bj18-20020a05620a191200b0076f167acc5dmr659359qkb.7.1694557317115; Tue, 12 Sep 2023 15:21:57 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: peterx@redhat.com, Juan Quintela , Fabiano Rosas Subject: [PATCH v2 11/11] tests/migration-test: Add a test for postcopy hangs during RECOVER Date: Tue, 12 Sep 2023 18:21:45 -0400 Message-ID: <20230912222145.731099-12-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230912222145.731099-1-peterx@redhat.com> References: <20230912222145.731099-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694557394924100009 Content-Type: text/plain; charset="utf-8" From: Fabiano Rosas To do so, create two paired sockets, but make them not providing real data. Feed those fake sockets to src/dst QEMUs for recovery to let them go into RECOVER stage without going out. 
Test that we can always kick it out and recover again with the right ports.

This patch is based on Fabiano's version here:

https://lore.kernel.org/r/877cowmdu0.fsf@suse.de

Signed-off-by: Fabiano Rosas
[peterx: write commit message, remove case 1, fix bugs, and more]
Signed-off-by: Peter Xu
---
 tests/qtest/migration-test.c | 94 ++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 1b43df5ca7..6105c2da65 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -695,6 +695,7 @@ typedef struct {
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
+    bool postcopy_recovery_test_fail;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1357,6 +1358,78 @@ static void test_postcopy_preempt_tls_psk(void)
 }
 #endif
 
+static void wait_for_postcopy_status(QTestState *one, const char *status)
+{
+    wait_for_migration_status(one, status,
+                              (const char * []) { "failed", "active",
+                                                  "completed", NULL });
+}
+
+static void postcopy_recover_fail(QTestState *from, QTestState *to)
+{
+    int ret, pair1[2], pair2[2];
+    char c;
+
+    /* Create two unrelated socketpairs */
+    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
+    g_assert_cmpint(ret, ==, 0);
+
+    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
+    g_assert_cmpint(ret, ==, 0);
+
+    /*
+     * Give the guests unpaired ends of the sockets, so they'll all block
+     * on reading. This mimics a wrong channel established.
+     */
+    qtest_qmp_fds_assert_success(from, &pair1[0], 1,
+                                 "{ 'execute': 'getfd',"
+                                 " 'arguments': { 'fdname': 'fd-mig' }}");
+    qtest_qmp_fds_assert_success(to, &pair2[0], 1,
+                                 "{ 'execute': 'getfd',"
+                                 " 'arguments': { 'fdname': 'fd-mig' }}");
+
+    /*
+     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
+     * emulate the 1st byte of a real recovery, but stops from there to
+     * keep dest QEMU in RECOVER. This is needed so that we can kick off
+     * the recover process on dest QEMU (by triggering the G_IO_IN event).
+     *
+     * NOTE: this trick is not needed on src QEMUs, because src doesn't
+     * rely on a pre-existing G_IO_IN event, so it will always trigger the
+     * upcoming recovery anyway even if it can read nothing.
+     */
+#define QEMU_VM_COMMAND 0x08
+    c = QEMU_VM_COMMAND;
+    ret = send(pair2[1], &c, 1, 0);
+    g_assert_cmpint(ret, ==, 1);
+
+    migrate_recover(to, "fd:fd-mig");
+    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
+
+    /*
+     * Make sure both QEMU instances will go into RECOVER stage, then test
+     * kicking them out using migrate-pause.
+     */
+    wait_for_postcopy_status(from, "postcopy-recover");
+    wait_for_postcopy_status(to, "postcopy-recover");
+
+    /*
+     * This would be issued by the admin upon noticing the hang, we should
+     * make sure we're able to kick this out.
+     */
+    migrate_pause(from);
+    wait_for_postcopy_status(from, "postcopy-paused");
+
+    /* Do the same test on dest */
+    migrate_pause(to);
+    wait_for_postcopy_status(to, "postcopy-paused");
+
+    close(pair1[0]);
+    close(pair1[1]);
+    close(pair2[0]);
+    close(pair2[1]);
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
     QTestState *from, *to;
@@ -1396,6 +1469,15 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
                               (const char * []) { "failed", "active",
                                                   "completed", NULL });
 
+    if (args->postcopy_recovery_test_fail) {
+        /*
+         * Test when a wrong socket specified for recover, and then the
+         * ability to kick it out, and continue with a correct socket.
+         */
+        postcopy_recover_fail(from, to);
+        /* continue with a good recovery */
+    }
+
     /*
      * Create a new socket to emulate a new channel that is different
      * from the broken migration channel; tell the destination to
@@ -1435,6 +1517,15 @@ static void test_postcopy_recovery_compress(void)
     test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_recovery_double_fail(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_test_fail = true,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
 {
@@ -2825,6 +2916,9 @@ int main(int argc, char **argv)
             qtest_add_func("/migration/postcopy/recovery/compress/plain",
                            test_postcopy_recovery_compress);
         }
+        qtest_add_func("/migration/postcopy/recovery/double-failures",
+                       test_postcopy_recovery_double_fail);
+
     }
 
     qtest_add_func("/migration/bad_dest", test_baddest);
-- 
2.41.0
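
The unpaired-socket trick used by the test above can be tried outside of QEMU. The following is a minimal sketch assuming a POSIX system with `MSG_DONTWAIT`; `demo_make_unpaired` and `demo_peek` are hypothetical helpers, not part of the qtest API. It shows that the end standing in for src sees no data at all, while the end standing in for dest sees exactly the one byte (0x08, QEMU_VM_COMMAND in the patch) that was pushed from the other side of its pair.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum demo_poll { DEMO_ERROR = -1, DEMO_NO_DATA = 0, DEMO_HAS_DATA = 1 };

/* Check, without blocking or consuming, whether a byte is readable on fd. */
static enum demo_poll demo_peek(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 1) {
        return DEMO_HAS_DATA;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return DEMO_NO_DATA;
    }
    return DEMO_ERROR; /* peer closed, or a real error */
}

/*
 * Create two unrelated socketpairs and push one byte into the second.
 * pair1[0] stands in for the fd handed to src, pair2[0] for the fd
 * handed to dest. Returns 0 on success, -1 on failure.
 */
static int demo_make_unpaired(int pair1[2], int pair2[2])
{
    char c = 0x08;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair1) != 0) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair2) != 0) {
        close(pair1[0]);
        close(pair1[1]);
        return -1;
    }
    if (send(pair2[1], &c, 1, 0) != 1) {
        close(pair1[0]);
        close(pair1[1]);
        close(pair2[0]);
        close(pair2[1]);
        return -1;
    }
    return 0;
}
```

Because nothing else is ever written to either pair, a blocking reader on either end would stall after this point, which is the condition the test relies on to hold both sides in RECOVER.
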