From nobody Thu Apr 3 11:39:07 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1741126562676653.7205320098304; Tue, 4 Mar 2025 14:16:02 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tpaOK-0008UR-HG; Tue, 04 Mar 2025 17:05:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tpaOG-0008T1-S4 for qemu-devel@nongnu.org; Tue, 04 Mar 2025 17:05:28 -0500 Received: from vps-ovh.mhejs.net ([145.239.82.108]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tpaOF-0000lB-3X for qemu-devel@nongnu.org; Tue, 04 Mar 2025 17:05:28 -0500 Received: from MUA by vps-ovh.mhejs.net with esmtpsa (TLS1.3) tls TLS_AES_256_GCM_SHA384 (Exim 4.98) (envelope-from ) id 1tpaOB-00000000LXR-0q5z; Tue, 04 Mar 2025 23:05:23 +0100 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= , Eric Blake , Markus Armbruster , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v6 07/36] migration: postcopy_ram_listen_thread() should take BQL for some calls Date: Tue, 4 Mar 2025 23:03:34 +0100 Message-ID: <24a7412cc151f8b48d74cd170a3bdc1ce8e6c879.1741124640.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.48.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=145.239.82.108; envelope-from=mhej@vps-ovh.mhejs.net; helo=vps-ovh.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1741126564508019000 Content-Type: text/plain; charset="utf-8" From: "Maciej S. Szmigiero" All callers to migration_incoming_state_destroy() other than postcopy_ram_listen_thread() do this call with BQL held. Since migration_incoming_state_destroy() ultimately calls "load_cleanup" SaveVMHandlers and it will soon call BQL-sensitive code it makes sense to always call that function under BQL rather than to have it deal with both cases (with BQL and without BQL). Add the necessary bql_lock() and bql_unlock() to postcopy_ram_listen_thread(). qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call "load_state" SaveVMHandlers that are expecting BQL to be held. In principle, the only devices that should be arriving on migration channel serviced by postcopy_ram_listen_thread() are those that are postcopiable and whose load handlers are safe to be called without BQL being held. But nothing currently prevents the source from sending data for "unsafe" devices which would cause trouble there. Add a TODO comment there so it's clear that it would be good to improve handling of such (erroneous) case in the future. Signed-off-by: Maciej S. Szmigiero Acked-by: Peter Xu --- migration/migration.c | 16 ++++++++++++++++ migration/savevm.c | 4 ++++ 2 files changed, 20 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 9e9db26667f1..6b2a8af4231d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void) struct MigrationIncomingState *mis =3D migration_incoming_get_current(= ); =20 multifd_recv_cleanup(); + /* * RAM state cleanup needs to happen after multifd cleanup, because * multifd threads can use some of its states (receivedmap). + * + * This call also needs BQL held since it calls all registered + * load_cleanup SaveVMHandlers and at least the VFIO implementation is + * BQL-sensitive. + * + * In addition to the above, it also performs cleanup of load threads + * thread pool. + * This cleanup operation is BQL-sensitive as it requires unlocking BQL + * so a thread possibly waiting for it could get unblocked and finally + * exit. + * The reason why a load thread may need to hold BQL in the first place + * is because address space modification operations require it. + * + * Check proper BQL state here rather than risk possible deadlock late= r. */ + assert(bql_locked()); qemu_loadvm_state_cleanup(); =20 if (mis->to_src_file) { diff --git a/migration/savevm.c b/migration/savevm.c index 7c1aa8ad7b9d..3e86b572cfa8 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1986,6 +1986,8 @@ static void *postcopy_ram_listen_thread(void *opaque) * in qemu_file, and thus we must be blocking now. */ qemu_file_set_blocking(f, true); + + /* TODO: sanity check that only postcopiable data will be loaded here = */ load_res =3D qemu_loadvm_state_main(f, mis); =20 /* @@ -2046,7 +2048,9 @@ static void *postcopy_ram_listen_thread(void *opaque) * (If something broke then qemu will have to exit anyway since it's * got a bad migration state). */ + bql_lock(); migration_incoming_state_destroy(); + bql_unlock(); =20 rcu_unregister_thread(); mis->have_listen_thread =3D false;