From: Peter Xu
To: qemu-devel@nongnu.org
Cc: "Daniel P . Berrange", peterx@redhat.com, Leonardo Bras Soares Passos,
 Manish Mishra, "Dr . David Alan Gilbert", Juan Quintela
Subject: [PATCH v8 05/15] migration: Postcopy recover with preempt enabled
Date: Wed, 22 Jun 2022 16:49:10 -0400
Message-Id: <20220622204920.79061-6-peterx@redhat.com>
In-Reply-To: <20220622204920.79061-1-peterx@redhat.com>
References: <20220622204920.79061-1-peterx@redhat.com>

To allow postcopy recovery, the ram fast load (preempt-only) thread on the
destination QEMU needs the same kind of fault-tolerance handling as the other
destination postcopy threads. When ram_load_postcopy() fails, instead of
exiting, the thread halts on a semaphore, ready to be kicked again when
recovery is detected.

A mutex is introduced to make sure there is no concurrent operation on the
socket. To keep it simple, the fast ram load thread holds the mutex for its
whole procedure, and only releases it while it is paused.
The fast-path socket is then released safely by the main loading thread, with
that mutex held, when a network failure occurs during postcopy.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
---
 migration/migration.c    | 27 +++++++++++++++++++++++----
 migration/migration.h    | 19 +++++++++++++++++++
 migration/postcopy-ram.c | 25 +++++++++++++++++++++++--
 migration/qemu-file.c    | 27 +++++++++++++++++++++++++++
 migration/qemu-file.h    |  1 +
 migration/savevm.c       | 26 ++++++++++++++++++++++++--
 migration/trace-events   |  2 ++
 7 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5e20d1c941..db82ecbdcd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
     current_incoming->postcopy_remote_fds =
         g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
     qemu_mutex_init(&current_incoming->rp_mutex);
+    qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
     qemu_event_init(&current_incoming->main_thread_load_event, false);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
     qemu_mutex_init(&current_incoming->page_request_mutex);
     current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
         /*
          * Here, we only wake up the main loading thread (while the
-         * fault thread will still be waiting), so that we can receive
+         * rest threads will still be waiting), so that we can receive
          * commands from source now, and answer it if needed. The
-         * fault thread will be woken up afterwards until we are sure
+         * rest threads will be woken up afterwards until we are sure
          * that source is ready to reply to page requests.
          */
         qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3502,6 +3504,18 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
+        /*
+         * Do the same to postcopy fast path socket too if there is. No
+         * locking needed because no racer as long as we do this before setting
+         * status to paused.
+         */
+        if (s->postcopy_qemufile_src) {
+            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+            qemu_file_shutdown(s->postcopy_qemufile_src);
+            qemu_fclose(s->postcopy_qemufile_src);
+            s->postcopy_qemufile_src = NULL;
+        }
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3557,8 +3571,13 @@ static MigThrError migration_detect_error(MigrationState *s)
         return MIG_THR_ERR_FATAL;
     }
 
-    /* Try to detect any file errors */
-    ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+    /*
+     * Try to detect any file errors. Note that postcopy_qemufile_src will
+     * be NULL when postcopy preempt is not enabled.
+     */
+    ret = qemu_file_get_error_obj_any(s->to_dst_file,
+                                      s->postcopy_qemufile_src,
+                                      &local_error);
     if (!ret) {
         /* Everything is fine */
         assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index ff714c235f..9220cec6bd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
     /* Postcopy priority thread is used to receive postcopy requested pages */
     QemuThread postcopy_prio_thread;
     bool postcopy_prio_thread_created;
+    /*
+     * Used to sync between the ram load main thread and the fast ram load
+     * thread. It protects postcopy_qemufile_dst, which is the postcopy
+     * fast channel.
+     *
+     * The ram fast load thread will take it mostly for the whole lifecycle
+     * because it needs to continuously read data from the channel, and
+     * it'll only release this mutex if postcopy is interrupted, so that
+     * the ram load main thread will take this mutex over and properly
+     * release the broken channel.
+     */
+    QemuMutex postcopy_prio_thread_mutex;
     /*
      * An array of temp host huge pages to be used, one for each postcopy
      * channel.
@@ -147,6 +159,13 @@ struct MigrationIncomingState {
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
+    /*
+     * This semaphore is used to allow the ram fast load thread (only when
+     * postcopy preempt is enabled) fall into sleep when there's network
+     * interruption detected. When the recovery is done, the main load
+     * thread will kick the fast ram load thread using this semaphore.
+     */
+    QemuSemaphore postcopy_pause_sem_fast_load;
 
     /* List of listening socket addresses */
     SocketAddressList *socket_address_list;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a3561410fe..84f7b1526e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1580,6 +1580,15 @@ int postcopy_preempt_setup(MigrationState *s, Error **errp)
     return 0;
 }
 
+static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fast_load();
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
+    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+    trace_postcopy_pause_fast_load_continued();
+}
+
 void *postcopy_preempt_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -1592,11 +1601,23 @@ void *postcopy_preempt_thread(void *opaque)
     qemu_sem_post(&mis->thread_sync_sem);
 
     /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
-    ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
+    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+    while (1) {
+        ret = ram_load_postcopy(mis->postcopy_qemufile_dst,
+                                RAM_CHANNEL_POSTCOPY);
+        /* If error happened, go into recovery routine */
+        if (ret) {
+            postcopy_pause_ram_fast_load(mis);
+        } else {
+            /* We're done */
+            break;
+        }
+    }
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
 
     rcu_unregister_thread();
 
     trace_postcopy_preempt_thread_exit();
 
-    return ret == 0 ? NULL : (void *)-1;
+    return NULL;
 }
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1e80d496b7..2f266b25cd 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -160,6 +160,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
     return f->last_error;
 }
 
+/*
+ * Get last error for either stream f1 or f2 with optional Error*.
+ * The error returned (non-zero) can be either from f1 or f2.
+ *
+ * If any of the qemufile* is NULL, then skip the check on that file.
+ *
+ * When there is no error on both qemufile, zero is returned.
+ */
+int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp)
+{
+    int ret = 0;
+
+    if (f1) {
+        ret = qemu_file_get_error_obj(f1, errp);
+        /* If there's already error detected, return */
+        if (ret) {
+            return ret;
+        }
+    }
+
+    if (f2) {
+        ret = qemu_file_get_error_obj(f2, errp);
+    }
+
+    return ret;
+}
+
 /*
  * Set the last error for stream f with optional Error*
  */
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 96e72d8bd8..fa13d04d78 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -141,6 +141,7 @@ void qemu_file_acct_rate_limit(QEMUFile *f, int64_t len);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
+int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
 void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
diff --git a/migration/savevm.c b/migration/savevm.c
index e3af03cb9b..48e85c052c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2117,6 +2117,13 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
      */
     qemu_sem_post(&mis->postcopy_pause_sem_fault);
 
+    if (migrate_postcopy_preempt()) {
+        /* The channel should already be setup again; make sure of it */
+        assert(mis->postcopy_qemufile_dst);
+        /* Kick the fast ram load thread too */
+        qemu_sem_post(&mis->postcopy_pause_sem_fast_load);
+    }
+
     return 0;
 }
 
@@ -2562,6 +2569,21 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    /*
+     * NOTE: this must happen before reset the PostcopyTmpPages below,
+     * otherwise it's racy to reset those fields when the fast load thread
+     * can be accessing it in parallel.
+     */
+    if (mis->postcopy_qemufile_dst) {
+        qemu_file_shutdown(mis->postcopy_qemufile_dst);
+        /* Take the mutex to make sure the fast ram load thread halted */
+        qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+        qemu_fclose(mis->postcopy_qemufile_dst);
+        mis->postcopy_qemufile_dst = NULL;
+        qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -2599,8 +2621,8 @@ retry:
     while (true) {
         section_type = qemu_get_byte(f);
 
-        if (qemu_file_get_error(f)) {
-            ret = qemu_file_get_error(f);
+        ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL);
+        if (ret) {
             break;
         }
 
diff --git a/migration/trace-events b/migration/trace-events
index 69f311169a..0e385c3a07 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -270,6 +270,8 @@ mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, i
 mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
 postcopy_pause_fault_thread(void) ""
 postcopy_pause_fault_thread_continued(void) ""
+postcopy_pause_fast_load(void) ""
+postcopy_pause_fast_load_continued(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
-- 
2.32.0
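The pause/kick handshake described in the commit message (the fast load thread owns the mutex for its whole life and drops it only while parked on a semaphore) can be modelled outside QEMU. What follows is a minimal sketch, not QEMU code: it assumes POSIX threads, with pthread_mutex_t and sem_t standing in for QemuMutex and QemuSemaphore, and struct incoming_state, fake_ram_load(), the error_detected semaphore and run_demo() are hypothetical names invented for illustration. A simulated load failure plays the role of a broken postcopy channel.

```c
/* Hypothetical, self-contained model of the patch's pause/kick handshake. */
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stand-in for MigrationIncomingState (not QEMU code). */
struct incoming_state {
    pthread_mutex_t prio_thread_mutex; /* plays postcopy_prio_thread_mutex */
    sem_t pause_sem_fast_load;         /* plays postcopy_pause_sem_fast_load */
    sem_t error_detected;              /* hypothetical: main thread sees the error */
    int failures_left;                 /* hypothetical: simulated network failures */
    int pauses;                        /* times the fast load thread parked */
    int reopens;                       /* times the main thread replaced the channel */
};

/* Hypothetical stand-in for ram_load_postcopy(): fails while failures remain. */
static int fake_ram_load(struct incoming_state *mis)
{
    if (mis->failures_left > 0) {
        mis->failures_left--;
        sem_post(&mis->error_detected); /* let the main thread notice it */
        return -1;
    }
    return 0; /* e.g. RAM_SAVE_FLAG_EOS received */
}

/* Mirrors postcopy_pause_ram_fast_load(): drop the lock, sleep, retake it. */
static void pause_fast_load(struct incoming_state *mis)
{
    mis->pauses++;
    pthread_mutex_unlock(&mis->prio_thread_mutex);
    sem_wait(&mis->pause_sem_fast_load);
    pthread_mutex_lock(&mis->prio_thread_mutex);
}

/* Mirrors postcopy_preempt_thread(): hold the mutex except while paused. */
static void *fast_load_thread(void *opaque)
{
    struct incoming_state *mis = opaque;

    pthread_mutex_lock(&mis->prio_thread_mutex);
    while (fake_ram_load(mis) != 0) {
        pause_fast_load(mis);
    }
    pthread_mutex_unlock(&mis->prio_thread_mutex);
    return NULL;
}

/* Main-thread side: recover once per simulated failure, then join. */
static int run_demo(int failures, int *pauses, int *reopens)
{
    struct incoming_state mis = { .failures_left = failures };
    pthread_t tid;
    int rc = 0;

    pthread_mutex_init(&mis.prio_thread_mutex, NULL);
    sem_init(&mis.pause_sem_fast_load, 0, 0);
    sem_init(&mis.error_detected, 0, 0);

    if (pthread_create(&tid, NULL, fast_load_thread, &mis) != 0) {
        rc = -1;
        goto out;
    }

    for (int i = 0; i < failures; i++) {
        /* Like the main load thread noticing the error on the fast channel. */
        sem_wait(&mis.error_detected);
        /* Blocks until the fast load thread has dropped the mutex to pause. */
        pthread_mutex_lock(&mis.prio_thread_mutex);
        mis.reopens++; /* stands in for closing and re-creating the channel */
        pthread_mutex_unlock(&mis.prio_thread_mutex);
        /* Like loadvm_postcopy_handle_resume(): kick the paused thread. */
        sem_post(&mis.pause_sem_fast_load);
    }

    pthread_join(tid, NULL);
    *pauses = mis.pauses;
    *reopens = mis.reopens;
out:
    sem_destroy(&mis.error_detected);
    sem_destroy(&mis.pause_sem_fast_load);
    pthread_mutex_destroy(&mis.prio_thread_mutex);
    return rc;
}
```

With this sketch, run_demo(3, &pauses, &reopens) returns 0 and leaves both counters at 3: each simulated failure parks the fast load thread once, and the main thread can only take the mutex (and so touch the channel) while that thread is parked. Build with -pthread.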