From: Hao Xiang
To: pbonzini@redhat.com, berrange@redhat.com, eduardo@habkost.net, peterx@redhat.com, farosas@suse.de, eblake@redhat.com, armbru@redhat.com, thuth@redhat.com, lvivier@redhat.com, qemu-devel@nongnu.org, jdenemar@redhat.com
Cc: Hao Xiang
Subject: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
Date: Fri, 16 Feb 2024 22:39:58 +0000
Message-Id: <20240216224002.1476890-4-hao.xiang@bytedance.com>
In-Reply-To: <20240216224002.1476890-1-hao.xiang@bytedance.com>
References: <20240216224002.1476890-1-hao.xiang@bytedance.com>

1. Implement zero page detection and handling on the multifd threads
   for the non-compression, zlib and zstd compression backends.
2. Add a new value 'multifd' to the ZeroPageDetection enumeration.
3. Add asserts to ensure pages->normal is used for normal pages in all
   scenarios.

Signed-off-by: Hao Xiang
---
 migration/meson.build         |  1 +
 migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
 migration/multifd-zlib.c      | 26 ++++++++++++---
 migration/multifd-zstd.c      | 25 ++++++++++++---
 migration/multifd.c           | 50 +++++++++++++++++++++++------
 migration/multifd.h           |  7 +++++
 qapi/migration.json           |  4 ++-
 7 files changed, 151 insertions(+), 21 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..1eeb915ff6 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -22,6 +22,7 @@ system_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zero-page.c',
   'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
new file mode 100644
index 0000000000..f0cd8e2c53
--- /dev/null
+++ b/migration/multifd-zero-page.c
@@ -0,0 +1,59 @@
+/*
+ * Multifd zero page detection implementation.
+ *
+ * Copyright (c) 2024 Bytedance Inc
+ *
+ * Authors:
+ *  Hao Xiang
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "exec/ramblock.h"
+#include "migration.h"
+#include "multifd.h"
+#include "options.h"
+#include "ram.h"
+
+void multifd_zero_page_check_send(MultiFDSendParams *p)
+{
+    /*
+     * QEMU older than 9.0 don't understand zero page
+     * on multifd channel. This switch is required to
+     * maintain backward compatibility.
+     */
+    bool use_multifd_zero_page =
+        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
+    MultiFDPages_t *pages = p->pages;
+    RAMBlock *rb = pages->block;
+
+    assert(pages->num != 0);
+    assert(pages->normal_num == 0);
+    assert(pages->zero_num == 0);
+
+    for (int i = 0; i < pages->num; i++) {
+        uint64_t offset = pages->offset[i];
+        if (use_multifd_zero_page &&
+            buffer_is_zero(rb->host + offset, p->page_size)) {
+            pages->zero[pages->zero_num] = offset;
+            pages->zero_num++;
+            ram_release_page(rb->idstr, offset);
+        } else {
+            pages->normal[pages->normal_num] = offset;
+            pages->normal_num++;
+        }
+    }
+}
+
+void multifd_zero_page_check_recv(MultiFDRecvParams *p)
+{
+    for (int i = 0; i < p->zero_num; i++) {
+        void *page = p->host + p->zero[i];
+        if (!buffer_is_zero(page, p->page_size)) {
+            memset(page, 0, p->page_size);
+        }
+    }
+}
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..cdfe0fa70e 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
+    multifd_zero_page_check_send(p);
+
+    if (!pages->normal_num) {
+        p->next_packet_size = 0;
+        goto out;
+    }
+
     multifd_send_prepare_header(p);
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         uint32_t available = z->zbuff_len - out_size;
         int flush = Z_NO_FLUSH;
 
-        if (i == pages->num - 1) {
+        if (i == pages->normal_num - 1) {
             flush = Z_SYNC_FLUSH;
         }
 
@@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
          * with compression. zlib does not guarantee that this is safe,
          * therefore copy the page before calling deflate().
          */
-        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
+        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
         zs->avail_in = p->page_size;
         zs->next_in = z->buf;
 
@@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     p->iov[p->iovs_num].iov_len = out_size;
     p->iovs_num++;
     p->next_packet_size = out_size;
-    p->flags |= MULTIFD_FLAG_ZLIB;
 
+out:
+    p->flags |= MULTIFD_FLAG_ZLIB;
     multifd_send_fill_packet(p);
-
     return 0;
 }
 
@@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_ZLIB);
         return -1;
     }
+
+    multifd_zero_page_check_recv(p);
+
+    if (!p->normal_num) {
+        assert(in_size == 0);
+        return 0;
+    }
+
     ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
 
     if (ret != 0) {
@@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, out_size, expected_size);
         return -1;
     }
+
     return 0;
 }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index dc8fe43e94..27a1eba075 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
+    multifd_zero_page_check_send(p);
+
+    if (!pages->normal_num) {
+        p->next_packet_size = 0;
+        goto out;
+    }
+
     multifd_send_prepare_header(p);
 
     z->out.dst = z->zbuff;
     z->out.size = z->zbuff_len;
     z->out.pos = 0;
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         ZSTD_EndDirective flush = ZSTD_e_continue;
 
-        if (i == pages->num - 1) {
+        if (i == pages->normal_num - 1) {
             flush = ZSTD_e_flush;
         }
-        z->in.src = p->pages->block->host + pages->offset[i];
+        z->in.src = p->pages->block->host + pages->normal[i];
         z->in.size = p->page_size;
         z->in.pos = 0;
 
@@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     p->iov[p->iovs_num].iov_len = z->out.pos;
     p->iovs_num++;
     p->next_packet_size = z->out.pos;
-    p->flags |= MULTIFD_FLAG_ZSTD;
 
+out:
+    p->flags |= MULTIFD_FLAG_ZSTD;
     multifd_send_fill_packet(p);
-
     return 0;
 }
 
@@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_ZSTD);
         return -1;
     }
+
+    multifd_zero_page_check_recv(p);
+
+    if (!p->normal_num) {
+        assert(in_size == 0);
+        return 0;
+    }
+
     ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
 
     if (ret != 0) {
diff --git a/migration/multifd.c b/migration/multifd.c
index a33dba40d9..fbb40ea10b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "sysemu/sysemu.h"
@@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     MultiFDPages_t *pages = p->pages;
     int ret;
 
+    multifd_zero_page_check_send(p);
+
     if (!use_zero_copy_send) {
         /*
          * Only !zerocopy needs the header in IOV; zerocopy will
@@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
         multifd_send_prepare_header(p);
     }
 
-    for (int i = 0; i < pages->num; i++) {
-        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+    for (int i = 0; i < pages->normal_num; i++) {
+        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
         p->iov[p->iovs_num].iov_len = p->page_size;
         p->iovs_num++;
     }
 
-    p->next_packet_size = pages->num * p->page_size;
+    p->next_packet_size = pages->normal_num * p->page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
 
     multifd_send_fill_packet(p);
@@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
+
+    multifd_zero_page_check_recv(p);
+
+    if (!p->normal_num) {
+        return 0;
+    }
+
     for (int i = 0; i < p->normal_num; i++) {
         p->iov[i].iov_base = p->host + p->normal[i];
         p->iov[i].iov_len = p->page_size;
@@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 
     packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-    packet->normal_pages = cpu_to_be32(pages->num);
+    packet->normal_pages = cpu_to_be32(pages->normal_num);
     packet->zero_pages = cpu_to_be32(pages->zero_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 
@@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
         strncpy(packet->ramblock, pages->block->idstr, 256);
     }
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         /* there are architectures where ram_addr_t is 32 bit */
-        uint64_t temp = pages->offset[i];
+        uint64_t temp = pages->normal[i];
 
         packet->offset[i] = cpu_to_be64(temp);
     }
 
+    for (i = 0; i < pages->zero_num; i++) {
+        /* there are architectures where ram_addr_t is 32 bit */
+        uint64_t temp = pages->zero[i];
+
+        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
+    }
+
     p->packets_sent++;
-    p->total_normal_pages += pages->num;
+    p->total_normal_pages += pages->normal_num;
     p->total_zero_pages += pages->zero_num;
 
-    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
+    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
                        p->flags, p->next_packet_size);
 }
 
@@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->normal[i] = offset;
     }
 
+    for (i = 0; i < p->zero_num; i++) {
+        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
+
+        if (offset > (p->block->used_length - p->page_size)) {
+            error_setg(errp, "multifd: offset too long %" PRIu64
+                       " (max " RAM_ADDR_FMT ")",
+                       offset, p->block->used_length);
+            return -1;
+        }
+        p->zero[i] = offset;
+    }
+
     return 0;
 }
 
@@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
 
             stat64_add(&mig_stats.multifd_bytes,
                        p->next_packet_size + p->packet_len);
-            stat64_add(&mig_stats.normal_pages, pages->num);
+            stat64_add(&mig_stats.normal_pages, pages->normal_num);
             stat64_add(&mig_stats.zero_pages, pages->zero_num);
 
             multifd_pages_reset(p->pages);
@@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
         p->flags &= ~MULTIFD_FLAG_SYNC;
         qemu_mutex_unlock(&p->mutex);
 
-        if (p->normal_num) {
+        if (p->normal_num + p->zero_num) {
+            assert(!(flags & MULTIFD_FLAG_SYNC));
             ret = multifd_recv_state->ops->recv_pages(p, &local_err);
             if (ret != 0) {
                 break;
diff --git a/migration/multifd.h b/migration/multifd.h
index 9822ff298a..125f0bbe60 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -53,6 +53,11 @@ typedef struct {
     uint32_t unused32[1];    /* Reserved for future use */
     uint64_t unused64[3];    /* Reserved for future use */
     char ramblock[256];
+    /*
+     * This array contains the pointers to:
+     *  - normal pages (initial normal_pages entries)
+     *  - zero pages (following zero_pages entries)
+     */
     uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
 
@@ -224,6 +229,8 @@ typedef struct {
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
 void multifd_send_fill_packet(MultiFDSendParams *p);
+void multifd_zero_page_check_send(MultiFDSendParams *p);
+void multifd_zero_page_check_recv(MultiFDRecvParams *p);
 
 static inline void multifd_send_prepare_header(MultiFDSendParams *p)
 {
diff --git a/qapi/migration.json b/qapi/migration.json
index 99843a8e95..e2450b92d4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -660,9 +660,11 @@
 #
 # @none: Do not perform zero page checking.
 #
+# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
+#
 ##
 { 'enum': 'ZeroPageDetection',
-  'data': [ 'legacy', 'none' ] }
+  'data': [ 'legacy', 'none', 'multifd' ] }
 
 ##
 # @BitmapMigrationBitmapAliasTransform:
-- 
2.30.2
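
Illustration only, not part of the patch: the sender-side split and the
resulting offset[] layout (normal pages first, zero pages after) can be
sketched outside QEMU. This is a minimal standalone sketch under stated
assumptions. Every sketch_ name, the fixed 4 KiB page size and the
128-entry batch capacity are hypothetical. The byte scan is a naive
stand-in for QEMU's buffer_is_zero(). The big-endian conversion that the
patch performs with cpu_to_be64() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, illustration only */
#define SKETCH_MAX_PAGES 128  /* assumed batch capacity, illustration only */

/* Hypothetical stand-in for MultiFDPages_t; not the QEMU structure. */
struct sketch_batch {
    uint32_t num;        /* queued pages */
    uint32_t normal_num; /* pages that must be sent in full */
    uint32_t zero_num;   /* pages detected as all zero */
    uint64_t offset[SKETCH_MAX_PAGES];
    uint64_t normal[SKETCH_MAX_PAGES];
    uint64_t zero[SKETCH_MAX_PAGES];
};

/* Naive byte scan standing in for an optimized zero check. */
static bool sketch_page_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Split b->offset[] into b->zero[] and b->normal[], preserving order. */
static void sketch_classify(struct sketch_batch *b, const uint8_t *host,
                            bool detect_zero)
{
    assert(b->num != 0 && b->num <= SKETCH_MAX_PAGES);
    b->normal_num = 0;
    b->zero_num = 0;

    for (uint32_t i = 0; i < b->num; i++) {
        uint64_t off = b->offset[i];

        if (detect_zero &&
            sketch_page_is_zero(host + off, SKETCH_PAGE_SIZE)) {
            b->zero[b->zero_num++] = off;
        } else {
            b->normal[b->normal_num++] = off;
        }
    }
}

/*
 * Lay out offsets as in the packet: normal entries first, zero entries
 * after them. Returns the number of entries written (host byte order).
 */
static uint32_t sketch_fill_offsets(const struct sketch_batch *b,
                                    uint64_t *wire)
{
    for (uint32_t i = 0; i < b->normal_num; i++) {
        wire[i] = b->normal[i];
    }
    for (uint32_t i = 0; i < b->zero_num; i++) {
        wire[b->normal_num + i] = b->zero[i];
    }
    return b->normal_num + b->zero_num;
}
```

With detect_zero set to false every offset lands in normal[], which
mirrors what multifd_zero_page_check_send() does when
migrate_zero_page_detection() does not return
ZERO_PAGE_DETECTION_MULTIFD.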