From nobody Sat May  4 11:49:39 2024
From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: Den Lunev, Vladimir Sementsov-Ogievskiy, Eric Blake, Paolo Bonzini, Juan Quintela, "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu, David Hildenbrand, Andrey Gruzdev
Subject: [RFC PATCH v1 1/7] migration/snapshot: Introduce qemu-snapshot tool
Date: Wed, 12 May 2021 22:26:13 +0300
Message-Id: <20210512192619.537268-2-andrey.gruzdev@virtuozzo.com>
In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
Content-Type: text/plain; charset="utf-8"

Execution environment, command-line argument parsing, usage/version info etc.

Signed-off-by: Andrey Gruzdev
---
 include/qemu-snapshot.h |  59 ++++++
 meson.build             |   2 +
 qemu-snapshot-vm.c      |  57 ++++++
 qemu-snapshot.c         | 439 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 557 insertions(+)
 create mode 100644 include/qemu-snapshot.h
 create mode 100644 qemu-snapshot-vm.c
 create mode 100644 qemu-snapshot.c

diff --git a/include/qemu-snapshot.h b/include/qemu-snapshot.h
new file mode 100644
index 0000000000..154e11e9a5
--- /dev/null
+++ b/include/qemu-snapshot.h
@@ -0,0 +1,59 @@
+/*
+ * QEMU External Snapshot Utility
+ *
+ * Copyright Virtuozzo GmbH, 2021
+ *
+ * Authors:
+ *  Andrey Gruzdev
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_SNAPSHOT_H
+#define QEMU_SNAPSHOT_H
+
+/* Invalid offset */
+#define INVALID_OFFSET              -1
+/* Maximum byte count for qemu_get_buffer_in_place() */
+#define INPLACE_READ_MAX            (32768 - 4096)
+
+/* Backing cluster size */
+#define BDRV_CLUSTER_SIZE           (1024 * 1024)
+
+/* Minimum supported target page size */
+#define PAGE_SIZE_MIN               4096
+/*
+ * Maximum supported target page size. The limit is caused by using
+ * QEMUFile and qemu_get_buffer_in_place() on migration channel.
+ * IO_BUF_SIZE is currently 32KB.
+ */
+#define PAGE_SIZE_MAX               16384
+/* RAM slice size for snapshot saving */
+#define SLICE_SIZE                  PAGE_SIZE_MAX
+/* RAM slice size for snapshot revert */
+#define SLICE_SIZE_REVERT           (16 * PAGE_SIZE_MAX)
+
+typedef struct StateSaveCtx {
+    BlockBackend *blk;              /* Block backend */
+} StateSaveCtx;
+
+typedef struct StateLoadCtx {
+    BlockBackend *blk;              /* Block backend */
+} StateLoadCtx;
+
+extern int64_t page_size;           /* Page size */
+extern int64_t page_mask;           /* Page mask */
+extern int page_bits;               /* Page size bits */
+extern int64_t slice_size;          /* RAM slice size */
+extern int64_t slice_mask;          /* RAM slice mask */
+extern int slice_bits;              /* RAM slice size bits */
+
+void ram_init_state(void);
+void ram_destroy_state(void);
+StateSaveCtx *get_save_context(void);
+StateLoadCtx *get_load_context(void);
+int coroutine_fn save_state_main(StateSaveCtx *s);
+int coroutine_fn load_state_main(StateLoadCtx *s);
+
+#endif /* QEMU_SNAPSHOT_H */
diff --git a/meson.build b/meson.build
index 0b41ff4118..b851671914 100644
--- a/meson.build
+++ b/meson.build
@@ -2361,6 +2361,8 @@ if have_tools
                dependencies: [block, qemuutil], install: true)
   qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
                dependencies: [blockdev, qemuutil, gnutls], install: true)
+  qemu_snapshot = executable('qemu-snapshot', files('qemu-snapshot.c', 'qemu-snapshot-vm.c'),
+               dependencies: [blockdev, qemuutil, migration], install: true)
 
   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
diff --git a/qemu-snapshot-vm.c b/qemu-snapshot-vm.c
new file mode 100644
index 0000000000..f7695e75c7
--- /dev/null
+++ b/qemu-snapshot-vm.c
@@ -0,0 +1,57 @@
+/*
+ * QEMU External Snapshot Utility
+ *
+ * Copyright Virtuozzo GmbH, 2021
+ *
+ * Authors:
+ *  Andrey Gruzdev
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
+#include "qemu/coroutine.h"
+#include "qemu/cutils.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
+#include "io/channel-buffer.h"
+#include "migration/qemu-file-channel.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/ram.h"
+#include "qemu-snapshot.h"
+
+/* RAM transfer context */
+typedef struct RAMCtx {
+    int64_t normal_pages;           /* Total number of normal pages */
+} RAMCtx;
+
+static RAMCtx ram_ctx;
+
+int coroutine_fn save_state_main(StateSaveCtx *s)
+{
+    /* TODO: implement */
+    return 0;
+}
+
+int coroutine_fn load_state_main(StateLoadCtx *s)
+{
+    /* TODO: implement */
+    return 0;
+}
+
+/* Initialize snapshot RAM state */
+void ram_init_state(void)
+{
+    RAMCtx *ram = &ram_ctx;
+
+    memset(ram, 0, sizeof(ram_ctx));
+}
+
+/* Destroy snapshot RAM state */
+void ram_destroy_state(void)
+{
+    /* TODO: implement */
+}
diff --git a/qemu-snapshot.c b/qemu-snapshot.c
new file mode 100644
index 0000000000..7ac4ef66c4
--- /dev/null
+++ b/qemu-snapshot.c
@@ -0,0 +1,439 @@
+/*
+ * QEMU External Snapshot Utility
+ *
+ * Copyright Virtuozzo GmbH, 2021
+ *
+ * Authors:
+ *  Andrey Gruzdev
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <getopt.h>
+
+#include "qemu-common.h"
+#include "qemu-version.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/runstate.h"        /* for qemu_system_killed() prototype */
+#include "qemu/cutils.h"
+#include "qemu/coroutine.h"
+#include "qemu/error-report.h"
+#include "qemu/config-file.h"
+#include "qemu/log.h"
+#include "qemu/option_int.h"
+#include "trace/control.h"
+#include "io/channel-util.h"
+#include "io/channel-buffer.h"
+#include "migration/qemu-file-channel.h"
+#include "migration/qemu-file.h"
+#include "qemu-snapshot.h"
+
+int64_t page_size;
+int64_t page_mask;
+int page_bits;
+int64_t slice_size;
+int64_t slice_mask;
+int slice_bits;
+
+static QemuOptsList snap_blk_optslist = {
+    .name = "blockdev",
+    .implied_opt_name = "file.filename",
+    .head = QTAILQ_HEAD_INITIALIZER(snap_blk_optslist.head),
+    .desc = {
+        { /* End of the list */ }
+    },
+};
+
+static struct {
+    bool revert;                    /* Operation is snapshot revert */
+
+    int fd;                         /* Migration channel fd */
+    int rp_fd;                      /* Return path fd (for postcopy) */
+
+    const char *blk_optstr;         /* Command-line options for vmstate blockdev */
+    QDict *blk_options;             /* Blockdev options */
+    int blk_flags;                  /* Blockdev flags */
+
+    bool postcopy;                  /* Use postcopy */
+    int postcopy_percent;           /* Start postcopy after % of normal pages loaded */
+} params;
+
+static StateSaveCtx state_save_ctx;
+static StateLoadCtx state_load_ctx;
+
+static enum {
+    RUNNING = 0,
+    TERMINATED
+} state;
+
+#ifdef CONFIG_POSIX
+void qemu_system_killed(int signum, pid_t pid)
+{
+}
+#endif /* CONFIG_POSIX */
+
+StateSaveCtx *get_save_context(void)
+{
+    return &state_save_ctx;
+}
+
+StateLoadCtx *get_load_context(void)
+{
+    return &state_load_ctx;
+}
+
+static void init_save_context(void)
+{
+    memset(&state_save_ctx, 0, sizeof(state_save_ctx));
+}
+
+static void destroy_save_context(void)
+{
+    /* TODO: implement */
+}
+
+static
+void init_load_context(void)
+{
+    memset(&state_load_ctx, 0, sizeof(state_load_ctx));
+}
+
+static void destroy_load_context(void)
+{
+    /* TODO: implement */
+}
+
+static BlockBackend *image_open_opts(const char *optstr, QDict *options, int flags)
+{
+    BlockBackend *blk;
+    Error *local_err = NULL;
+
+    /* Open image and create block backend */
+    blk = blk_new_open(NULL, NULL, options, flags, &local_err);
+    if (!blk) {
+        error_reportf_err(local_err, "Failed to open image '%s': ", optstr);
+        return NULL;
+    }
+
+    blk_set_enable_write_cache(blk, true);
+
+    return blk;
+}
+
+/* Use BH to enter coroutine from the main loop */
+static void enter_co_bh(void *opaque)
+{
+    Coroutine *co = (Coroutine *) opaque;
+    qemu_coroutine_enter(co);
+}
+
+static void coroutine_fn snapshot_save_co(void *opaque)
+{
+    StateSaveCtx *s = get_save_context();
+    int res = -1;
+
+    init_save_context();
+
+    /* Block backend */
+    s->blk = image_open_opts(params.blk_optstr, params.blk_options,
+                             params.blk_flags);
+    if (!s->blk) {
+        goto fail;
+    }
+
+    res = save_state_main(s);
+    if (res) {
+        error_report("Failed to save snapshot: %s", strerror(-res));
+    }
+
+fail:
+    destroy_save_context();
+    state = TERMINATED;
+}
+
+static void coroutine_fn snapshot_load_co(void *opaque)
+{
+    StateLoadCtx *s = get_load_context();
+    int res = -1;
+
+    init_load_context();
+
+    /* Block backend */
+    s->blk = image_open_opts(params.blk_optstr, params.blk_options,
+                             params.blk_flags);
+    if (!s->blk) {
+        goto fail;
+    }
+
+    res = load_state_main(s);
+    if (res) {
+        error_report("Failed to load snapshot: %s", strerror(-res));
+    }
+
+fail:
+    destroy_load_context();
+    state = TERMINATED;
+}
+
+static void usage(const char *name)
+{
+    printf(
+        "Usage: %s [options] <image-blockspec>\n"
+        "QEMU External Snapshot Utility\n"
+        "\n"
+        "'image-blockspec' is a block device specification for vmstate image\n"
+        "\n"
+        "  -h, --help                display this help and exit\n"
+        "  -V, --version             output version information and exit\n"
"\n" + "Options:\n" + " -T, --trace [[enable=3D]][,events=3D][,file=3D]\n" + " specify tracing options\n" + " -r, --revert revert to snapshot\n" + " --uri=3Dfd: specify migration fd\n" + " --page-size=3D specify target page size\n" + " --postcopy=3D<%%ram> switch to postcopy after %%ram loa= ded\n" + "\n" + QEMU_HELP_BOTTOM "\n", name); +} + +static void version(const char *name) +{ + printf( + "%s " QEMU_FULL_VERSION "\n" + "Written by Andrey Gruzdev.\n" + "\n" + QEMU_COPYRIGHT "\n" + "This is free software; see the source for copying conditions. Th= ere is NO\n" + "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULA= R PURPOSE.\n", + name); +} + +enum { + OPTION_PAGE_SIZE =3D 256, + OPTION_POSTCOPY, + OPTION_URI, +}; + +static void process_options(int argc, char *argv[]) +{ + static const char *s_opt =3D "rhVT:"; + static const struct option l_opt[] =3D { + { "page-size", required_argument, NULL, OPTION_PAGE_SIZE }, + { "postcopy", required_argument, NULL, OPTION_POSTCOPY }, + { "uri", required_argument, NULL, OPTION_URI }, + { "revert", no_argument, NULL, 'r' }, + { "help", no_argument, NULL, 'h' }, + { "version", no_argument, NULL, 'V' }, + { "trace", required_argument, NULL, 'T' }, + { NULL, 0, NULL, 0 } + }; + + bool has_page_size =3D false; + bool has_uri =3D false; + + int64_t target_page_size =3D qemu_real_host_page_size; + int uri_fd =3D -1; + bool revert =3D false; + bool postcopy =3D false; + int postcopy_percent =3D 0; + const char *blk_optstr; + QemuOpts *blk_opts; + QDict *blk_options; + int c; + + while ((c =3D getopt_long(argc, argv, s_opt, l_opt, NULL)) !=3D -1) { + switch (c) { + case '?': + exit(EXIT_FAILURE); + + case 'h': + usage(argv[0]); + exit(EXIT_SUCCESS); + + case 'V': + version(argv[0]); + exit(EXIT_SUCCESS); + + case 'T': + trace_opt_parse(optarg); + break; + + case 'r': + if (revert) { + error_report("-r and --revert can only be specified on= ce"); + exit(EXIT_FAILURE); + } + revert =3D true; + =20 + break; + + case 
+        OPTION_POSTCOPY:
+        {
+            char *r;
+
+            if (postcopy) {
+                error_report("--postcopy can only be specified once");
+                exit(EXIT_FAILURE);
+            }
+            postcopy = true;
+
+            postcopy_percent = strtol(optarg, &r, 10);
+            if (*r != '\0' || postcopy_percent < 0 || postcopy_percent > 100) {
+                error_report("Invalid argument to --postcopy");
+                exit(EXIT_FAILURE);
+            }
+
+            break;
+        }
+
+        case OPTION_PAGE_SIZE:
+        {
+            char *r;
+
+            if (has_page_size) {
+                error_report("--page-size can only be specified once");
+                exit(EXIT_FAILURE);
+            }
+            has_page_size = true;
+
+            target_page_size = strtol(optarg, &r, 0);
+            if (*r != '\0' || (target_page_size & (target_page_size - 1)) != 0 ||
+                    target_page_size < PAGE_SIZE_MIN ||
+                    target_page_size > PAGE_SIZE_MAX) {
+                error_report("Invalid argument to --page-size");
+                exit(EXIT_FAILURE);
+            }
+
+            break;
+        }
+
+        case OPTION_URI:
+        {
+            const char *p;
+
+            if (has_uri) {
+                error_report("--uri can only be specified once");
+                exit(EXIT_FAILURE);
+            }
+            has_uri = true;
+
+            /* Only "--uri=fd:<fd>" is currently supported */
+            if (strstart(optarg, "fd:", &p)) {
+                char *r;
+                int fd;
+
+                fd = strtol(p, &r, 10);
+                if (*r != '\0' || fd <= STDERR_FILENO) {
+                    error_report("Invalid FD value");
+                    exit(EXIT_FAILURE);
+                }
+
+                uri_fd = qemu_dup_flags(fd, O_CLOEXEC);
+                if (uri_fd < 0) {
+                    error_report("Could not dup FD %d", fd);
+                    exit(EXIT_FAILURE);
+                }
+
+                /* Close original fd */
+                close(fd);
+            } else {
+                error_report("Invalid argument to --uri");
+                exit(EXIT_FAILURE);
+            }
+
+            break;
+        }
+
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    if ((argc - optind) != 1) {
+        error_report("Invalid number of arguments");
+        exit(EXIT_FAILURE);
+    }
+
+    blk_optstr = argv[optind];
+
+    blk_opts = qemu_opts_parse_noisily(&snap_blk_optslist, blk_optstr, true);
+    if (!blk_opts) {
+        exit(EXIT_FAILURE);
+    }
+    blk_options = qemu_opts_to_qdict(blk_opts, NULL);
+    qemu_opts_reset(&snap_blk_optslist);
+
+    /* Enforced block layer options */
+    qdict_put_str(blk_options,
"driver", "qcow2"); + qdict_put_null(blk_options, "backing"); + qdict_put_str(blk_options, "overlap-check", "none"); + qdict_put_str(blk_options, "auto-read-only", "off"); + qdict_put_str(blk_options, "detect-zeroes", "off"); + qdict_put_str(blk_options, "lazy-refcounts", "on"); + qdict_put_str(blk_options, "file.auto-read-only", "off"); + qdict_put_str(blk_options, "file.detect-zeroes", "off"); + + params.revert =3D revert; + + if (uri_fd !=3D -1) { + params.fd =3D params.rp_fd =3D uri_fd; + } else { + params.fd =3D revert ? STDOUT_FILENO : STDIN_FILENO; + params.rp_fd =3D revert ? STDIN_FILENO : -1; + } + params.blk_optstr =3D blk_optstr; + params.blk_options =3D blk_options; + params.blk_flags =3D revert ? 0 : BDRV_O_RDWR; + params.postcopy =3D postcopy; + params.postcopy_percent =3D postcopy_percent; + + page_size =3D target_page_size; + page_mask =3D ~(page_size - 1); + page_bits =3D ctz64(page_size); + slice_size =3D revert ? SLICE_SIZE_REVERT : SLICE_SIZE; + slice_mask =3D ~(slice_size - 1); + slice_bits =3D ctz64(slice_size); +} + +int main(int argc, char **argv) +{ + Coroutine *co; + + os_setup_early_signal_handling(); + os_setup_signal_handling(); + error_init(argv[0]); + qemu_init_exec_dir(argv[0]); + module_call_init(MODULE_INIT_TRACE); + module_call_init(MODULE_INIT_QOM); + qemu_init_main_loop(&error_fatal); + bdrv_init(); + + qemu_add_opts(&qemu_trace_opts); + process_options(argc, argv); + + if (!trace_init_backends()) { + exit(EXIT_FAILURE); + } + trace_init_file(); + qemu_set_log(LOG_TRACE); + + ram_init_state(); + + if (params.revert) { + co =3D qemu_coroutine_create(snapshot_load_co, NULL); + } else { + co =3D qemu_coroutine_create(snapshot_save_co, NULL); + } + aio_bh_schedule_oneshot(qemu_get_aio_context(), enter_co_bh, co); + + do { + main_loop_wait(false); + } while (state !=3D TERMINATED); + + exit(EXIT_SUCCESS); +} --=20 2.27.0 From nobody Sat May 4 11:49:39 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; 
From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: Den Lunev, Vladimir Sementsov-Ogievskiy, Eric Blake, Paolo Bonzini, Juan Quintela, "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu, David Hildenbrand, Andrey Gruzdev
Subject: [RFC PATCH v1 2/7] migration/snapshot: Introduce qemu_ftell2() routine
Date: Wed, 12 May 2021 22:26:14 +0300
Message-Id: <20210512192619.537268-3-andrey.gruzdev@virtuozzo.com>
In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
Content-Type: text/plain; charset="utf-8"

In qemu-snapshot we need to retrieve the current QEMUFile offset as the
number of bytes read by qemu_get_byte()/qemu_get_buffer(). The existing
qemu_ftell() routine gives the read position as the number of bytes
fetched from the underlying IOChannel, which is not the same.

Signed-off-by: Andrey Gruzdev
---
 migration/qemu-file.c | 6 ++++++
 migration/qemu-file.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d6e03dbc0e..66be5e6460 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -657,6 +657,12 @@ int64_t qemu_ftell(QEMUFile *f)
     return f->pos;
 }
 
+int64_t qemu_ftell2(QEMUFile *f)
+{
+    qemu_fflush(f);
+    return f->pos + f->buf_index - f->buf_size;
+}
+
 int qemu_file_rate_limit(QEMUFile *f)
 {
     if (f->shutdown) {
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a9b6d6ccb7..bd1a6def02 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -124,6 +124,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
+int64_t qemu_ftell2(QEMUFile *f);
 int64_t qemu_ftell_fast(QEMUFile *f);
 /*
  * put_buffer without copying the buffer.
-- 
2.27.0

From nobody Sat May  4 11:49:39 2024
From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: Den Lunev, Vladimir Sementsov-Ogievskiy, Eric Blake, Paolo Bonzini, Juan Quintela, "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu, David Hildenbrand, Andrey Gruzdev
Subject: [RFC PATCH v1 3/7] migration/snapshot: Move RAM_SAVE_FLAG_xxx defines to migration/ram.h
Date: Wed, 12 May 2021 22:26:15 +0300
Message-Id: <20210512192619.537268-4-andrey.gruzdev@virtuozzo.com>
In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
Content-Type: text/plain; charset="utf-8"

Move RAM_SAVE_FLAG_xxx defines from migration/ram.c to migration/ram.h.

Signed-off-by: Andrey Gruzdev
---
 migration/ram.c | 16 ----------------
 migration/ram.h | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ace8ad431c..0359b63dde 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -63,22 +63,6 @@
 /***********************************************************/
 /* ram save/restore */
 
-/* RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it
- * worked for pages that where filled with the same char. We switched
- * it to only search for the zero value. And to avoid confusion with
- * RAM_SSAVE_FLAG_COMPRESS_PAGE just rename it.
- */
-
-#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_ZERO     0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE     0x08
-#define RAM_SAVE_FLAG_EOS      0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE   0x40
-/* 0x80 is reserved in migration.h start with 0x100 next */
-#define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
-
 static inline bool is_zero_range(uint8_t *p, uint64_t size)
 {
     return buffer_is_zero(p, size);
diff --git a/migration/ram.h b/migration/ram.h
index 4833e9fd5b..d6498b651f 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -33,6 +33,22 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
+/* RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it
+ * worked for pages that where filled with the same char. We switched
+ * it to only search for the zero value.
And to avoid confusion with
+ * RAM_SSAVE_FLAG_COMPRESS_PAGE just rename it.
+ */
+
+#define RAM_SAVE_FLAG_FULL     0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_ZERO     0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE     0x08
+#define RAM_SAVE_FLAG_EOS      0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+#define RAM_SAVE_FLAG_XBZRLE   0x40
+/* 0x80 is reserved in migration.h start with 0x100 next */
+#define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
+
 extern MigrationStats ram_counters;
 extern XBZRLECacheStats xbzrle_counters;
 extern CompressionStats compression_counters;
-- 
2.27.0

From nobody Sat May  4 11:49:39 2024
lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1620849836694260.46329429837135; Wed, 12 May 2021 13:03:56 -0700 (PDT) Received: from localhost ([::1]:53332 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lgv5H-0004Wc-Sp for importer@patchew.org; Wed, 12 May 2021 16:03:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60176) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lguV2-00022B-IW for qemu-devel@nongnu.org; Wed, 12 May 2021 15:26:28 -0400 Received: from relay.sw.ru ([185.231.240.75]:44762) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lguUz-0000op-FJ for qemu-devel@nongnu.org; Wed, 12 May 2021 15:26:28 -0400 Received: from [192.168.15.22] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lguUu-002BHm-5E; Wed, 12 May 2021 22:26:20 +0300 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=jUzyxOas3D4hkM6Th+OZ/ZukUBJ1FSlPRUclUe0K3ok=; b=N3LeN9g7qc7o HMy+cPEhD40WDZN4U8OGokiw+iKBuiHmtn7JAPSo1k5ZaOLGdhJn3Rlfc9y2xusJxB2GS1zCzIRVC 4Hiehuta2m3lFOd+sVN2APGIEHc0NUriYlvxcqYtSoEsOYaYYLnys5T+O0XCS+u/7fCMiWVoz3I3X pNOh0=; From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Vladimir Sementsov-Ogievskiy , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . 
David Alan Gilbert" , Markus Armbruster , Peter Xu , David Hildenbrand , Andrey Gruzdev Subject: [RFC PATCH v1 4/7] migration/snapshot: Block layer AIO support in qemu-snapshot Date: Wed, 12 May 2021 22:26:16 +0300 Message-Id: <20210512192619.537268-5-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com> References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @virtuozzo.com) Content-Type: text/plain; charset="utf-8" This commit enables asynchronous block layer I/O for qemu-snapshot tool. Implementation provides in-order request completion delivery to simplify migration code. Several file utility routines are introduced as well. 
Signed-off-by: Andrey Gruzdev --- include/qemu-snapshot.h | 30 +++++ meson.build | 2 +- qemu-snapshot-io.c | 266 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 297 insertions(+), 1 deletion(-) create mode 100644 qemu-snapshot-io.c diff --git a/include/qemu-snapshot.h b/include/qemu-snapshot.h index 154e11e9a5..7b3406fd56 100644 --- a/include/qemu-snapshot.h +++ b/include/qemu-snapshot.h @@ -34,6 +34,23 @@ /* RAM slice size for snapshot revert */ #define SLICE_SIZE_REVERT (16 * PAGE_SIZE_MAX) =20 +typedef struct AioRing AioRing; + +typedef struct AioRingRequest { + void *opaque; /* Opaque */ + + void *data; /* Data buffer */ + int64_t offset; /* Offset */ + size_t size; /* Size */ +} AioRingRequest; + +typedef struct AioRingEvent { + AioRingRequest *origin; /* Originating request */ + ssize_t status; /* Completion status */ +} AioRingEvent; + +typedef ssize_t coroutine_fn (*AioRingFunc)(AioRingRequest *req); + typedef struct StateSaveCtx { BlockBackend *blk; /* Block backend */ } StateSaveCtx; @@ -56,4 +73,17 @@ StateLoadCtx *get_load_context(void); int coroutine_fn save_state_main(StateSaveCtx *s); int coroutine_fn load_state_main(StateLoadCtx *s); =20 +AioRing *coroutine_fn aio_ring_new(AioRingFunc func, unsigned ring_entries, + unsigned max_inflight); +void aio_ring_free(AioRing *ring); +void aio_ring_set_max_inflight(AioRing *ring, unsigned max_inflight); +AioRingRequest *coroutine_fn aio_ring_get_request(AioRing *ring); +void coroutine_fn aio_ring_submit(AioRing *ring); +AioRingEvent *coroutine_fn aio_ring_wait_event(AioRing *ring); +void coroutine_fn aio_ring_complete(AioRing *ring); + +QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable); +void qemu_fsplice(QEMUFile *f_dst, QEMUFile *f_src, size_t size); +void qemu_fsplice_tail(QEMUFile *f_dst, QEMUFile *f_src); + #endif /* QEMU_SNAPSHOT_H */ diff --git a/meson.build b/meson.build index b851671914..c25fc518df 100644 --- a/meson.build +++ b/meson.build @@ -2361,7 +2361,7 @@ if 
have_tools dependencies: [block, qemuutil], install: true) qemu_nbd =3D executable('qemu-nbd', files('qemu-nbd.c'), dependencies: [blockdev, qemuutil, gnutls], install: true) - qemu_snapshot =3D executable('qemu-snapshot', files('qemu-snapshot.c', '= qemu-snapshot-vm.c'), + qemu_snapshot =3D executable('qemu-snapshot', files('qemu-snapshot.c', '= qemu-snapshot-vm.c', 'qemu-snapshot-io.c'), dependencies: [blockdev, qemuutil, migration], install: tru= e) =20 subdir('storage-daemon') diff --git a/qemu-snapshot-io.c b/qemu-snapshot-io.c new file mode 100644 index 0000000000..cd6428a4a2 --- /dev/null +++ b/qemu-snapshot-io.c @@ -0,0 +1,266 @@ +/* + * QEMU External Snapshot Utility + * + * Copyright Virtuozzo GmbH, 2021 + * + * Authors: + * Andrey Gruzdev + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/coroutine.h" +#include "sysemu/block-backend.h" +#include "migration/qemu-file.h" +#include "qemu-snapshot.h" + +/* + * AIO ring. + * + * Coroutine-based environment to support asynchronous I/O operations + * providing in-order completion event delivery. + * + * All routines (with an exception of aio_ring_free()) are required to be + * called from the same coroutine. + * + * Call sequence to keep AIO ring filled: + * + * aio_ring_new() ! + * ! + * aio_ring_get_request() !<------!<------! + * aio_ring_submit() !------>! ! + * ! ! + * aio_ring_wait_event() ! ! + * aio_ring_complete() !-------------->! + * ! + * aio_ring_free() ! 
+ * + */ + +typedef struct AioRingEntry { + AioRingRequest request; /* I/O request */ + AioRingEvent event; /* I/O completion event */ + bool owned; /* Owned by caller */ +} AioRingEntry; + +typedef struct AioRing { + unsigned head; /* Head entry index */ + unsigned tail; /* Tail entry index */ + + unsigned ring_mask; /* Mask for ring entry indices */ + unsigned ring_entries; /* Number of entries in the ring */ + + AioRingFunc func; /* Routine to call */ + + Coroutine *main_co; /* Caller's coroutine */ + bool waiting; /* Caller is waiting for event */ + + unsigned length; /* Tail-head distance */ + unsigned inflight; /* Number of in-flight requests */ + unsigned max_inflight; /* Maximum in-flight requests */ + + AioRingEntry entries[]; /* Flex-array of AioRingEntry */ +} AioRing; + +static void coroutine_fn aio_ring_co(void *opaque) +{ + AioRing *ring =3D (AioRing *) opaque; + AioRingEntry *entry =3D &ring->entries[ring->tail]; + + ring->tail =3D (ring->tail + 1) & ring->ring_mask; + ring->length++; + + ring->inflight++; + entry->owned =3D false; + + entry->event.status =3D ring->func(&entry->request); + + entry->event.origin =3D &entry->request; + entry->owned =3D true; + ring->inflight--; + + if (ring->waiting) { + ring->waiting =3D false; + aio_co_wake(ring->main_co); + } +} + +AioRingRequest *coroutine_fn aio_ring_get_request(AioRing *ring) +{ + assert(qemu_coroutine_self() =3D=3D ring->main_co); + + if (ring->length >=3D ring->ring_entries || + ring->inflight >=3D ring->max_inflight) { + return NULL; + } + + return &ring->entries[ring->tail].request; +} + +void coroutine_fn aio_ring_submit(AioRing *ring) +{ + assert(qemu_coroutine_self() =3D=3D ring->main_co); + assert(ring->length < ring->ring_entries); + + qemu_coroutine_enter(qemu_coroutine_create(aio_ring_co, ring)); +} + +AioRingEvent *coroutine_fn aio_ring_wait_event(AioRing *ring) +{ + AioRingEntry *entry =3D &ring->entries[ring->head]; + + assert(qemu_coroutine_self() =3D=3D ring->main_co); + + if 
(!ring->length) { + return NULL; + } + + while (true) { + if (entry->owned) { + return &entry->event; + } + ring->waiting =3D true; + qemu_coroutine_yield(); + } + + /* NOTREACHED */ +} + +void coroutine_fn aio_ring_complete(AioRing *ring) +{ + AioRingEntry *entry =3D &ring->entries[ring->head]; + + assert(qemu_coroutine_self() =3D=3D ring->main_co); + assert(ring->length); + + ring->head =3D (ring->head + 1) & ring->ring_mask; + ring->length--; + + entry->event.origin =3D NULL; + entry->event.status =3D 0; +} + +/* Create new AIO ring */ +AioRing *coroutine_fn aio_ring_new(AioRingFunc func, unsigned ring_entries, + unsigned max_inflight) +{ + AioRing *ring; + + assert(is_power_of_2(ring_entries)); + assert(max_inflight && max_inflight <=3D ring_entries); + + ring =3D g_malloc0(sizeof(AioRing) + ring_entries * sizeof(AioRingEntr= y)); + ring->main_co =3D qemu_coroutine_self(); + ring->ring_entries =3D ring_entries; + ring->ring_mask =3D ring_entries - 1; + ring->max_inflight =3D max_inflight; + ring->func =3D func; + + return ring; +} + +/* Free AIO ring */ +void aio_ring_free(AioRing *ring) +{ + assert(!ring->inflight); + g_free(ring); +} + +/* Limit the maximum number of in-flight AIO requests */ +void aio_ring_set_max_inflight(AioRing *ring, unsigned max_inflight) +{ + ring->max_inflight =3D MIN(max_inflight, ring->ring_entries); +} + +static ssize_t bdrv_vmstate_get_buffer(void *opaque, uint8_t *buf, int64_t= pos, + size_t size, Error **errp) +{ + return bdrv_load_vmstate((BlockDriverState *) opaque, buf, pos, size); +} + +static ssize_t bdrv_vmstate_writev_buffer(void *opaque, struct iovec *iov, + int iovcnt, int64_t pos, Error **errp) +{ + QEMUIOVector qiov; + int res; + + qemu_iovec_init_external(&qiov, iov, iovcnt); + + res =3D bdrv_writev_vmstate((BlockDriverState *) opaque, &qiov, pos); + if (res < 0) { + return res; + } + + return qiov.size; +} + +static int bdrv_vmstate_fclose(void *opaque, Error **errp) +{ + return bdrv_flush((BlockDriverState *) 
opaque); +} + +static const QEMUFileOps bdrv_vmstate_read_ops =3D { + .get_buffer =3D bdrv_vmstate_get_buffer, + .close =3D bdrv_vmstate_fclose, +}; + +static const QEMUFileOps bdrv_vmstate_write_ops =3D { + .writev_buffer =3D bdrv_vmstate_writev_buffer, + .close =3D bdrv_vmstate_fclose, +}; + +/* Create QEMUFile to access vmstate stream on QCOW2 image */ +QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable) +{ + if (is_writable) { + return qemu_fopen_ops(bs, &bdrv_vmstate_write_ops); + } + + return qemu_fopen_ops(bs, &bdrv_vmstate_read_ops); +} + +/* Move number of bytes from the source QEMUFile to destination */ +void qemu_fsplice(QEMUFile *f_dst, QEMUFile *f_src, size_t size) +{ + size_t rest =3D size; + + while (rest) { + uint8_t *ptr =3D NULL; + size_t req_size; + size_t count; + + req_size =3D MIN(rest, INPLACE_READ_MAX); + count =3D qemu_peek_buffer(f_src, &ptr, req_size, 0); + qemu_file_skip(f_src, count); + + qemu_put_buffer(f_dst, ptr, count); + rest -=3D count; + } +} + +/* + * Move data from source QEMUFile to destination + * until EOF is reached on source. + */ +void qemu_fsplice_tail(QEMUFile *f_dst, QEMUFile *f_src) +{ + bool eof =3D false; + + while (!eof) { + const size_t size =3D INPLACE_READ_MAX; + uint8_t *buffer =3D NULL; + size_t count; + + count =3D qemu_peek_buffer(f_src, &buffer, size, 0); + qemu_file_skip(f_src, count); + + /* Reached EOF on source? 
*/
+        if (count != size) {
+            eof = true;
+        }
+
+        qemu_put_buffer(f_dst, buffer, count);
+    }
+}
-- 
2.27.0

From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: qemu-devel@nongnu.org
Cc: Den Lunev, Vladimir Sementsov-Ogievskiy, Eric Blake, Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert, Markus Armbruster, Peter Xu, David Hildenbrand
Subject: [RFC PATCH v1 5/7] migration/snapshot: Implementation of qemu-snapshot save path
Date: Wed, 12 May 2021 22:26:17 +0300
Message-Id: <20210512192619.537268-6-andrey.gruzdev@virtuozzo.com>

This commit includes code to parse the incoming migration stream, dispatch data to section handlers, and deal with the complications of the open-coded migration format without introducing strong dependencies on QEMU migration code. 
Signed-off-by: Andrey Gruzdev --- include/qemu-snapshot.h | 34 +- qemu-snapshot-vm.c | 771 +++++++++++++++++++++++++++++++++++++++- qemu-snapshot.c | 56 ++- 3 files changed, 857 insertions(+), 4 deletions(-) diff --git a/include/qemu-snapshot.h b/include/qemu-snapshot.h index 7b3406fd56..52519f76c4 100644 --- a/include/qemu-snapshot.h +++ b/include/qemu-snapshot.h @@ -51,8 +51,40 @@ typedef struct AioRingEvent { =20 typedef ssize_t coroutine_fn (*AioRingFunc)(AioRingRequest *req); =20 +typedef struct QIOChannelBuffer QIOChannelBuffer; + typedef struct StateSaveCtx { - BlockBackend *blk; /* Block backend */ + BlockBackend *blk; /* Block backend */ + QEMUFile *f_fd; /* QEMUFile for incoming stream */ + QEMUFile *f_vmstate; /* QEMUFile for vmstate backing */ + + QIOChannelBuffer *ioc_leader; /* Migration stream leader */ + QIOChannelBuffer *ioc_pages; /* Page coalescing buffer */ + + /* Block offset of first page in ioc_pages */ + int64_t bdrv_offset; + /* Block offset of the last page in ioc_pages */ + int64_t last_bdrv_offset; + + /* Current section offset */ + int64_t section_offset; + /* Offset of the section containing list of RAM blocks */ + int64_t ram_list_offset; + /* Offset of the first RAM section */ + int64_t ram_offset; + /* Offset of the first non-iterable device section */ + int64_t device_offset; + + /* Zero buffer to fill unwritten slices on backing */ + void *zero_buf; + + /* + * Since we can't rewind the state of migration stream QEMUFile, we ju= st + * keep first few hundreds of bytes from the beginning of each section= for + * the case if particular section appears to be the first non-iterable + * device section and we are going to call default_handler(). 
+ */ + uint8_t section_header[512]; } StateSaveCtx; =20 typedef struct StateLoadCtx { diff --git a/qemu-snapshot-vm.c b/qemu-snapshot-vm.c index f7695e75c7..2d8f2d3d79 100644 --- a/qemu-snapshot-vm.c +++ b/qemu-snapshot-vm.c @@ -23,17 +23,784 @@ #include "migration/ram.h" #include "qemu-snapshot.h" =20 +/* vmstate header magic */ +#define VMSTATE_HEADER_MAGIC 0x5354564d +/* vmstate header eof_offset position */ +#define VMSTATE_HEADER_EOF_OFFSET 24 +/* vmstate header size */ +#define VMSTATE_HEADER_SIZE 28 + +/* Maximum size of page coalescing buffer */ +#define PAGE_COALESC_MAX (512 * 1024) + +/* RAM block */ +typedef struct RAMBlock { + int64_t bdrv_offset; /* Offset on backing storage */ + int64_t length; /* Length */ + int64_t nr_pages; /* Page count */ + int64_t nr_slices; /* Number of slices (for bitmap bookkeepin= g) */ + + unsigned long *bitmap; /* Bitmap of RAM slices */ + + /* Link into ram_list */ + QSIMPLEQ_ENTRY(RAMBlock) next; + + char idstr[256]; /* RAM block id string */ +} RAMBlock; + +/* RAM block page */ +typedef struct RAMPage { + RAMBlock *block; /* RAM block containing the page */ + int64_t offset; /* Page offset in RAM block */ +} RAMPage; + /* RAM transfer context */ typedef struct RAMCtx { int64_t normal_pages; /* Total number of normal pages */ + + /* RAM block list head */ + QSIMPLEQ_HEAD(, RAMBlock) ram_block_list; } RAMCtx; =20 +/* Section handler ops */ +typedef struct SectionHandlerOps { + int (*save_state)(QEMUFile *f, void *opaque, int version_id); + int (*load_state)(QEMUFile *f, void *opaque, int version_id); + int (*load_state_iterate)(QEMUFile *f, void *opaque, int version_id); +} SectionHandlerOps; + +/* Section handlers entry */ +typedef struct SectionHandlersEntry { + const char *idstr; /* Section id string */ + const int instance_id; /* Section instance id */ + const int version_id; /* Max. 
supported section version id */ + + int real_section_id; /* Section id from migration stream */ + int real_version_id; /* Version id from migration stream */ + + SectionHandlerOps *ops; /* Section handler callbacks */ +} SectionHandlersEntry; + +/* Section handlers */ +typedef struct SectionHandlers { + /* Default handler */ + SectionHandlersEntry default_; + /* Handlers */ + SectionHandlersEntry handlers[]; +} SectionHandlers; + +#define SECTION_HANDLERS_ENTRY(_idstr, _instance_id, _version_id, _ops) { = \ + .idstr =3D _idstr, \ + .instance_id =3D (_instance_id), \ + .version_id =3D (_version_id), \ + .ops =3D (_ops), \ +} + +#define SECTION_HANDLERS_END() { NULL, } + +/* Forward declarations */ +static int default_save(QEMUFile *f, void *opaque, int version_id); +static int ram_save(QEMUFile *f, void *opaque, int version_id); +static int save_state_complete(StateSaveCtx *s); + static RAMCtx ram_ctx; =20 +static SectionHandlerOps default_handler_ops =3D { + .save_state =3D default_save, +}; + +static SectionHandlerOps ram_handler_ops =3D { + .save_state =3D ram_save, +}; + +static SectionHandlers section_handlers =3D { + .default_ =3D SECTION_HANDLERS_ENTRY("default", 0, 0, &default_handler= _ops), + .handlers =3D { + SECTION_HANDLERS_ENTRY("ram", 0, 4, &ram_handler_ops), + SECTION_HANDLERS_END(), + }, +}; + +static SectionHandlersEntry *find_se(const char *idstr, int instance_id) +{ + SectionHandlersEntry *se; + + for (se =3D section_handlers.handlers; se->idstr; se++) { + if (!strcmp(se->idstr, idstr) && (instance_id =3D=3D se->instance_= id)) { + return se; + } + } + + return NULL; +} + +static SectionHandlersEntry *find_se_by_section_id(int section_id) +{ + SectionHandlersEntry *se; + + for (se =3D section_handlers.handlers; se->idstr; se++) { + if (section_id =3D=3D se->real_section_id) { + return se; + } + } + + return NULL; +} + +static bool check_section_footer(QEMUFile *f, SectionHandlersEntry *se) +{ + uint8_t token; + int section_id; + + token =3D 
qemu_get_byte(f);
+    if (token != QEMU_VM_SECTION_FOOTER) {
+        error_report("Missing footer for section %s(%d)",
+                     se->idstr, se->real_section_id);
+        return false;
+    }
+
+    section_id = qemu_get_be32(f);
+    if (section_id != se->real_section_id) {
+        error_report("Unmatched footer for section %s(%d): %d",
+                     se->idstr, se->real_section_id, section_id);
+        return false;
+    }
+
+    return true;
+}
+
+static inline
+bool ram_offset_in_block(RAMBlock *block, int64_t offset)
+{
+    return block && offset < block->length;
+}
+
+static inline
+bool ram_bdrv_offset_in_block(RAMBlock *block, int64_t bdrv_offset)
+{
+    return block && bdrv_offset >= block->bdrv_offset &&
+           bdrv_offset < block->bdrv_offset + block->length;
+}
+
+static inline
+int64_t ram_bdrv_from_block_offset(RAMBlock *block, int64_t offset)
+{
+    if (!ram_offset_in_block(block, offset)) {
+        return INVALID_OFFSET;
+    }
+
+    return block->bdrv_offset + offset;
+}
+
+static inline
+int64_t ram_block_offset_from_bdrv(RAMBlock *block, int64_t bdrv_offset)
+{
+    int64_t offset;
+
+    if (!block) {
+        return INVALID_OFFSET;
+    }
+
+    offset = bdrv_offset - block->bdrv_offset;
+    return offset >= 0 ? 
offset : INVALID_OFFSET; +} + +static RAMBlock *ram_block_by_idstr(const char *idstr) +{ + RAMBlock *block; + + QSIMPLEQ_FOREACH(block, &ram_ctx.ram_block_list, next) { + if (!strcmp(idstr, block->idstr)) { + return block; + } + } + + return NULL; +} + +static RAMBlock *ram_block_from_stream(QEMUFile *f, int flags) +{ + static RAMBlock *block; + char idstr[256]; + + if (flags & RAM_SAVE_FLAG_CONTINUE) { + if (!block) { + error_report("RAM_SAVE_FLAG_CONTINUE outside RAM block"); + return NULL; + } + + return block; + } + + if (!qemu_get_counted_string(f, idstr)) { + error_report("Failed to get RAM block name"); + return NULL; + } + + block =3D ram_block_by_idstr(idstr); + if (!block) { + error_report("Can't find RAM block %s", idstr); + return NULL; + } + + return block; +} + +static int64_t ram_block_next_bdrv_offset(void) +{ + RAMBlock *last_block; + int64_t offset; + + last_block =3D QSIMPLEQ_LAST(&ram_ctx.ram_block_list, RAMBlock, next); + if (!last_block) { + return 0; + } + + offset =3D last_block->bdrv_offset + last_block->length; + return ROUND_UP(offset, BDRV_CLUSTER_SIZE); +} + +static void ram_block_add(const char *idstr, int64_t size) +{ + RAMBlock *block; + + block =3D g_new0(RAMBlock, 1); + block->length =3D size; + block->bdrv_offset =3D ram_block_next_bdrv_offset(); + strcpy(block->idstr, idstr); + + QSIMPLEQ_INSERT_TAIL(&ram_ctx.ram_block_list, block, next); +} + +static void ram_block_list_init_bitmaps(void) +{ + RAMBlock *block; + + QSIMPLEQ_FOREACH(block, &ram_ctx.ram_block_list, next) { + block->nr_pages =3D block->length >> page_bits; + block->nr_slices =3D ROUND_UP(block->length, slice_size) >> slice_= bits; + + block->bitmap =3D bitmap_new(block->nr_slices); + bitmap_set(block->bitmap, 0, block->nr_slices); + } +} + +static bool ram_block_list_from_stream(QEMUFile *f, int64_t mem_size) +{ + int64_t total_ram_bytes; + + total_ram_bytes =3D mem_size; + while (total_ram_bytes > 0) { + char idstr[256]; + int64_t size; + + if 
(!qemu_get_counted_string(f, idstr)) { + error_report("Failed to get RAM block list"); + return false; + } + size =3D qemu_get_be64(f); + + ram_block_add(idstr, size); + total_ram_bytes -=3D size; + } + + if (total_ram_bytes !=3D 0) { + error_report("Corrupted RAM block list"); + return false; + } + + /* Initialize per-block bitmaps */ + ram_block_list_init_bitmaps(); + + return true; +} + +static void save_state_check_errors(StateSaveCtx *s, int *res) +{ + /* Check for -EIO which indicates input stream EOF */ + if (*res =3D=3D -EIO) { + *res =3D 0; + } + + /* + * Check for file errors on success. Replace generic -EINVAL + * retcode with file error if possible. + */ + if (*res >=3D 0 || *res =3D=3D -EINVAL) { + int f_res =3D qemu_file_get_error(s->f_fd); + + f_res =3D (f_res =3D=3D -EIO) ? 0 : f_res; + if (!f_res) { + f_res =3D qemu_file_get_error(s->f_vmstate); + } + if (f_res) { + *res =3D f_res; + } + } +} + +static int ram_alloc_page_backing(StateSaveCtx *s, RAMPage *page, + int64_t bdrv_offset) +{ + int res =3D 0; + + /* + * Reduce the number of unwritten extents in image backing file. + * + * We can achieve that by using a bitmap of RAM block 'slices' to + * enforce zero blockdev write once we are going to store a memory + * page within that slice. + */ + if (test_and_clear_bit(page->offset >> slice_bits, page->block->bitmap= )) { + res =3D blk_pwrite(s->blk, bdrv_offset & slice_mask, + s->zero_buf, slice_size, 0); + } + + return MIN(res, 0); +} + +static int ram_save_page(StateSaveCtx *s, RAMPage *page, uint8_t *data) +{ + size_t usage =3D s->ioc_pages->usage; + int64_t bdrv_offset; + int res =3D 0; + + bdrv_offset =3D ram_bdrv_from_block_offset(page->block, page->offset); + if (bdrv_offset =3D=3D INVALID_OFFSET) { + error_report("Corrupted RAM page"); + return -EINVAL; + } + + /* Deal with fragmentation of the image backing file */ + res =3D ram_alloc_page_backing(s, page, bdrv_offset); + if (res) { + return res; + } + + /* Are we saving a contiguous page? 
*/ + if (bdrv_offset !=3D s->last_bdrv_offset || + (usage + page_size) >=3D PAGE_COALESC_MAX) { + if (usage) { + /* Flush coalesced pages to block device */ + res =3D blk_pwrite(s->blk, s->bdrv_offset, s->ioc_pages->data, + usage, 0); + res =3D MIN(res, 0); + } + + /* Reset coalescing buffer state */ + s->ioc_pages->usage =3D 0; + s->ioc_pages->offset =3D 0; + /* Switch to the new bdrv_offset */ + s->bdrv_offset =3D bdrv_offset; + } + + qio_channel_write(QIO_CHANNEL(s->ioc_pages), (char *) data, + page_size, NULL); + s->last_bdrv_offset =3D bdrv_offset + page_size; + + return res; +} + +static int ram_save_page_flush(StateSaveCtx *s) +{ + size_t usage =3D s->ioc_pages->usage; + int res =3D 0; + + if (usage) { + /* Flush coalesced pages to block device */ + res =3D blk_pwrite(s->blk, s->bdrv_offset, + s->ioc_pages->data, usage, 0); + res =3D MIN(res, 0); + } + + /* Reset coalescing buffer state */ + s->ioc_pages->usage =3D 0; + s->ioc_pages->offset =3D 0; + + s->last_bdrv_offset =3D INVALID_OFFSET; + + return res; +} + +static int ram_save(QEMUFile *f, void *opaque, int version_id) +{ + StateSaveCtx *s =3D (StateSaveCtx *) opaque; + int incompat_flags =3D RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZ= RLE; + int flags =3D 0; + int res =3D 0; + + if (version_id !=3D 4) { + error_report("Unsupported version %d for 'ram' handler v4", versio= n_id); + return -EINVAL; + } + + while (!res && !(flags & RAM_SAVE_FLAG_EOS)) { + RAMBlock *block =3D NULL; + int64_t offset; + + offset =3D qemu_get_be64(f); + flags =3D offset & ~page_mask; + offset &=3D page_mask; + + if (flags & incompat_flags) { + error_report("Incompatible RAM page flags 0x%x", flags); + res =3D -EINVAL; + break; + } + + /* Lookup RAM block for the page */ + if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE)) { + block =3D ram_block_from_stream(f, flags); + if (!block) { + res =3D -EINVAL; + break; + } + } + + switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { + case RAM_SAVE_FLAG_MEM_SIZE: + if 
(s->ram_list_offset) {
+                error_report("Repeated RAM page with RAM_SAVE_FLAG_MEM_SIZE");
+                res = -EINVAL;
+                break;
+            }
+
+            /* Save position of section with the list of RAM blocks */
+            s->ram_list_offset = s->section_offset;
+
+            /* Get RAM block list */
+            if (!ram_block_list_from_stream(f, offset)) {
+                res = -EINVAL;
+            }
+            break;
+
+        case RAM_SAVE_FLAG_ZERO:
+            /* Nothing to do with zero page */
+            qemu_get_byte(f);
+            break;
+
+        case RAM_SAVE_FLAG_PAGE:
+        {
+            RAMPage page = { .block = block, .offset = offset };
+            uint8_t *data;
+            ssize_t count;
+
+            count = qemu_peek_buffer(f, &data, page_size, 0);
+            qemu_file_skip(f, count);
+            if (count != page_size) {
+                /* I/O error */
+                break;
+            }
+
+            res = ram_save_page(s, &page, data);
+
+            /* Update normal page count */
+            ram_ctx.normal_pages++;
+            break;
+        }
+
+        case RAM_SAVE_FLAG_EOS:
+            /* Normal exit */
+            break;
+
+        default:
+            error_report("RAM page with unknown combination of flags 0x%x", flags);
+            res = -EINVAL;
+
+        }
+
+        /* Make additional check for file errors */
+        if (!res) {
+            res = qemu_file_get_error(f);
+        }
+    }
+
+    /* Flush page coalescing buffer */
+    if (!res) {
+        res = ram_save_page_flush(s);
+    }
+
+    return res;
+}
+
+static int default_save(QEMUFile *f, void *opaque, int version_id)
+{
+    StateSaveCtx *s = (StateSaveCtx *) opaque;
+
+    if (!s->ram_offset) {
+        error_report("Unexpected (non-iterable device state) section");
+        return -EINVAL;
+    }
+
+    if (!s->device_offset) {
+        s->device_offset = s->section_offset;
+        /* Save the rest of vmstate, including non-iterable device state */
+        return save_state_complete(s);
+    }
+
+    /* Should never get here */
+    assert(false);
+    return -EINVAL;
+}
+
+static int save_state_complete(StateSaveCtx *s)
+{
+    QEMUFile *f = s->f_fd;
+    int64_t eof_pos;
+    int64_t pos;
+
+    /* Current read offset */
+    pos = qemu_ftell2(f);
+
+    /* vmstate magic */
+    qemu_put_be32(s->f_vmstate, VMSTATE_HEADER_MAGIC);
+    /* Target page size */
+    qemu_put_be32(s->f_vmstate, page_size);
+    /* Number of non-zero pages */
+    qemu_put_be64(s->f_vmstate, ram_ctx.normal_pages);
+
+    /* Offsets relative to QEMU_VM_FILE_MAGIC: */
+
+    /* RAM block list section */
+    qemu_put_be32(s->f_vmstate, s->ram_list_offset);
+    /*
+     * First non-iterable device section.
+     *
+     * Partial RAM sections are skipped in the vmstate stream so
+     * ram_offset shall become the device_offset.
+     */
+    qemu_put_be32(s->f_vmstate, s->ram_offset);
+    /* Slot for eof_offset */
+    qemu_put_be32(s->f_vmstate, 0);
+
+    /*
+     * At the completion stage we save the leading part of migration stream
+     * which contains header, configuration section and the 'ram' section
+     * with QEMU_VM_SECTION_FULL type containing list of RAM blocks.
+     *
+     * Migration leader ends at the first partial RAM section.
+     * QEMU_VM_SECTION_PART token for that section is pointed by s->ram_offset.
+     */
+    qemu_put_buffer(s->f_vmstate, s->ioc_leader->data, s->ram_offset);
+    /*
+     * Trailing part with non-iterable device state.
+     *
+     * First goes the section header which was skipped with QEMUFile
+     * so we need to take it from s->section_header.
+     */
+    qemu_put_buffer(s->f_vmstate, s->section_header, pos - s->device_offset);
+
+    /* Finally we forward the tail of migration stream to vmstate on backing */
+    qemu_fsplice_tail(s->f_vmstate, f);
+    eof_pos = qemu_ftell(s->f_vmstate);
+
+    /* Put eof_offset to the slot in vmstate stream: */
+
+    /* Simulate negative seek() */
+    qemu_update_position(s->f_vmstate,
+                         (size_t)(ssize_t) (VMSTATE_HEADER_EOF_OFFSET - eof_pos));
+    /* Write to the eof_offset header field */
+    qemu_put_be32(s->f_vmstate, eof_pos - VMSTATE_HEADER_SIZE);
+    qemu_fflush(s->f_vmstate);
+
+    return 1;
+}
+
+static int save_section_config(StateSaveCtx *s)
+{
+    QEMUFile *f = s->f_fd;
+    uint32_t id_len;
+
+    id_len = qemu_get_be32(f);
+    if (id_len > 255) {
+        error_report("Corrupted configuration section");
+        return -EINVAL;
+    }
+    qemu_file_skip(f, id_len);
+
+    return 0;
+}
+
+static int save_section_start_full(StateSaveCtx *s)
+{
+    QEMUFile *f = s->f_fd;
+    SectionHandlersEntry *se;
+    int section_id;
+    int instance_id;
+    int version_id;
+    char idstr[256];
+    int res;
+
+    section_id = qemu_get_be32(f);
+    if (!qemu_get_counted_string(f, idstr)) {
+        error_report("Failed to get section name(%d)", section_id);
+        return -EINVAL;
+    }
+
+    instance_id = qemu_get_be32(f);
+    version_id = qemu_get_be32(f);
+
+    /* Find section handler */
+    se = find_se(idstr, instance_id);
+    if (!se) {
+        se = &section_handlers.default_;
+    } else if (version_id > se->version_id) {
+        /* Validate version */
+        error_report("Unsupported version %d for '%s' v%d",
+                     version_id, idstr, se->version_id);
+        return -EINVAL;
+    }
+
+    se->real_section_id = section_id;
+    se->real_version_id = version_id;
+
+    res = se->ops->save_state(f, s, se->real_version_id);
+    /* Positive value indicates completion, no need to check footer */
+    if (res) {
+        return res;
+    }
+
+    /* Check section footer */
+    if (!check_section_footer(f, se)) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int save_section_part_end(StateSaveCtx *s)
+{
+    QEMUFile *f = s->f_fd;
+    SectionHandlersEntry *se;
+    int section_id;
+    int res;
+
+    /* First section with QEMU_VM_SECTION_PART type must be a 'ram' section */
+    if (!s->ram_offset) {
+        s->ram_offset = s->section_offset;
+    }
+
+    section_id = qemu_get_be32(f);
+
+    /* Lookup section handler by numeric section id */
+    se = find_se_by_section_id(section_id);
+    if (!se) {
+        error_report("Unknown section id %d", section_id);
+        return -EINVAL;
+    }
+
+    res = se->ops->save_state(f, s, se->real_version_id);
+    /* With partial sections we won't have positive success retcodes */
+    if (res) {
+        return res;
+    }
+
+    /* Check section footer */
+    if (!check_section_footer(f, se)) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int save_state_header(StateSaveCtx *s)
+{
+    QEMUFile *f = s->f_fd;
+    uint32_t v;
+
+    /* Validate qemu magic */
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        error_report("Not a migration stream");
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete");
+        return -EINVAL;
+    }
+
+    if (v != QEMU_VM_FILE_VERSION) {
+        error_report("Unsupported migration stream version");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 int coroutine_fn save_state_main(StateSaveCtx *s)
 {
-    /* TODO: implement */
-    return 0;
+    QEMUFile *f = s->f_fd;
+    uint8_t *buf;
+    uint8_t section_type;
+    int res = 0;
+
+    /* Deal with migration stream header */
+    res = save_state_header(s);
+    if (res) {
+        /* Check for file errors in case we have -EINVAL */
+        save_state_check_errors(s, &res);
+        return res;
+    }
+
+    while (!res) {
+        /* Update current section offset */
+        s->section_offset = qemu_ftell2(f);
+
+        /*
+         * We need to keep some data from the beginning of each section.
+         *
+         * When first non-iterable device section is reached and we are going
+         * to write to the vmstate stream in 'default_handler', it is used to
+         * restore the already skipped part of migration stream.
+         */
+        qemu_peek_buffer(f, &buf, sizeof(s->section_header), 0);
+        memcpy(s->section_header, buf, sizeof(s->section_header));
+
+        /* Read section type token */
+        section_type = qemu_get_byte(f);
+
+        switch (section_type) {
+        case QEMU_VM_CONFIGURATION:
+            res = save_section_config(s);
+            break;
+
+        case QEMU_VM_SECTION_FULL:
+        case QEMU_VM_SECTION_START:
+            res = save_section_start_full(s);
+            break;
+
+        case QEMU_VM_SECTION_PART:
+        case QEMU_VM_SECTION_END:
+            res = save_section_part_end(s);
+            break;
+
+        case QEMU_VM_EOF:
+            /*
+             * End of migration stream.
+             *
+             * Normally we will never get here since the ending part of migration
+             * stream is a series of QEMU_VM_SECTION_FULL sections holding
+             * state for non-iterable devices. In our case all those sections
+             * are saved with a single call to save_section_start_full() once
+             * we get an unknown section id and invoke default handler.
+             */
+            res = -EINVAL;
+            break;
+
+        default:
+            error_report("Unknown section type %d", section_type);
+            res = -EINVAL;
+
+        }
+
+        /* Additional check for file errors */
+        save_state_check_errors(s, &res);
+    }
+
+    /* Replace positive retcode with 0 */
+    return MIN(res, 0);
 }
 
 int coroutine_fn load_state_main(StateLoadCtx *s)
diff --git a/qemu-snapshot.c b/qemu-snapshot.c
index 7ac4ef66c4..d434b8f245 100644
--- a/qemu-snapshot.c
+++ b/qemu-snapshot.c
@@ -94,7 +94,24 @@ static void init_save_context(void)
 
 static void destroy_save_context(void)
 {
-    /* TODO: implement */
+    StateSaveCtx *s = get_save_context();
+
+    if (s->f_vmstate) {
+        qemu_fclose(s->f_vmstate);
+    }
+    if (s->blk) {
+        blk_flush(s->blk);
+        blk_unref(s->blk);
+    }
+    if (s->zero_buf) {
+        qemu_vfree(s->zero_buf);
+    }
+    if (s->ioc_leader) {
+        object_unref(OBJECT(s->ioc_leader));
+    }
+    if (s->ioc_pages) {
+        object_unref(OBJECT(s->ioc_pages));
+    }
 }
 
 static void init_load_context(void)
@@ -134,6 +151,9 @@ static void enter_co_bh(void *opaque)
 static void coroutine_fn snapshot_save_co(void *opaque)
 {
     StateSaveCtx *s = get_save_context();
+    QIOChannel *ioc_fd;
+    uint8_t *buf;
+    size_t count;
     int res = -1;
 
     init_save_context();
@@ -145,6 +165,40 @@ static void coroutine_fn snapshot_save_co(void *opaque)
         goto fail;
     }
 
+    /* QEMUFile on vmstate */
+    s->f_vmstate = qemu_fopen_bdrv_vmstate(blk_bs(s->blk), 1);
+    qemu_file_set_blocking(s->f_vmstate, false);
+
+    /* QEMUFile on migration fd */
+    ioc_fd = qio_channel_new_fd(params.fd, &error_fatal);
+    qio_channel_set_name(QIO_CHANNEL(ioc_fd), "migration-channel-incoming");
+    s->f_fd = qemu_fopen_channel_input(ioc_fd);
+    object_unref(OBJECT(ioc_fd));
+    /* Use non-blocking mode in coroutine */
+    qemu_file_set_blocking(s->f_fd, false);
+
+    /* Buffer channel to store leading part of migration stream */
+    s->ioc_leader = qio_channel_buffer_new(INPLACE_READ_MAX);
+    qio_channel_set_name(QIO_CHANNEL(s->ioc_leader), "migration-leader-buffer");
+
+    /* Page coalescing buffer */
+    s->ioc_pages = qio_channel_buffer_new(128 * 1024);
+    qio_channel_set_name(QIO_CHANNEL(s->ioc_pages), "migration-page-buffer");
+
+    /* Bounce buffer to fill unwritten extents in image backing */
+    s->zero_buf = qemu_blockalign0(blk_bs(s->blk), slice_size);
+
+    /*
+     * Here we stash the leading part of migration stream without promoting read
+     * position. Later we'll make use of it when writing the vmstate stream.
+     */
+    count = qemu_peek_buffer(s->f_fd, &buf, INPLACE_READ_MAX, 0);
+    res = qemu_file_get_error(s->f_fd);
+    if (res < 0) {
+        goto fail;
+    }
+    qio_channel_write(QIO_CHANNEL(s->ioc_leader), (char *) buf, count, NULL);
+
     res = save_state_main(s);
     if (res) {
         error_report("Failed to save snapshot: %s", strerror(-res));
-- 
2.27.0

From nobody Sat May 4 11:49:39 2024
From: Andrey Gruzdev
To: qemu-devel@nongnu.org
Cc: Den Lunev, Vladimir Sementsov-Ogievskiy, Eric Blake, Paolo Bonzini,
    Juan Quintela, "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu,
    David Hildenbrand, Andrey Gruzdev
Subject: [RFC PATCH v1 6/7] migration/snapshot: Implementation of
 qemu-snapshot load path
Date: Wed, 12 May 2021 22:26:18 +0300
Message-Id: <20210512192619.537268-7-andrey.gruzdev@virtuozzo.com>
In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
Content-Type: text/plain; charset="utf-8"

This part implements snapshot loading in precopy mode.
Signed-off-by: Andrey Gruzdev
---
 include/qemu-snapshot.h |  24 +-
 qemu-snapshot-vm.c      | 588 +++++++++++++++++++++++++++++++++++++++-
 qemu-snapshot.c         |  47 +++-
 3 files changed, 654 insertions(+), 5 deletions(-)

diff --git a/include/qemu-snapshot.h b/include/qemu-snapshot.h
index 52519f76c4..aae730d70e 100644
--- a/include/qemu-snapshot.h
+++ b/include/qemu-snapshot.h
@@ -34,6 +34,13 @@
 /* RAM slice size for snapshot revert */
 #define SLICE_SIZE_REVERT (16 * PAGE_SIZE_MAX)
 
+/* AIO transfer size */
+#define AIO_TRANSFER_SIZE BDRV_CLUSTER_SIZE
+/* AIO ring size */
+#define AIO_RING_SIZE 64
+/* AIO ring in-flight limit */
+#define AIO_RING_INFLIGHT 16
+
 typedef struct AioRing AioRing;
 
 typedef struct AioRingRequest {
@@ -88,7 +95,20 @@ typedef struct StateSaveCtx {
 } StateSaveCtx;
 
 typedef struct StateLoadCtx {
-    BlockBackend *blk;            /* Block backend */
+    BlockBackend *blk;            /* Block backend */
+    QEMUFile *f_fd;               /* QEMUFile for outgoing stream */
+    QEMUFile *f_vmstate;          /* QEMUFile for vmstate backing */
+
+    QIOChannelBuffer *ioc_leader; /* vmstate stream leader */
+
+    AioRing *aio_ring;            /* AIO ring */
+
+    /* vmstate offset of the section containing list of RAM blocks */
+    int64_t ram_list_offset;
+    /* vmstate offset of the first non-iterable device section */
+    int64_t device_offset;
+    /* vmstate EOF */
+    int64_t eof_offset;
 } StateLoadCtx;
 
 extern int64_t page_size;       /* Page size */
@@ -100,6 +120,8 @@ extern int slice_bits;          /* RAM slice size bits */
 
 void ram_init_state(void);
 void ram_destroy_state(void);
+ssize_t coroutine_fn ram_load_aio_co(AioRingRequest *req);
+
 StateSaveCtx *get_save_context(void);
 StateLoadCtx *get_load_context(void);
 int coroutine_fn save_state_main(StateSaveCtx *s);
diff --git a/qemu-snapshot-vm.c b/qemu-snapshot-vm.c
index 2d8f2d3d79..dae5f84b80 100644
--- a/qemu-snapshot-vm.c
+++ b/qemu-snapshot-vm.c
@@ -57,6 +57,11 @@ typedef struct RAMPage {
 /* RAM transfer context */
 typedef struct RAMCtx {
     int64_t normal_pages;       /* Total number of normal pages */
+    int64_t loaded_pages;       /* Number of normal pages loaded */
+
+    RAMPage last_page;          /* Last loaded page */
+
+    RAMBlock *last_sent_block;  /* RAM block of last sent page */
 
     /* RAM block list head */
     QSIMPLEQ_HEAD(, RAMBlock) ram_block_list;
@@ -100,17 +105,26 @@ typedef struct SectionHandlers {
 
 /* Forward declarations */
 static int default_save(QEMUFile *f, void *opaque, int version_id);
+static int default_load(QEMUFile *f, void *opaque, int version_id);
+
 static int ram_save(QEMUFile *f, void *opaque, int version_id);
+static int ram_load(QEMUFile *f, void *opaque, int version_id);
+static int ram_load_iterate(QEMUFile *f, void *opaque, int version_id);
+
 static int save_state_complete(StateSaveCtx *s);
+static int load_section_start_full(StateLoadCtx *s);
 
 static RAMCtx ram_ctx;
 
 static SectionHandlerOps default_handler_ops = {
     .save_state = default_save,
+    .load_state = default_load,
 };
 
 static SectionHandlerOps ram_handler_ops = {
     .save_state = ram_save,
+    .load_state = ram_load,
+    .load_state_iterate = ram_load_iterate,
 };
 
 static SectionHandlers section_handlers = {
@@ -218,6 +232,19 @@ static RAMBlock *ram_block_by_idstr(const char *idstr)
     return NULL;
 }
 
+static RAMBlock *ram_block_by_bdrv_offset(int64_t bdrv_offset)
+{
+    RAMBlock *block;
+
+    QSIMPLEQ_FOREACH(block, &ram_ctx.ram_block_list, next) {
+        if (ram_bdrv_offset_in_block(block, bdrv_offset)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 static RAMBlock *ram_block_from_stream(QEMUFile *f, int flags)
 {
     static RAMBlock *block;
@@ -803,10 +830,555 @@ int coroutine_fn save_state_main(StateSaveCtx *s)
     return MIN(res, 0);
 }
 
+static void load_state_check_errors(StateLoadCtx *s, int *res)
+{
+    /*
+     * Check for file errors on success. Replace generic -EINVAL
+     * retcode with file error if possible.
+     */
+    if (*res >= 0 || *res == -EINVAL) {
+        int f_res = qemu_file_get_error(s->f_fd);
+
+        if (!f_res) {
+            f_res = qemu_file_get_error(s->f_vmstate);
+        }
+        if (f_res) {
+            *res = f_res;
+        }
+    }
+}
+
+static void send_section_header_part_end(QEMUFile *f, SectionHandlersEntry *se,
+                                         uint8_t section_type)
+{
+    assert(section_type == QEMU_VM_SECTION_PART ||
+           section_type == QEMU_VM_SECTION_END);
+
+    qemu_put_byte(f, section_type);
+    qemu_put_be32(f, se->real_section_id);
+}
+
+static void send_section_footer(QEMUFile *f, SectionHandlersEntry *se)
+{
+    qemu_put_byte(f, QEMU_VM_SECTION_FOOTER);
+    qemu_put_be32(f, se->real_section_id);
+}
+
+static void send_page_header(QEMUFile *f, RAMBlock *block, int64_t offset)
+{
+    uint8_t hdr_buf[512];
+    int hdr_len = 8;
+
+    stq_be_p(hdr_buf, offset);
+    if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
+        int id_len;
+
+        id_len = strlen(block->idstr);
+        assert(id_len < 256);
+
+        hdr_buf[hdr_len] = id_len;
+        memcpy((hdr_buf + hdr_len + 1), block->idstr, id_len);
+
+        hdr_len += 1 + id_len;
+    }
+
+    qemu_put_buffer(f, hdr_buf, hdr_len);
+}
+
+static void send_zeropage(QEMUFile *f, RAMBlock *block, int64_t offset)
+{
+    send_page_header(f, block, offset | RAM_SAVE_FLAG_ZERO);
+    qemu_put_byte(f, 0);
+}
+
+static bool find_next_page(RAMPage *page)
+{
+    RAMCtx *ram = &ram_ctx;
+    RAMBlock *block = ram->last_page.block;
+    int64_t slice = ram->last_page.offset >> slice_bits;
+    bool full_round = false;
+    bool found = false;
+
+    if (!block) {
+restart:
+        block = QSIMPLEQ_FIRST(&ram->ram_block_list);
+        slice = 0;
+        full_round = true;
+    }
+
+    while (!found && block) {
+        slice = find_next_bit(block->bitmap, block->nr_slices, slice);
+        /* Can't find unsent slice in block? */
+        if (slice >= block->nr_slices) {
+            /* Try next block */
+            block = QSIMPLEQ_NEXT(block, next);
+            slice = 0;
+
+            continue;
+        }
+
+        found = true;
+    }
+
+    /*
+     * Re-start from the beginning if couldn't find unsent slice,
+     * but do it only once.
+     */
+    if (!found && !full_round) {
+        goto restart;
+    }
+
+    if (found) {
+        page->block = block;
+        page->offset = slice << slice_bits;
+    }
+
+    return found;
+}
+
+static inline
+void get_page_range(RAMPage *page, unsigned *length, unsigned max_length)
+{
+    int64_t start_slice;
+    int64_t end_slice;
+    int64_t tmp;
+
+    assert(QEMU_IS_ALIGNED(page->offset, slice_size));
+    assert(max_length >= slice_size);
+
+    start_slice = page->offset >> slice_bits;
+    end_slice = find_next_zero_bit(page->block->bitmap, page->block->nr_slices,
+                                   page->offset >> slice_bits);
+
+    tmp = (end_slice - start_slice) << slice_bits;
+    tmp = MIN(page->block->length - page->offset, tmp);
+
+    /*
+     * Length is always aligned to slice_size with the exception of case
+     * when it is the last slice in RAM block.
+     */
+    *length = MIN(max_length, tmp);
+}
+
+static inline
+void clear_page_range(RAMPage *page, unsigned length)
+{
+    assert(QEMU_IS_ALIGNED(page->offset, slice_size));
+    assert(length);
+
+    /*
+     * Page offsets are aligned to the slice boundary so we only need
+     * to round up length for the case when we load last slice in the block.
+     */
+    bitmap_clear(page->block->bitmap, page->offset >> slice_bits,
+                 ((length - 1) >> slice_bits) + 1);
+}
+
+ssize_t coroutine_fn ram_load_aio_co(AioRingRequest *req)
+{
+    return blk_pread((BlockBackend *) req->opaque, req->offset,
+                     req->data, req->size);
+}
+
+static void coroutine_fn ram_load_submit_aio(StateLoadCtx *s)
+{
+    RAMCtx *ram = &ram_ctx;
+    AioRingRequest *req;
+
+    while ((req = aio_ring_get_request(s->aio_ring))) {
+        RAMPage page;
+        unsigned max_length = AIO_TRANSFER_SIZE;
+        unsigned length;
+
+        if (!find_next_page(&page)) {
+            break;
+        }
+
+        /* Get range of contiguous pages that were not transferred yet */
+        get_page_range(&page, &length, max_length);
+        /* Clear range of pages to be queued for I/O */
+        clear_page_range(&page, length);
+
+        /* Used by find_next_page() */
+        ram->last_page.block = page.block;
+        ram->last_page.offset = page.offset + length;
+
+        /* Setup I/O request */
+        req->opaque = s->blk;
+        req->data = qemu_blockalign(blk_bs(s->blk), length);
+        req->offset = ram_bdrv_from_block_offset(page.block, page.offset);
+        req->size = length;
+
+        aio_ring_submit(s->aio_ring);
+    }
+}
+
+static int ram_load_complete_aio(StateLoadCtx *s, AioRingEvent *ev)
+{
+    QEMUFile *f = s->f_fd;
+    RAMCtx *ram = &ram_ctx;
+    RAMBlock *block = ram->last_sent_block;
+    void *bdrv_data = ev->origin->data;
+    int64_t bdrv_offset = ev->origin->offset;
+    ssize_t bdrv_count = ev->status;
+    int64_t offset;
+    int64_t flags = RAM_SAVE_FLAG_CONTINUE;
+    int pages = 0;
+
+    /* Need to switch to the another RAM block? */
+    if (!ram_bdrv_offset_in_block(block, bdrv_offset)) {
+        /*
+         * Lookup RAM block by BDRV offset cause in postcopy we
+         * can issue AIO loads from arbitrary blocks.
+         */
+        block = ram_block_by_bdrv_offset(bdrv_offset);
+        ram->last_sent_block = block;
+
+        /* Reset RAM_SAVE_FLAG_CONTINUE */
+        flags = 0;
+    }
+    offset = ram_block_offset_from_bdrv(block, bdrv_offset);
+
+    for (ssize_t count = 0; count < bdrv_count; count += page_size) {
+        if (buffer_is_zero(bdrv_data, page_size)) {
+            send_zeropage(f, block, (offset | flags));
+        } else {
+            send_page_header(f, block, (offset | RAM_SAVE_FLAG_PAGE | flags));
+            qemu_put_buffer_async(f, bdrv_data, page_size, false);
+
+            /* Update normal page count */
+            ram->loaded_pages++;
+        }
+
+        /*
+         * BDRV request shall never cross RAM block boundary so we can
+         * set RAM_SAVE_FLAG_CONTINUE here unconditionally.
+         */
+        flags = RAM_SAVE_FLAG_CONTINUE;
+
+        bdrv_data += page_size;
+        offset += page_size;
+        pages++;
+    }
+
+    /* Need to flush here cause we use qemu_put_buffer_async() */
+    qemu_fflush(f);
+
+    return pages;
+}
+
+static int coroutine_fn ram_load_pages(StateLoadCtx *s)
+{
+    AioRingEvent *event;
+    int res = 0;
+
+    /* Fill blockdev AIO queue */
+    ram_load_submit_aio(s);
+
+    /* Check for AIO completion event */
+    event = aio_ring_wait_event(s->aio_ring);
+    if (event) {
+        /* Check completion status */
+        res = event->status;
+        if (res > 0) {
+            res = ram_load_complete_aio(s, event);
+        }
+
+        qemu_vfree(event->origin->data);
+        aio_ring_complete(s->aio_ring);
+    }
+
+    return res;
+}
+
+static int coroutine_fn ram_load_pages_flush(StateLoadCtx *s)
+{
+    AioRingEvent *event;
+
+    while ((event = aio_ring_wait_event(s->aio_ring))) {
+        /* Check completion status */
+        if (event->status > 0) {
+            ram_load_complete_aio(s, event);
+        }
+
+        qemu_vfree(event->origin->data);
+        aio_ring_complete(s->aio_ring);
+    }
+
+    return 0;
+}
+
+static int ram_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int compat_flags = RAM_SAVE_FLAG_MEM_SIZE | RAM_SAVE_FLAG_EOS;
+    int flags = 0;
+    int res = 0;
+
+    if (version_id != 4) {
+        error_report("Unsupported version %d for 'ram' handler v4", version_id);
+        return -EINVAL;
+    }
+
+    while (!res && !(flags & RAM_SAVE_FLAG_EOS)) {
+        int64_t offset;
+
+        offset = qemu_get_be64(f);
+        flags = offset & ~page_mask;
+        offset &= page_mask;
+
+        if (flags & ~compat_flags) {
+            error_report("Incompatible RAM page flags 0x%x", flags);
+            res = -EINVAL;
+            break;
+        }
+
+        switch (flags) {
+        case RAM_SAVE_FLAG_MEM_SIZE:
+            /* Fill RAM block list */
+            ram_block_list_from_stream(f, offset);
+            break;
+
+        case RAM_SAVE_FLAG_EOS:
+            /* Normal exit */
+            break;
+
+        default:
+            error_report("Unknown combination of RAM page flags 0x%x", flags);
+            res = -EINVAL;
+        }
+
+        /* Check for file errors even if everything looks good */
+        if (!res) {
+            res = qemu_file_get_error(f);
+        }
+    }
+
+    return res;
+}
+
+#define YIELD_AFTER_MS 500 /* ms */
+
+static int ram_load_iterate(QEMUFile *f, void *opaque, int version_id)
+{
+    StateLoadCtx *s = (StateLoadCtx *) opaque;
+    int64_t t_start;
+    int tmp_res;
+    int res = 1;
+
+    t_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+    for (int iter = 0; res > 0; iter++) {
+        res = ram_load_pages(s);
+
+        if (!(iter & 7)) {
+            int64_t t_cur = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+            if ((t_cur - t_start) > YIELD_AFTER_MS) {
+                break;
+            }
+        }
+    }
+
+    /* Zero retcode means that there're no more pages to load */
+    if (res >= 0) {
+        res = res ? 0 : 1;
+    }
+
+    /* Process pending AIO ring events */
+    tmp_res = ram_load_pages_flush(s);
+    res = tmp_res ? tmp_res : res;
+
+    /* Send EOS flag before section footer */
+    qemu_put_be64(s->f_fd, RAM_SAVE_FLAG_EOS);
+    qemu_fflush(s->f_fd);
+
+    return res;
+}
+
+static int ram_load_memory(StateLoadCtx *s)
+{
+    SectionHandlersEntry *se;
+    int res;
+
+    se = find_se("ram", 0);
+    assert(se && se->ops->load_state_iterate);
+
+    /* Send section header with QEMU_VM_SECTION_PART type */
+    send_section_header_part_end(s->f_fd, se, QEMU_VM_SECTION_PART);
+    res = se->ops->load_state_iterate(s->f_fd, s, se->real_version_id);
+    send_section_footer(s->f_fd, se);
+
+    return res;
+}
+
+static int default_load(QEMUFile *f, void *opaque, int version_id)
+{
+    error_report("Unexpected (non-iterable device state) section");
+    return -EINVAL;
+}
+
+static int load_state_header(StateLoadCtx *s)
+{
+    QEMUFile *f = s->f_vmstate;
+    int v;
+
+    /* Validate magic */
+    v = qemu_get_be32(f);
+    if (v != VMSTATE_HEADER_MAGIC) {
+        error_report("Not a valid snapshot");
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v != page_size) {
+        error_report("Incompatible page size: got %d expected %d",
+                     v, (int) page_size);
+        return -EINVAL;
+    }
+
+    /* Number of non-zero pages in all RAM blocks */
+    ram_ctx.normal_pages = qemu_get_be64(f);
+
+    /* vmstate stream offsets, counted from QEMU_VM_FILE_MAGIC */
+    s->ram_list_offset = qemu_get_be32(f);
+    s->device_offset = qemu_get_be32(f);
+    s->eof_offset = qemu_get_be32(f);
+
+    /* Check that offsets are within the limits */
+    if ((VMSTATE_HEADER_SIZE + s->device_offset) > INPLACE_READ_MAX ||
+        s->device_offset <= s->ram_list_offset) {
+        error_report("Corrupted snapshot header");
+        return -EINVAL;
+    }
+
+    /* Skip up to RAM block list section */
+    qemu_file_skip(f, s->ram_list_offset);
+
+    return 0;
+}
+
+static int load_state_ramlist(StateLoadCtx *s)
+{
+    QEMUFile *f = s->f_vmstate;
+    uint8_t section_type;
+    int res;
+
+    section_type = qemu_get_byte(f);
+
+    if (section_type == QEMU_VM_EOF) {
+        error_report("Unexpected EOF token");
+        return -EINVAL;
+    } else if (section_type != QEMU_VM_SECTION_FULL &&
+               section_type != QEMU_VM_SECTION_START) {
+        error_report("Unexpected section type %d", section_type);
+        return -EINVAL;
+    }
+
+    res = load_section_start_full(s);
+    if (!res) {
+        ram_block_list_init_bitmaps();
+    }
+
+    return res;
+}
+
+static int load_state_complete(StateLoadCtx *s)
+{
+    /* Forward non-iterable device state */
+    qemu_fsplice(s->f_fd, s->f_vmstate, s->eof_offset - s->device_offset);
+
+    qemu_fflush(s->f_fd);
+
+    return 1;
+}
+
+static int load_section_start_full(StateLoadCtx *s)
+{
+    QEMUFile *f = s->f_vmstate;
+    int section_id;
+    int instance_id;
+    int version_id;
+    char idstr[256];
+    SectionHandlersEntry *se;
+    int res;
+
+    section_id = qemu_get_be32(f);
+
+    if (!qemu_get_counted_string(f, idstr)) {
+        error_report("Failed to get section name(%d)", section_id);
+        return -EINVAL;
+    }
+
+    instance_id = qemu_get_be32(f);
+    version_id = qemu_get_be32(f);
+
+    /* Find section handler */
+    se = find_se(idstr, instance_id);
+    if (!se) {
+        se = &section_handlers.default_;
+    } else if (version_id > se->version_id) {
+        /* Validate version */
+        error_report("Unsupported version %d for '%s' v%d",
+                     version_id, idstr, se->version_id);
+        return -EINVAL;
+    }
+
+    se->real_section_id = section_id;
+    se->real_version_id = version_id;
+
+    res = se->ops->load_state(f, s, se->real_version_id);
+    if (res) {
+        return res;
+    }
+
+    if (!check_section_footer(f, se)) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int send_state_leader(StateLoadCtx *s)
+{
+    qemu_put_buffer(s->f_fd, s->ioc_leader->data + VMSTATE_HEADER_SIZE,
+                    s->device_offset);
+    return qemu_file_get_error(s->f_fd);
+}
+
 int coroutine_fn load_state_main(StateLoadCtx *s)
 {
-    /* TODO: implement */
-    return 0;
+    int res;
+
+    res = load_state_header(s);
+    if (res) {
+        goto fail;
+    }
+
+    res = load_state_ramlist(s);
+    if (res) {
+        goto fail;
+    }
+
+    res = send_state_leader(s);
+    if (res) {
+        goto fail;
+    }
+
+    do {
+        res = ram_load_memory(s);
+        /* Check for file errors */
+        load_state_check_errors(s, &res);
+    } while (!res);
+
+    if (res == 1) {
+        res = load_state_complete(s);
+    }
+
+fail:
+    load_state_check_errors(s, &res);
+
+    /* Replace positive retcode with 0 */
+    return MIN(res, 0);
 }
 
 /* Initialize snapshot RAM state */
@@ -815,10 +1387,20 @@ void ram_init_state(void)
     RAMCtx *ram = &ram_ctx;
 
     memset(ram, 0, sizeof(ram_ctx));
+
+    /* Initialize RAM block list head */
+    QSIMPLEQ_INIT(&ram->ram_block_list);
 }
 
 /* Destroy snapshot RAM state */
 void ram_destroy_state(void)
 {
-    /* TODO: implement */
+    RAMBlock *block;
+    RAMBlock *next_block;
+
+    /* Free RAM blocks */
+    QSIMPLEQ_FOREACH_SAFE(block, &ram_ctx.ram_block_list, next, next_block) {
+        g_free(block->bitmap);
+        g_free(block);
+    }
 }
diff --git a/qemu-snapshot.c b/qemu-snapshot.c
index d434b8f245..92956623f7 100644
--- a/qemu-snapshot.c
+++ b/qemu-snapshot.c
@@ -121,7 +121,20 @@ static void init_load_context(void)
 
 static void destroy_load_context(void)
 {
-    /* TODO: implement */
+    StateLoadCtx *s = get_load_context();
+
+    if (s->f_vmstate) {
+        qemu_fclose(s->f_vmstate);
+    }
+    if (s->blk) {
+        blk_unref(s->blk);
+    }
+    if (s->aio_ring) {
+        aio_ring_free(s->aio_ring);
+    }
+    if (s->ioc_leader) {
+        object_unref(OBJECT(s->ioc_leader));
+    }
 }
 
 static BlockBackend *image_open_opts(const char *optstr, QDict *options, int flags)
@@ -212,6 +225,9 @@ fail:
 static void coroutine_fn snapshot_load_co(void *opaque)
 {
     StateLoadCtx *s = get_load_context();
+    QIOChannel *ioc_fd;
+    uint8_t *buf;
+    size_t count;
     int res = -1;
 
     init_load_context();
@@ -223,6 +239,35 @@ static void coroutine_fn snapshot_load_co(void *opaque)
         goto fail;
     }
 
+    /* QEMUFile on vmstate */
+    s->f_vmstate = qemu_fopen_bdrv_vmstate(blk_bs(s->blk), 0);
+    qemu_file_set_blocking(s->f_vmstate, false);
+
+    /* QEMUFile on migration fd */
+    ioc_fd = qio_channel_new_fd(params.fd, NULL);
+    qio_channel_set_name(QIO_CHANNEL(ioc_fd), "migration-channel-outgoing");
+    s->f_fd = qemu_fopen_channel_output(ioc_fd);
+    object_unref(OBJECT(ioc_fd));
+    qemu_file_set_blocking(s->f_fd, false);
+
+    /* Buffer channel to store leading part of migration stream */
+    s->ioc_leader = qio_channel_buffer_new(INPLACE_READ_MAX);
+    qio_channel_set_name(QIO_CHANNEL(s->ioc_leader), "migration-leader-buffer");
+
+    /* AIO ring */
+    s->aio_ring = aio_ring_new(ram_load_aio_co, AIO_RING_SIZE, AIO_RING_INFLIGHT);
+
+    /*
+     * Here we stash the leading part of vmstate stream without promoting read
+     * position.
+     */
+    count = qemu_peek_buffer(s->f_vmstate, &buf, INPLACE_READ_MAX, 0);
+    res = qemu_file_get_error(s->f_vmstate);
+    if (res < 0) {
+        goto fail;
+    }
+    qio_channel_write(QIO_CHANNEL(s->ioc_leader), (char *) buf, count, NULL);
+
     res = load_state_main(s);
     if (res) {
         error_report("Failed to load snapshot: %s", strerror(-res));
-- 
2.27.0

From nobody Sat May 4 11:49:39 2024
b=MlZ3FlalAEFRpcp/YUao7JO1bAm2YpcRZDEbrzqL4tbTsca8U74a0ls9RTZAr+3PgiNxW/wf36Ka95APmU1hpNCywzIU4xcJwtPDfZVVg8x3qTiHVOuQyHB4SSbB3+191CTmQERHMETZWUJ/Lnc7ArpBgKljoJ7feteJt7JhcyY= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1620850068862566.7473730771214; Wed, 12 May 2021 13:07:48 -0700 (PDT) Received: from localhost ([::1]:60142 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lgv91-0000s9-GD for importer@patchew.org; Wed, 12 May 2021 16:07:47 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60224) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lguV4-00027O-R3 for qemu-devel@nongnu.org; Wed, 12 May 2021 15:26:30 -0400 Received: from relay.sw.ru ([185.231.240.75]:44768) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lguUz-0000oq-EW for qemu-devel@nongnu.org; Wed, 12 May 2021 15:26:30 -0400 Received: from [192.168.15.22] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lguUu-002BHm-Kv; Wed, 12 May 2021 22:26:20 +0300 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=tj6Tes50UWKhrj8GdNBn+fPCwCJtWXH0W8xJG4RNS9k=; b=gaMnSryW3Cpm EMlEukiI4mR75GFcHwMrsU6DkTg3u0cfFgSTlZCB/pPNLQYl58cUwZo5Rr9ix5P3mfhRCS8hsLNAB rK3l+l6mtMCLDOZrILHLOrqTYCgVCnsILTwF5ox9isjrWIvm9iCGgrGA1UdvBYtVUZkG+VQPlNyWe zgSrQ=; From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Vladimir Sementsov-Ogievskiy , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . 
David Alan Gilbert" , Markus Armbruster , Peter Xu , David Hildenbrand , Andrey Gruzdev Subject: [RFC PATCH v1 7/7] migration/snapshot: Implementation of qemu-snapshot load path in postcopy mode Date: Wed, 12 May 2021 22:26:19 +0300 Message-Id: <20210512192619.537268-8-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com> References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @virtuozzo.com) Content-Type: text/plain; charset="utf-8" The commit enables asynchronous snapshot loading using standard postcopy migration mechanism on destination VM. The point of switchover to postcopy is trivially selected based on percentage of non-zero pages loaded in precopy. 
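The switchover point is a plain threshold: precopy is expected to load `normal_pages * percent / 100` pages, and the loader flips to postcopy once the loaded count passes that. A minimal standalone sketch of the heuristic (illustrative types mirroring the patch's RAMCtx fields, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of the RAMCtx progress fields used by the patch. */
typedef struct {
    int64_t normal_pages;   /* total non-zero pages in the snapshot */
    int64_t precopy_pages;  /* pages to load in precopy before switching */
    int64_t loaded_pages;   /* pages loaded so far */
} RamProgress;

/* Computed once up front, as prepare_postcopy() does in the patch. */
void set_precopy_threshold(RamProgress *p, int percent)
{
    p->precopy_pages = p->normal_pages * percent / 100;
}

/* Checked after each load iteration, as is_postcopy_switchover() does. */
int is_switchover(const RamProgress *p)
{
    return p->loaded_pages > p->precopy_pages;
}
```

With a postcopy percentage of 30 and a million non-zero pages, the loader would stay in precopy for the first 300,000 pages and serve the remainder on demand via page requests.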
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
---
 include/qemu-snapshot.h |  12 +
 qemu-snapshot-vm.c      | 485 +++++++++++++++++++++++++++++++++++++++-
 qemu-snapshot.c         |  16 ++
 3 files changed, 508 insertions(+), 5 deletions(-)

diff --git a/include/qemu-snapshot.h b/include/qemu-snapshot.h
index aae730d70e..84a0c38e08 100644
--- a/include/qemu-snapshot.h
+++ b/include/qemu-snapshot.h
@@ -36,10 +36,14 @@
 
 /* AIO transfer size */
 #define AIO_TRANSFER_SIZE       BDRV_CLUSTER_SIZE
+/* AIO transfer size for postcopy */
+#define AIO_TRANSFER_SIZE_LOWLAT (BDRV_CLUSTER_SIZE / 4)
 /* AIO ring size */
 #define AIO_RING_SIZE           64
 /* AIO ring in-flight limit */
 #define AIO_RING_INFLIGHT       16
+/* AIO ring in-flight limit for postcopy */
+#define AIO_RING_INFLIGHT_LOWLAT 4
 
 typedef struct AioRing AioRing;
 
@@ -97,12 +101,20 @@ typedef struct StateSaveCtx {
 typedef struct StateLoadCtx {
     BlockBackend *blk;          /* Block backend */
     QEMUFile *f_fd;             /* QEMUFile for outgoing stream */
+    QEMUFile *f_rp_fd;          /* QEMUFile for return path stream */
     QEMUFile *f_vmstate;        /* QEMUFile for vmstate backing */
 
     QIOChannelBuffer *ioc_leader;   /* vmstate stream leader */
 
     AioRing *aio_ring;          /* AIO ring */
 
+    bool postcopy;              /* From command-line --postcopy */
+    int postcopy_percent;       /* From command-line --postcopy */
+    bool in_postcopy;           /* In postcopy mode */
+
+    QemuThread rp_listen_thread;    /* Return path listening thread */
+    bool has_rp_listen_thread;      /* Have listening thread */
+
     /* vmstate offset of the section containing list of RAM blocks */
     int64_t ram_list_offset;
     /* vmstate offset of the first non-iterable device section */
diff --git a/qemu-snapshot-vm.c b/qemu-snapshot-vm.c
index dae5f84b80..76980520ea 100644
--- a/qemu-snapshot-vm.c
+++ b/qemu-snapshot-vm.c
@@ -40,7 +40,9 @@ typedef struct RAMBlock {
     int64_t nr_pages;           /* Page count */
     int64_t nr_slices;          /* Number of slices (for bitmap bookkeeping) */
 
-    unsigned long *bitmap;      /* Bitmap of RAM slices */
+    int64_t discard_offset;     /* Last page offset sent in precopy */
+
+    unsigned long *bitmap;      /* Bitmap of transferred RAM slices */
 
     /* Link into ram_list */
     QSIMPLEQ_ENTRY(RAMBlock) next;
@@ -54,17 +56,33 @@ typedef struct RAMPage {
     int64_t offset;             /* Page offset in RAM block */
 } RAMPage;
 
+/* Page request from destination in postcopy */
+typedef struct RAMPageRequest {
+    RAMBlock *block;            /* RAM block */
+    int64_t offset;             /* Offset in RAM block */
+    unsigned size;              /* Size of request */
+
+    /* Link into ram_ctx.page_req */
+    QSIMPLEQ_ENTRY(RAMPageRequest) next;
+} RAMPageRequest;
+
 /* RAM transfer context */
 typedef struct RAMCtx {
     int64_t normal_pages;       /* Total number of normal pages */
+    int64_t precopy_pages;      /* Normal pages to load in precopy */
     int64_t loaded_pages;       /* Number of normal pages loaded */
 
     RAMPage last_page;          /* Last loaded page */
 
     RAMBlock *last_sent_block;  /* RAM block of last sent page */
+    RAMBlock *last_req_block;   /* RAM block of last page request */
 
     /* RAM block list head */
     QSIMPLEQ_HEAD(, RAMBlock) ram_block_list;
+
+    /* Page request queue for postcopy */
+    QemuMutex page_req_mutex;
+    QSIMPLEQ_HEAD(, RAMPageRequest) page_req;
 } RAMCtx;
 
 /* Section handler ops */
@@ -848,6 +866,433 @@ static void load_state_check_errors(StateLoadCtx *s, int *res)
     }
 }
 
+static bool get_queued_page(RAMPage *page)
+{
+    RAMCtx *ram = &ram_ctx;
+
+    if (QSIMPLEQ_EMPTY_ATOMIC(&ram->page_req)) {
+        return false;
+    }
+
+    QEMU_LOCK_GUARD(&ram->page_req_mutex);
+
+    while (!QSIMPLEQ_EMPTY(&ram->page_req)) {
+        RAMPageRequest *entry = QSIMPLEQ_FIRST(&ram->page_req);
+        RAMBlock *block = entry->block;
+        int64_t slice = entry->offset >> slice_bits;
+
+        QSIMPLEQ_REMOVE_HEAD(&ram->page_req, next);
+        g_free(entry);
+
+        /*
+         * Test respective bit in RAM block's slice bitmap to check if
+         * we still haven't read that slice from the image.
+         */
+        if (test_bit(slice, block->bitmap)) {
+            page->block = block;
+            page->offset = slice << slice_bits;
+
+            return true;
+        }
+    }
+
+    return false;
+}
+
+static int queue_page_request(const char *idstr, int64_t offset, unsigned size)
+{
+    RAMCtx *ram = &ram_ctx;
+    RAMBlock *block;
+    RAMPageRequest *new_entry;
+
+    if (!idstr) {
+        block = ram->last_req_block;
+        if (!block) {
+            error_report("RP-REQ_PAGES: no previous block");
+            return -EINVAL;
+        }
+    } else {
+        block = ram_block_by_idstr(idstr);
+        if (!block) {
+            error_report("RP-REQ_PAGES: cannot find block %s", idstr);
+            return -EINVAL;
+        }
+
+        ram->last_req_block = block;
+    }
+
+    if (!ram_offset_in_block(block, offset)) {
+        error_report("RP-REQ_PAGES: offset 0x%" PRIx64 " out of RAM block %s",
+                     offset, idstr);
+        return -EINVAL;
+    }
+
+    new_entry = g_new0(RAMPageRequest, 1);
+    new_entry->block = block;
+    new_entry->offset = offset;
+    new_entry->size = size;
+
+    qemu_mutex_lock(&ram->page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&ram->page_req, new_entry, next);
+    qemu_mutex_unlock(&ram->page_req_mutex);
+
+    return 0;
+}
+
+/* QEMU_VM_COMMAND sub-commands */
+typedef enum VmSubCmd {
+    MIG_CMD_OPEN_RETURN_PATH = 1,
+    MIG_CMD_POSTCOPY_ADVISE = 3,
+    MIG_CMD_POSTCOPY_LISTEN = 4,
+    MIG_CMD_POSTCOPY_RUN = 5,
+    MIG_CMD_POSTCOPY_RAM_DISCARD = 6,
+    MIG_CMD_PACKAGED = 7,
+} VmSubCmd;
+
+/* Return-path message types */
+typedef enum RpMsgType {
+    MIG_RP_MSG_INVALID = 0,
+    MIG_RP_MSG_SHUT = 1,
+    MIG_RP_MSG_REQ_PAGES_ID = 3,
+    MIG_RP_MSG_REQ_PAGES = 4,
+    MIG_RP_MSG_MAX = 7,
+} RpMsgType;
+
+typedef struct RpMsgArgs {
+    int len;
+    const char *name;
+} RpMsgArgs;
+
+/*
+ * Return-path message length/name indexed by message type.
+ * A value of -1 stands for variable message length.
+ */
+static RpMsgArgs rp_msg_args[] = {
+    [MIG_RP_MSG_INVALID]      = { .len = -1, .name = "INVALID" },
+    [MIG_RP_MSG_SHUT]         = { .len =  4, .name = "SHUT" },
+    [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_REQ_PAGES]    = { .len = 12, .name = "REQ_PAGES" },
+    [MIG_RP_MSG_MAX]          = { .len = -1, .name = "MAX" },
+};
+
+/* Return-path message processing thread */
+static void *rp_listen_thread(void *opaque)
+{
+    StateLoadCtx *s = (StateLoadCtx *) opaque;
+    QEMUFile *f = s->f_rp_fd;
+    int res = 0;
+
+    while (!res) {
+        uint8_t h_buf[512];
+        const int h_max_len = sizeof(h_buf);
+        int h_type;
+        int h_len;
+        size_t count;
+
+        h_type = qemu_get_be16(f);
+        h_len = qemu_get_be16(f);
+
+        /* Make early check for input errors */
+        res = qemu_file_get_error(f);
+        if (res) {
+            break;
+        }
+
+        /* Check message type */
+        if (h_type >= MIG_RP_MSG_MAX || h_type == MIG_RP_MSG_INVALID) {
+            error_report("RP: received invalid message type %d length %d",
+                         h_type, h_len);
+            res = -EINVAL;
+            break;
+        }
+
+        /* Check message length */
+        if (rp_msg_args[h_type].len != -1 && h_len != rp_msg_args[h_type].len) {
+            error_report("RP: received %s message len %d expected %d",
+                         rp_msg_args[h_type].name,
+                         h_len, rp_msg_args[h_type].len);
+            res = -EINVAL;
+            break;
+        } else if (h_len > h_max_len) {
+            error_report("RP: received %s message len %d max_len %d",
+                         rp_msg_args[h_type].name, h_len, h_max_len);
+            res = -EINVAL;
+            break;
+        }
+
+        count = qemu_get_buffer(f, h_buf, h_len);
+        if (count != h_len) {
+            break;
+        }
+
+        switch (h_type) {
+        case MIG_RP_MSG_SHUT:
+        {
+            int shut_error;
+
+            shut_error = be32_to_cpu(*(uint32_t *) h_buf);
+            if (shut_error) {
+                error_report("RP: sibling shutdown, error %d", shut_error);
+            }
+
+            /* Exit processing loop */
+            res = 1;
+            break;
+        }
+
+        case MIG_RP_MSG_REQ_PAGES:
+        case MIG_RP_MSG_REQ_PAGES_ID:
+        {
+            uint64_t offset;
+            uint32_t size;
+            char *id_str = NULL;
+
+            offset = be64_to_cpu(*(uint64_t *) (h_buf + 0));
+            size = be32_to_cpu(*(uint32_t *) (h_buf + 8));
+
+            if (h_type == MIG_RP_MSG_REQ_PAGES_ID) {
+                int h_parsed_len = rp_msg_args[MIG_RP_MSG_REQ_PAGES].len;
+
+                if (h_len > h_parsed_len) {
+                    int id_len;
+
+                    /* RAM block id string */
+                    id_len = h_buf[h_parsed_len];
+                    id_str = (char *) &h_buf[h_parsed_len + 1];
+                    id_str[id_len] = 0;
+
+                    h_parsed_len += id_len + 1;
+                }
+
+                if (h_parsed_len != h_len) {
+                    error_report("RP: received %s message len %d expected %d",
+                                 rp_msg_args[MIG_RP_MSG_REQ_PAGES_ID].name,
+                                 h_len, h_parsed_len);
+                    res = -EINVAL;
+                    break;
+                }
+            }
+
+            res = queue_page_request(id_str, offset, size);
+            break;
+        }
+
+        default:
+            error_report("RP: received unexpected message type %d len %d",
+                         h_type, h_len);
+            res = -EINVAL;
+        }
+    }
+
+    if (res >= 0) {
+        res = qemu_file_get_error(f);
+    }
+    if (res) {
+        error_report("RP: listen thread exit, error %d", res);
+    }
+
+    return NULL;
+}
+
+static void send_command(QEMUFile *f, int cmd, uint16_t len, uint8_t *data)
+{
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, (uint16_t) cmd);
+    qemu_put_be16(f, len);
+
+    qemu_put_buffer_async(f, data, len, false);
+    qemu_fflush(f);
+}
+
+static void send_ram_block_discard(QEMUFile *f, RAMBlock *block)
+{
+    int id_len;
+    int msg_len;
+    uint8_t msg_buf[512];
+
+    id_len = strlen(block->idstr);
+    assert(id_len < 256);
+
+    /* Version, always 0 */
+    msg_buf[0] = 0;
+    /* RAM block ID string length, not including terminating 0 */
+    msg_buf[1] = id_len;
+    /* RAM block ID string with terminating zero */
+    memcpy(msg_buf + 2, block->idstr, (id_len + 1));
+    msg_len = 2 + id_len + 1;
+    /* Discard range offset */
+    stq_be_p(msg_buf + msg_len, block->discard_offset);
+    msg_len += 8;
+    /* Discard range length */
+    stq_be_p(msg_buf + msg_len, (block->length - block->discard_offset));
+    msg_len += 8;
+
+    send_command(f, MIG_CMD_POSTCOPY_RAM_DISCARD, msg_len, msg_buf);
+}
+
+static int send_each_ram_block_discard(QEMUFile *f)
+{
+    RAMBlock *block;
+    int res = 0;
+
+    QSIMPLEQ_FOREACH(block, &ram_ctx.ram_block_list, next) {
+        send_ram_block_discard(f, block);
+
+        res = qemu_file_get_error(f);
+        if (res) {
+            break;
+        }
+    }
+
+    return res;
+}
+
+static int prepare_postcopy(StateLoadCtx *s)
+{
+    QEMUFile *f = s->f_fd;
+    uint64_t tmp[2];
+    int res;
+
+    /* Number of pages to load in precopy before switching to postcopy */
+    ram_ctx.precopy_pages = ram_ctx.normal_pages * s->postcopy_percent / 100;
+
+    /* Send POSTCOPY_ADVISE */
+    tmp[0] = cpu_to_be64(page_size);
+    tmp[1] = cpu_to_be64(page_size);
+    send_command(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t *) tmp);
+
+    /* Open return path on destination */
+    send_command(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
+
+    /*
+     * Check for file errors after sending the POSTCOPY_ADVISE command,
+     * since the destination may already have closed its input pipe in case
+     * postcopy had not been enabled in advance.
+     */
+    res = qemu_file_get_error(f);
+    if (!res) {
+        qemu_thread_create(&s->rp_listen_thread, "rp_thread",
+                           rp_listen_thread, s, QEMU_THREAD_JOINABLE);
+        s->has_rp_listen_thread = true;
+    }
+
+    return res;
+}
+
+static int start_postcopy(StateLoadCtx *s)
+{
+    QIOChannelBuffer *bioc;
+    QEMUFile *fb;
+    int eof_pos;
+    uint32_t length;
+    int res = 0;
+
+    /*
+     * Send RAM discards for each block's unsent part. Without discards,
+     * the userfault_fd code on destination will not trigger page requests
+     * as expected. Also, the UFFDIO_COPY ioctl that is used to place incoming
+     * pages in postcopy would give an error if the page has not faulted
+     * with MISSING reason.
+     */
+    res = send_each_ram_block_discard(s->f_fd);
+    if (res) {
+        return res;
+    }
+
+    /*
+     * To perform a switch to postcopy on destination, we need to send
+     * commands and the device state data in the following order:
+     *   * MIG_CMD_POSTCOPY_LISTEN
+     *   * Non-iterable device state sections
+     *   * MIG_CMD_POSTCOPY_RUN
+     *
+     * All this has to be packaged into a single blob using the MIG_CMD_PACKAGED
+     * command. While loading the device state we may trigger page transfer
+     * requests and the fd must be free to process those, thus the destination
+     * must read the whole device state off the fd before it starts
+     * processing it. To wrap it up in a package, QEMU buffer channel is used.
+     */
+    bioc = qio_channel_buffer_new(512 * 1024);
+    qio_channel_set_name(QIO_CHANNEL(bioc), "migration-postcopy-buffer");
+    fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+    object_unref(OBJECT(bioc));
+
+    /* MIG_CMD_POSTCOPY_LISTEN command */
+    send_command(fb, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
+
+    /* The rest of non-iterable device state with an optional vmdesc section */
+    qemu_fsplice(fb, s->f_vmstate, s->eof_offset - s->device_offset);
+    qemu_fflush(fb);
+
+    /*
+     * A vmdesc section may optionally be present at the end of the stream,
+     * so we'll try to locate it and truncate the trailer.
+     */
+    eof_pos = bioc->usage - 1;
+
+    for (int offset = (bioc->usage - 11); offset >= 0; offset--) {
+        if (bioc->data[offset] == QEMU_VM_SECTION_FOOTER &&
+            bioc->data[offset + 5] == QEMU_VM_EOF &&
+            bioc->data[offset + 6] == QEMU_VM_VMDESCRIPTION) {
+            uint32_t expected_length = bioc->usage - (offset + 11);
+            uint32_t json_length;
+
+            json_length = be32_to_cpu(*(uint32_t *) &bioc->data[offset + 7]);
+            if (json_length != expected_length) {
+                error_report("Corrupted vmdesc trailer: length %" PRIu32
+                             " expected %" PRIu32,
+                             json_length, expected_length);
+                res = -EINVAL;
+                goto fail;
+            }
+
+            eof_pos = offset + 5;
+            break;
+        }
+    }
+
+    /*
+     * When switching to postcopy we need to skip the QEMU_VM_EOF token which
+     * normally is placed after the last non-iterable device state section
+     * (but before the vmdesc section).
+     *
+     * Skipping QEMU_VM_EOF is required to allow the migration process to
+     * continue in postcopy. The vmdesc section also has to be skipped here.
+     */
+    if (eof_pos >= 0 && bioc->data[eof_pos] == QEMU_VM_EOF) {
+        bioc->usage = eof_pos;
+        bioc->offset = eof_pos;
+    }
+
+    /* Finally, the MIG_CMD_POSTCOPY_RUN command */
+    send_command(fb, MIG_CMD_POSTCOPY_RUN, 0, NULL);
+
+    /* Now send that blob */
+    length = cpu_to_be32(bioc->usage);
+    send_command(s->f_fd, MIG_CMD_PACKAGED, sizeof(length), (uint8_t *) &length);
+    qemu_put_buffer_async(s->f_fd, bioc->data, bioc->usage, false);
+    qemu_fflush(s->f_fd);
+
+    /*
+     * Switch to a lower setting of the in-flight requests limit
+     * to reduce page request latencies.
+     */
+    aio_ring_set_max_inflight(s->aio_ring, AIO_RING_INFLIGHT_LOWLAT);
+
+    s->in_postcopy = true;
+
+fail:
+    qemu_fclose(fb);
+    load_state_check_errors(s, &res);
+
+    return res;
+}
+
+static bool is_postcopy_switchover(StateLoadCtx *s)
+{
+    return ram_ctx.loaded_pages > ram_ctx.precopy_pages;
+}
+
 static void send_section_header_part_end(QEMUFile *f, SectionHandlersEntry *se,
                                          uint8_t section_type)
 {
@@ -987,10 +1432,13 @@ static void coroutine_fn ram_load_submit_aio(StateLoadCtx *s)
 
     while ((req = aio_ring_get_request(s->aio_ring))) {
         RAMPage page;
-        unsigned max_length = AIO_TRANSFER_SIZE;
+        unsigned max_length = s->in_postcopy ? AIO_TRANSFER_SIZE_LOWLAT :
+                                               AIO_TRANSFER_SIZE;
         unsigned length;
+        bool urgent;
 
-        if (!find_next_page(&page)) {
+        urgent = get_queued_page(&page);
+        if (!urgent && !find_next_page(&page)) {
             break;
         }
 
@@ -1003,6 +1451,9 @@ static void coroutine_fn ram_load_submit_aio(StateLoadCtx *s)
         ram->last_page.block = page.block;
         ram->last_page.offset = page.offset + length;
 
+        /* Used by send_ram_block_discard() */
+        page.block->discard_offset = ram->last_page.offset;
+
         /* Setup I/O request */
         req->opaque = s->blk;
         req->data = qemu_blockalign(blk_bs(s->blk), length);
@@ -1284,8 +1735,13 @@ static int load_state_ramlist(StateLoadCtx *s)
 
 static int load_state_complete(StateLoadCtx *s)
 {
-    /* Forward non-iterable device state */
-    qemu_fsplice(s->f_fd, s->f_vmstate, s->eof_offset - s->device_offset);
+    if (!s->in_postcopy) {
+        /* Forward non-iterable device state */
+        qemu_fsplice(s->f_fd, s->f_vmstate, s->eof_offset - s->device_offset);
+    } else {
+        /* Send terminating QEMU_VM_EOF if in postcopy */
+        qemu_put_byte(s->f_fd, QEMU_VM_EOF);
+    }
 
     qemu_fflush(s->f_fd);
 
@@ -1364,10 +1820,22 @@ int coroutine_fn load_state_main(StateLoadCtx *s)
         goto fail;
     }
 
+    if (s->postcopy) {
+        res = prepare_postcopy(s);
+        if (res) {
+            goto fail;
+        }
+    }
+
     do {
         res = ram_load_memory(s);
         /* Check for file errors */
         load_state_check_errors(s, &res);
+
+        /* Switch to postcopy? */
+        if (!res && s->postcopy && !s->in_postcopy && is_postcopy_switchover(s)) {
+            res = start_postcopy(s);
+        }
     } while (!res);
 
     if (res == 1) {
@@ -1390,6 +1858,10 @@ void ram_init_state(void)
 
     /* Initialize RAM block list head */
     QSIMPLEQ_INIT(&ram->ram_block_list);
+
+    /* Initialize postcopy page request queue */
+    qemu_mutex_init(&ram->page_req_mutex);
+    QSIMPLEQ_INIT(&ram->page_req);
 }
 
 /* Destroy snapshot RAM state */
@@ -1403,4 +1875,7 @@ void ram_destroy_state(void)
         g_free(block->bitmap);
         g_free(block);
     }
+
+    /* Destroy page request mutex */
+    qemu_mutex_destroy(&ram_ctx.page_req_mutex);
 }
diff --git a/qemu-snapshot.c b/qemu-snapshot.c
index 92956623f7..29d954c5d6 100644
--- a/qemu-snapshot.c
+++ b/qemu-snapshot.c
@@ -123,6 +123,10 @@ static void destroy_load_context(void)
 {
     StateLoadCtx *s = get_load_context();
 
+    if (s->has_rp_listen_thread) {
+        qemu_thread_join(&s->rp_listen_thread);
+    }
+
     if (s->f_vmstate) {
         qemu_fclose(s->f_vmstate);
     }
@@ -226,12 +230,16 @@ static void coroutine_fn snapshot_load_co(void *opaque)
 {
     StateLoadCtx *s = get_load_context();
     QIOChannel *ioc_fd;
+    QIOChannel *ioc_rp_fd;
     uint8_t *buf;
     size_t count;
     int res = -1;
 
     init_load_context();
 
+    s->postcopy = params.postcopy;
+    s->postcopy_percent = params.postcopy_percent;
+
     /* Block backend */
     s->blk = image_open_opts(params.blk_optstr, params.blk_options,
                              params.blk_flags);
@@ -250,6 +258,14 @@ static void coroutine_fn snapshot_load_co(void *opaque)
     object_unref(OBJECT(ioc_fd));
     qemu_file_set_blocking(s->f_fd, false);
 
+    /* QEMUFile on return path fd if we are going to use postcopy */
+    if (params.postcopy) {
+        ioc_rp_fd = qio_channel_new_fd(params.rp_fd, NULL);
+        qio_channel_set_name(QIO_CHANNEL(ioc_rp_fd), "migration-channel-rp");
+        s->f_rp_fd = qemu_fopen_channel_input(ioc_rp_fd);
+        object_unref(OBJECT(ioc_rp_fd));
+    }
+
     /* Buffer channel to store leading part of migration stream */
     s->ioc_leader = qio_channel_buffer_new(INPLACE_READ_MAX);
     qio_channel_set_name(QIO_CHANNEL(s->ioc_leader), "migration-leader-buffer");
-- 
2.27.0
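For reference, the return path handled by `rp_listen_thread()` in this patch uses a simple wire framing: a big-endian 16-bit message type, a big-endian 16-bit payload length, then the payload (12 bytes for REQ_PAGES: a be64 offset plus a be32 size). A standalone sketch of building such a message, assuming nothing from QEMU itself (the constants only mirror the patch's `rp_msg_args` table):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message types as in the patch's RpMsgType enum (illustrative subset). */
enum { MSG_SHUT = 1, MSG_REQ_PAGES = 4 };

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) (v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t) (v >> 16));
    put_be16(p + 2, (uint16_t) (v & 0xffff));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t) (v >> 32));
    put_be32(p + 4, (uint32_t) (v & 0xffffffffu));
}

uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Frame one return-path message: be16 type, be16 len, then the payload. */
size_t frame_rp_msg(uint8_t *out, uint16_t type,
                    const uint8_t *payload, uint16_t len)
{
    put_be16(out, type);
    put_be16(out + 2, len);
    memcpy(out + 4, payload, len);
    return 4u + len;
}

/* Build a full REQ_PAGES message: 12-byte payload of be64 offset, be32 size. */
size_t build_req_pages(uint8_t *out, uint64_t offset, uint32_t size)
{
    uint8_t payload[12];

    put_be64(payload, offset);
    put_be32(payload + 8, size);
    return frame_rp_msg(out, MSG_REQ_PAGES, payload, sizeof(payload));
}
```

The fixed-length table check in `rp_listen_thread()` then only has to compare the received `len` field against 12 for REQ_PAGES (or 4 for SHUT) before parsing the payload.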