From nobody Tue Feb 10 02:44:17 2026
From: Andrey Gruzdev
To: qemu-devel@nongnu.org
Cc: Den Lunev, Vladimir Sementsov-Ogievskiy, Eric Blake, Paolo Bonzini,
 Juan Quintela, "Dr . David Alan Gilbert", Markus Armbruster, Peter Xu,
 David Hildenbrand, Andrey Gruzdev
Subject: [RFC PATCH v1 4/7] migration/snapshot: Block layer AIO support in qemu-snapshot
Date: Wed, 12 May 2021 22:26:16 +0300
Message-Id: <20210512192619.537268-5-andrey.gruzdev@virtuozzo.com>
In-Reply-To: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
References: <20210512192619.537268-1-andrey.gruzdev@virtuozzo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This commit enables asynchronous block layer I/O for qemu-snapshot tool.
Implementation provides in-order request completion delivery to simplify
migration code. Several file utility routines are introduced as well.

Signed-off-by: Andrey Gruzdev
---
 include/qemu-snapshot.h |  30 +++++
 meson.build             |   2 +-
 qemu-snapshot-io.c      | 266 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 297 insertions(+), 1 deletion(-)
 create mode 100644 qemu-snapshot-io.c

diff --git a/include/qemu-snapshot.h b/include/qemu-snapshot.h
index 154e11e9a5..7b3406fd56 100644
--- a/include/qemu-snapshot.h
+++ b/include/qemu-snapshot.h
@@ -34,6 +34,23 @@
 /* RAM slice size for snapshot revert */
 #define SLICE_SIZE_REVERT (16 * PAGE_SIZE_MAX)
 
+typedef struct AioRing AioRing;
+
+typedef struct AioRingRequest {
+    void *opaque;               /* Opaque */
+
+    void *data;                 /* Data buffer */
+    int64_t offset;             /* Offset */
+    size_t size;                /* Size */
+} AioRingRequest;
+
+typedef struct AioRingEvent {
+    AioRingRequest *origin;     /* Originating request */
+    ssize_t status;             /* Completion status */
+} AioRingEvent;
+
+typedef ssize_t coroutine_fn (*AioRingFunc)(AioRingRequest *req);
+
 typedef struct StateSaveCtx {
     BlockBackend *blk;          /* Block backend */
 } StateSaveCtx;
@@ -56,4 +73,17 @@ StateLoadCtx *get_load_context(void);
 int coroutine_fn save_state_main(StateSaveCtx *s);
 int coroutine_fn load_state_main(StateLoadCtx *s);
 
+AioRing *coroutine_fn aio_ring_new(AioRingFunc func, unsigned ring_entries,
+                                   unsigned max_inflight);
+void aio_ring_free(AioRing *ring);
+void aio_ring_set_max_inflight(AioRing *ring, unsigned max_inflight);
+AioRingRequest *coroutine_fn aio_ring_get_request(AioRing *ring);
+void coroutine_fn aio_ring_submit(AioRing *ring);
+AioRingEvent *coroutine_fn aio_ring_wait_event(AioRing *ring);
+void coroutine_fn aio_ring_complete(AioRing *ring);
+
+QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable);
+void qemu_fsplice(QEMUFile *f_dst, QEMUFile *f_src, size_t size);
+void qemu_fsplice_tail(QEMUFile *f_dst, QEMUFile *f_src);
+
 #endif /* QEMU_SNAPSHOT_H */
diff --git a/meson.build b/meson.build
index b851671914..c25fc518df 100644
--- a/meson.build
+++ b/meson.build
@@ -2361,7 +2361,7 @@ if have_tools
                dependencies: [block, qemuutil], install: true)
   qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
                dependencies: [blockdev, qemuutil, gnutls], install: true)
-  qemu_snapshot = executable('qemu-snapshot', files('qemu-snapshot.c', 'qemu-snapshot-vm.c'),
+  qemu_snapshot = executable('qemu-snapshot', files('qemu-snapshot.c', 'qemu-snapshot-vm.c', 'qemu-snapshot-io.c'),
                dependencies: [blockdev, qemuutil, migration], install: true)
 
   subdir('storage-daemon')
diff --git a/qemu-snapshot-io.c b/qemu-snapshot-io.c
new file mode 100644
index 0000000000..cd6428a4a2
--- /dev/null
+++ b/qemu-snapshot-io.c
@@ -0,0 +1,266 @@
+/*
+ * QEMU External Snapshot Utility
+ *
+ * Copyright Virtuozzo GmbH, 2021
+ *
+ * Authors:
+ *  Andrey Gruzdev
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine.h"
+#include "sysemu/block-backend.h"
+#include "migration/qemu-file.h"
+#include "qemu-snapshot.h"
+
+/*
+ * AIO ring.
+ *
+ * Coroutine-based environment to support asynchronous I/O operations
+ * providing in-order completion event delivery.
+ *
+ * All routines (with an exception of aio_ring_free()) are required to be
+ * called from the same coroutine.
+ *
+ * Call sequence to keep AIO ring filled:
+ *
+ *   aio_ring_new()                !
+ *                                 !
+ *   aio_ring_get_request()        !<------!<------!
+ *   aio_ring_submit()             !------>!       !
+ *                                 !               !
+ *   aio_ring_wait_event()         !               !
+ *   aio_ring_complete()           !-------------->!
+ *                                 !
+ *   aio_ring_free()               !
+ *
+ */
+
+typedef struct AioRingEntry {
+    AioRingRequest request;     /* I/O request */
+    AioRingEvent event;         /* I/O completion event */
+    bool owned;                 /* Owned by caller */
+} AioRingEntry;
+
+typedef struct AioRing {
+    unsigned head;              /* Head entry index */
+    unsigned tail;              /* Tail entry index */
+
+    unsigned ring_mask;         /* Mask for ring entry indices */
+    unsigned ring_entries;      /* Number of entries in the ring */
+
+    AioRingFunc func;           /* Routine to call */
+
+    Coroutine *main_co;         /* Caller's coroutine */
+    bool waiting;               /* Caller is waiting for event */
+
+    unsigned length;            /* Tail-head distance */
+    unsigned inflight;          /* Number of in-flight requests */
+    unsigned max_inflight;      /* Maximum in-flight requests */
+
+    AioRingEntry entries[];     /* Flex-array of AioRingEntry */
+} AioRing;
+
+static void coroutine_fn aio_ring_co(void *opaque)
+{
+    AioRing *ring = (AioRing *) opaque;
+    AioRingEntry *entry = &ring->entries[ring->tail];
+
+    ring->tail = (ring->tail + 1) & ring->ring_mask;
+    ring->length++;
+
+    ring->inflight++;
+    entry->owned = false;
+
+    entry->event.status = ring->func(&entry->request);
+
+    entry->event.origin = &entry->request;
+    entry->owned = true;
+    ring->inflight--;
+
+    if (ring->waiting) {
+        ring->waiting = false;
+        aio_co_wake(ring->main_co);
+    }
+}
+
+AioRingRequest *coroutine_fn aio_ring_get_request(AioRing *ring)
+{
+    assert(qemu_coroutine_self() == ring->main_co);
+
+    if (ring->length >= ring->ring_entries ||
+        ring->inflight >= ring->max_inflight) {
+        return NULL;
+    }
+
+    return &ring->entries[ring->tail].request;
+}
+
+void coroutine_fn aio_ring_submit(AioRing *ring)
+{
+    assert(qemu_coroutine_self() == ring->main_co);
+    assert(ring->length < ring->ring_entries);
+
+    qemu_coroutine_enter(qemu_coroutine_create(aio_ring_co, ring));
+}
+
+AioRingEvent *coroutine_fn aio_ring_wait_event(AioRing *ring)
+{
+    AioRingEntry *entry = &ring->entries[ring->head];
+
+    assert(qemu_coroutine_self() == ring->main_co);
+
+    if (!ring->length) {
+        return NULL;
+    }
+
+    while (true) {
+        if (entry->owned) {
+            return &entry->event;
+        }
+        ring->waiting = true;
+        qemu_coroutine_yield();
+    }
+
+    /* NOTREACHED */
+}
+
+void coroutine_fn aio_ring_complete(AioRing *ring)
+{
+    AioRingEntry *entry = &ring->entries[ring->head];
+
+    assert(qemu_coroutine_self() == ring->main_co);
+    assert(ring->length);
+
+    ring->head = (ring->head + 1) & ring->ring_mask;
+    ring->length--;
+
+    entry->event.origin = NULL;
+    entry->event.status = 0;
+}
+
+/* Create new AIO ring */
+AioRing *coroutine_fn aio_ring_new(AioRingFunc func, unsigned ring_entries,
+                                   unsigned max_inflight)
+{
+    AioRing *ring;
+
+    assert(is_power_of_2(ring_entries));
+    assert(max_inflight && max_inflight <= ring_entries);
+
+    ring = g_malloc0(sizeof(AioRing) + ring_entries * sizeof(AioRingEntry));
+    ring->main_co = qemu_coroutine_self();
+    ring->ring_entries = ring_entries;
+    ring->ring_mask = ring_entries - 1;
+    ring->max_inflight = max_inflight;
+    ring->func = func;
+
+    return ring;
+}
+
+/* Free AIO ring */
+void aio_ring_free(AioRing *ring)
+{
+    assert(!ring->inflight);
+    g_free(ring);
+}
+
+/* Limit the maximum number of in-flight AIO requests */
+void aio_ring_set_max_inflight(AioRing *ring, unsigned max_inflight)
+{
+    ring->max_inflight = MIN(max_inflight, ring->ring_entries);
+}
+
+static ssize_t bdrv_vmstate_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
+                                       size_t size, Error **errp)
+{
+    return bdrv_load_vmstate((BlockDriverState *) opaque, buf, pos, size);
+}
+
+static ssize_t bdrv_vmstate_writev_buffer(void *opaque, struct iovec *iov,
+                                          int iovcnt, int64_t pos, Error **errp)
+{
+    QEMUIOVector qiov;
+    int res;
+
+    qemu_iovec_init_external(&qiov, iov, iovcnt);
+
+    res = bdrv_writev_vmstate((BlockDriverState *) opaque, &qiov, pos);
+    if (res < 0) {
+        return res;
+    }
+
+    return qiov.size;
+}
+
+static int bdrv_vmstate_fclose(void *opaque, Error **errp)
+{
+    return bdrv_flush((BlockDriverState *) opaque);
+}
+
+static const QEMUFileOps bdrv_vmstate_read_ops = {
+    .get_buffer = bdrv_vmstate_get_buffer,
+    .close      = bdrv_vmstate_fclose,
+};
+
+static const QEMUFileOps bdrv_vmstate_write_ops = {
+    .writev_buffer = bdrv_vmstate_writev_buffer,
+    .close         = bdrv_vmstate_fclose,
+};
+
+/* Create QEMUFile to access vmstate stream on QCOW2 image */
+QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable)
+{
+    if (is_writable) {
+        return qemu_fopen_ops(bs, &bdrv_vmstate_write_ops);
+    }
+
+    return qemu_fopen_ops(bs, &bdrv_vmstate_read_ops);
+}
+
+/* Move number of bytes from the source QEMUFile to destination */
+void qemu_fsplice(QEMUFile *f_dst, QEMUFile *f_src, size_t size)
+{
+    size_t rest = size;
+
+    while (rest) {
+        uint8_t *ptr = NULL;
+        size_t req_size;
+        size_t count;
+
+        req_size = MIN(rest, INPLACE_READ_MAX);
+        count = qemu_peek_buffer(f_src, &ptr, req_size, 0);
+        qemu_file_skip(f_src, count);
+
+        qemu_put_buffer(f_dst, ptr, count);
+        rest -= count;
+    }
+}
+
+/*
+ * Move data from source QEMUFile to destination
+ * until EOF is reached on source.
+ */
+void qemu_fsplice_tail(QEMUFile *f_dst, QEMUFile *f_src)
+{
+    bool eof = false;
+
+    while (!eof) {
+        const size_t size = INPLACE_READ_MAX;
+        uint8_t *buffer = NULL;
+        size_t count;
+
+        count = qemu_peek_buffer(f_src, &buffer, size, 0);
+        qemu_file_skip(f_src, count);
+
+        /* Reached EOF on source? */
+        if (count != size) {
+            eof = true;
+        }
+
+        qemu_put_buffer(f_dst, buffer, count);
+    }
+}
-- 
2.27.0
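
The in-order delivery rule of the AIO ring can be illustrated without
coroutines. The following is a minimal standalone sketch and is not part
of the patch: all Demo* and demo_ring_* names are hypothetical, the ring
size is fixed at compile time, and a request is "finished" by an explicit
call instead of by a worker coroutine. It only shows that a completion is
handed to the caller when the head entry is done, regardless of the order
in which requests actually finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_ENTRIES 4u                     /* must be a power of two */
#define DEMO_RING_MASK    (DEMO_RING_ENTRIES - 1u)

typedef struct DemoEntry {
    int id;             /* Non-negative request id chosen by the caller */
    bool owned;         /* True once the request has finished */
} DemoEntry;

typedef struct DemoRing {
    unsigned head;      /* Oldest request not yet consumed */
    unsigned tail;      /* Slot for the next submission */
    unsigned length;    /* Submitted but not yet consumed requests */
    DemoEntry entries[DEMO_RING_ENTRIES];
} DemoRing;

/* Submit a request; returns its slot index, or -1 if the ring is full */
static int demo_ring_submit(DemoRing *ring, int id)
{
    unsigned slot = ring->tail;

    if (ring->length >= DEMO_RING_ENTRIES) {
        return -1;
    }

    ring->entries[slot].id = id;
    ring->entries[slot].owned = false;
    ring->tail = (ring->tail + 1) & DEMO_RING_MASK;
    ring->length++;

    return (int) slot;
}

/* Mark the request in the given slot as finished; any order is allowed */
static void demo_ring_finish(DemoRing *ring, int slot)
{
    assert(slot >= 0 && (unsigned) slot < DEMO_RING_ENTRIES);
    ring->entries[slot].owned = true;
}

/*
 * Return the id of the oldest request if it has finished, or -1 if the
 * ring is empty or the head request is still in flight.
 */
static int demo_ring_poll(const DemoRing *ring)
{
    const DemoEntry *entry = &ring->entries[ring->head];

    if (!ring->length || !entry->owned) {
        return -1;
    }

    return entry->id;
}

/* Release the head entry after its completion has been handled */
static void demo_ring_complete(DemoRing *ring)
{
    assert(ring->length);

    ring->head = (ring->head + 1) & DEMO_RING_MASK;
    ring->length--;
}
```

With two requests submitted, finishing the second one first leaves
demo_ring_poll() returning -1 until the first one is finished as well;
the ids are then delivered in submission order. In the patch itself the
same role is played by the 'owned' flag checked in aio_ring_wait_event(),
which yields instead of returning -1.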