From: Vladimir Sementsov-Ogievskiy
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, famz@redhat.com, jcody@redhat.com, mreitz@redhat.com, stefanha@redhat.com, den@openvz.org
Date: Mon, 15 Oct 2018 19:06:30 +0300
Subject: [Qemu-devel] [PATCH v4 08/11] block: introduce backup-top filter driver
Message-Id: <20181015160633.63130-9-vsementsov@virtuozzo.com>
In-Reply-To: <20181015160633.63130-1-vsementsov@virtuozzo.com>
References: <20181015160633.63130-1-vsementsov@virtuozzo.com>

The backup-top filter performs the copy-before-write (CBW) operation. It
should be inserted above the active disk and has a target node for CBW,
as shown below:

    +-------+
    | Guest |
    +---+---+
        |r,w
        v
    +---+-----------+  target   +---------------+
    | backup_top    |---------->| target(qcow2) |
    +---+-----------+   CBW     +---+-----------+
        |
backing |r,w
        v
    +---+---------+
    | Active disk |
    +-------------+

The driver will be used in backup instead of write notifiers.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/backup-top.h  |  44 +++++++
 block/backup-top.c  | 298 ++++++++++++++++++++++++++++++++++++++++++++
 block/Makefile.objs |   2 +
 3 files changed, 344 insertions(+)
 create mode 100644 block/backup-top.h
 create mode 100644 block/backup-top.c

diff --git a/block/backup-top.h b/block/backup-top.h
new file mode 100644
index 0000000000..c26af9fb78
--- /dev/null
+++ b/block/backup-top.h
@@ -0,0 +1,44 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/block_int.h"
+
+typedef struct BDRVBackupTopState {
+    HBitmap *copy_bitmap; /* what should be copied to @target
+                             on guest write. */
+    BdrvChild *target;
+
+    uint64_t bytes_copied;
+} BDRVBackupTopState;
+
+void bdrv_backup_top_drop(BlockDriverState *bs);
+uint64_t bdrv_backup_top_progress(BlockDriverState *bs);
+
+BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
+                                         BlockDriverState *target,
+                                         HBitmap *copy_bitmap,
+                                         Error **errp);
diff --git a/block/backup-top.c b/block/backup-top.c
new file mode 100644
index 0000000000..8cb081f6f3
--- /dev/null
+++ b/block/backup-top.c
@@ -0,0 +1,298 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/qdict.h"
+
+#include "block/backup-top.h"
+
+static coroutine_fn int backup_top_co_preadv(
+        BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+        QEMUIOVector *qiov, int flags)
+{
+    /* Features to be implemented:
+     * F1. COR. save read data to fleecing target for fast access
+     *     (to reduce reads). This possibly may be done with use of copy-on-read
+     *     filter, but we need an ability to make COR requests optional: for
+     *     example, if target is a ram-cache, and if it is full now, we should
+     *     skip doing COR request, as it is actually not necessary.
+     *
+     * F2. Feature for guest: read from fleecing target if data is in ram-cache
+     *     and is unchanged
+     */
+
+    return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+}
+
+static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
+                                       uint64_t bytes)
+{
+    int ret = 0;
+    BDRVBackupTopState *s = bs->opaque;
+    uint64_t gran = 1UL << hbitmap_granularity(s->copy_bitmap);
+    uint64_t end = QEMU_ALIGN_UP(offset + bytes, gran);
+    uint64_t off = QEMU_ALIGN_DOWN(offset, gran), len;
+    size_t align = MAX(bdrv_opt_mem_align(bs->backing->bs),
+                       bdrv_opt_mem_align(s->target->bs));
+    struct iovec iov = {
+        .iov_base = qemu_memalign(align, end - off),
+        .iov_len = end - off
+    };
+    QEMUIOVector qiov;
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Features to be implemented:
+     * F3. parallelize copying loop
+     * F4. detect zeros
+     * F5. use block_status ?
+     * F6. don't copy clusters which are already cached by COR [see F1]
+     * F7. if target is ram-cache and it is full, there should be a possibility
+     *     to drop not necessary data (cached by COR [see F1]) to handle CBW
+     *     fast.
+     */
+
+    len = end - off;
+    while (hbitmap_next_dirty_area(s->copy_bitmap, &off, &len)) {
+        iov.iov_len = qiov.size = len;
+
+        hbitmap_reset(s->copy_bitmap, off, len);
+
+        ret = bdrv_co_preadv(bs->backing, off, len, &qiov,
+                             BDRV_REQ_NO_SERIALISING);
+        if (ret < 0) {
+            hbitmap_set(s->copy_bitmap, off, len);
+            goto finish;
+        }
+
+        ret = bdrv_co_pwritev(s->target, off, len, &qiov, BDRV_REQ_SERIALISING);
+        if (ret < 0) {
+            hbitmap_set(s->copy_bitmap, off, len);
+            goto finish;
+        }
+
+        s->bytes_copied += len;
+        off += len;
+        if (off >= end) {
+            break;
+        }
+        len = end - off;
+    }
+
+finish:
+    qemu_vfree(iov.iov_base);
+
+    /* F8. we fail guest request in case of error. We can alter it by
+     * possibility to fail copying process instead, or retry several times, or
+     * may be guest pause, etc.
+     */
+    return ret;
+}
+
+static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
+                                               int64_t offset, int bytes)
+{
+    int ret = backup_top_cbw(bs, offset, bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Features to be implemented:
+     * F9. possibility of lazy discard: just defer the discard after fleecing
+     *     completion. If write (or new discard) occurs to the same area, just
+     *     drop deferred discard.
+     */
+
+    return bdrv_co_pdiscard(bs->backing, offset, bytes);
+}
+
+static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int bytes, BdrvRequestFlags flags)
+{
+    int ret = backup_top_cbw(bs, offset, bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
+}
+
+static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
+                                              uint64_t offset,
+                                              uint64_t bytes,
+                                              QEMUIOVector *qiov, int flags)
+{
+    int ret = backup_top_cbw(bs, offset, bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
+{
+    if (!bs->backing) {
+        return 0;
+    }
+
+    return bdrv_co_flush(bs->backing->bs);
+}
+
+static void backup_top_refresh_filename(BlockDriverState *bs, QDict *opts)
+{
+    if (bs->backing == NULL) {
+        /* we can be here after failed bdrv_attach_child in
+         * bdrv_set_backing_hd */
+        return;
+    }
+    bdrv_refresh_filename(bs->backing->bs);
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->backing->bs->filename);
+}
+
+static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                  const BdrvChildRole *role,
+                                  BlockReopenQueue *reopen_queue,
+                                  uint64_t perm, uint64_t shared,
+                                  uint64_t *nperm, uint64_t *nshared)
+{
+    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, nperm,
+                              nshared);
+
+    if (role == &child_file) {
+        /* share write to target, to not interfere guest writes to its disk
+         * which will be in target backing chain */
+        *nshared = *nshared | BLK_PERM_WRITE;
+        *nperm = *nperm | BLK_PERM_WRITE;
+    } else {
+        *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+    }
+}
+
+BlockDriver bdrv_backup_top_filter = {
+    .format_name = "backup-top",
+    .instance_size = sizeof(BDRVBackupTopState),
+
+    .bdrv_co_preadv = backup_top_co_preadv,
+    .bdrv_co_pwritev = backup_top_co_pwritev,
+    .bdrv_co_pwrite_zeroes = backup_top_co_pwrite_zeroes,
+    .bdrv_co_pdiscard = backup_top_co_pdiscard,
+    .bdrv_co_flush = backup_top_co_flush,
+
+    .bdrv_co_block_status = bdrv_co_block_status_from_backing,
+
+    .bdrv_refresh_filename = backup_top_refresh_filename,
+
+    .bdrv_child_perm = backup_top_child_perm,
+
+    .is_filter = true,
+};
+
+BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
+                                         BlockDriverState *target,
+                                         HBitmap *copy_bitmap,
+                                         Error **errp)
+{
+    Error *local_err = NULL;
+    BDRVBackupTopState *state;
+    BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter,
+                                                 NULL, BDRV_O_RDWR, errp);
+
+    if (!top) {
+        return NULL;
+    }
+
+    top->implicit = true;
+    top->total_sectors = source->total_sectors;
+    top->opaque = state = g_new0(BDRVBackupTopState, 1);
+    state->copy_bitmap = copy_bitmap;
+
+    bdrv_ref(target);
+    state->target = bdrv_attach_child(top, target, "target", &child_file, errp);
+    if (!state->target) {
+        bdrv_unref(target);
+        bdrv_unref(top);
+        return NULL;
+    }
+
+    bdrv_set_aio_context(top, bdrv_get_aio_context(source));
+    bdrv_set_aio_context(target, bdrv_get_aio_context(source));
+
+    bdrv_drained_begin(source);
+
+    bdrv_ref(top);
+    bdrv_append(top, source, &local_err);
+
+    if (local_err) {
+        bdrv_unref(top);
+    }
+
+    bdrv_drained_end(source);
+
+    if (local_err) {
+        bdrv_unref_child(top, state->target);
+        bdrv_unref(top);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return top;
+}
+
+void bdrv_backup_top_drop(BlockDriverState *bs)
+{
+    BDRVBackupTopState *s = bs->opaque;
+
+    AioContext *aio_context = bdrv_get_aio_context(bs);
+
+    aio_context_acquire(aio_context);
+
+    bdrv_drained_begin(bs);
+
+    bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, &error_abort);
+    bdrv_replace_node(bs, backing_bs(bs), &error_abort);
+    bdrv_set_backing_hd(bs, NULL, &error_abort);
+
+    bdrv_drained_end(bs);
+
+    if (s->target) {
+        bdrv_unref_child(bs, s->target);
+    }
+    bdrv_unref(bs);
+
+    aio_context_release(aio_context);
+}
+
+uint64_t bdrv_backup_top_progress(BlockDriverState *bs)
+{
+    BDRVBackupTopState *s = bs->opaque;
+
+    return s->bytes_copied;
+}
diff --git a/block/Makefile.objs b/block/Makefile.objs
index c8337bf186..7f71263be0 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -31,6 +31,8 @@ block-obj-y += throttle.o copy-on-read.o
 
 block-obj-y += crypto.o
 
+block-obj-y += backup-top.o
+
 common-obj-y += stream.o
 
 nfs.o-libs         := $(LIBNFS_LIBS)
-- 
2.18.0
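
Illustration (not part of the patch): the copy-before-write flow in
backup_top_cbw() can be modelled outside QEMU roughly as follows. This is
only a minimal sketch under stated assumptions: ToyCbw, toy_init(),
toy_cbw() and toy_guest_write() are hypothetical names, a plain bool array
stands in for the HBitmap, in-memory buffers stand in for the active disk
and the target, and the error path (re-setting the dirty bits on failure)
is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; none of these names are QEMU APIs. */
#define CLUSTER   4u                      /* copy granularity in bytes */
#define DISK_SIZE 16u                     /* multiple of CLUSTER */
#define NCLUSTERS (DISK_SIZE / CLUSTER)

typedef struct ToyCbw {
    uint8_t active[DISK_SIZE];  /* stands in for the active disk */
    uint8_t target[DISK_SIZE];  /* stands in for the backup target */
    bool dirty[NCLUSTERS];      /* true: cluster still has to be copied */
    uint64_t bytes_copied;      /* progress counter */
} ToyCbw;

/* Start a "backup": every cluster is marked as not yet copied. */
static void toy_init(ToyCbw *s, const uint8_t *image)
{
    memcpy(s->active, image, DISK_SIZE);
    memset(s->target, 0, DISK_SIZE);
    for (unsigned i = 0; i < NCLUSTERS; i++) {
        s->dirty[i] = true;
    }
    s->bytes_copied = 0;
}

/*
 * Copy the old contents of every still-dirty cluster touched by
 * [offset, offset + bytes) to the target and clear its dirty flag.
 * The request is widened to cluster boundaries, like the
 * QEMU_ALIGN_DOWN/QEMU_ALIGN_UP pair in the patch.
 */
static void toy_cbw(ToyCbw *s, uint64_t offset, uint64_t bytes)
{
    uint64_t first, last;

    assert(bytes > 0 && offset + bytes <= DISK_SIZE);
    first = offset / CLUSTER;
    last = (offset + bytes - 1) / CLUSTER;

    for (uint64_t c = first; c <= last; c++) {
        if (!s->dirty[c]) {
            continue;           /* already copied earlier */
        }
        memcpy(&s->target[c * CLUSTER], &s->active[c * CLUSTER], CLUSTER);
        s->dirty[c] = false;
        s->bytes_copied += CLUSTER;
    }
}

/* A guest write is only applied after the old data has been preserved. */
static void toy_guest_write(ToyCbw *s, uint64_t offset,
                            const uint8_t *buf, uint64_t bytes)
{
    toy_cbw(s, offset, bytes);
    memcpy(&s->active[offset], buf, bytes);
}
```

In this model a 3-byte write at offset 3 spans two clusters, so both are
copied to the target before the write lands. A later write into one of
those clusters copies nothing further, which is the behaviour the dirty
bitmap provides in the patch.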