From nobody Sun May 5 16:33:48 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1502778175394295.3804004863355; Mon, 14 Aug 2017 23:22:55 -0700 (PDT) Received: from localhost ([::1]:41738 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhVFe-00014z-7f for importer@patchew.org; Tue, 15 Aug 2017 02:22:54 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49079) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhVEG-00006V-GU for qemu-devel@nongnu.org; Tue, 15 Aug 2017 02:21:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dhVEE-0003oB-AO for qemu-devel@nongnu.org; Tue, 15 Aug 2017 02:21:28 -0400 Received: from smtp1.ntua.gr ([2001:648:2000:de::183]:48377) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhVE5-0003b8-45; Tue, 15 Aug 2017 02:21:17 -0400 Received: from mail.ntua.gr (internet4-5-144-200-220.pat.nym.cosmote.net [5.144.200.220]) (authenticated bits=0) by smtp1.ntua.gr (8.15.2/8.15.2) with ESMTPSA id v7F6KFx0005018 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 15 Aug 2017 09:20:36 +0300 (EEST) (envelope-from el13635@mail.ntua.gr) X-Authentication-Warning: smtp1.ntua.gr: Host internet4-5-144-200-220.pat.nym.cosmote.net [5.144.200.220] claimed to be mail.ntua.gr From: Manos Pitsidianakis To: qemu-devel Date: Tue, 15 Aug 2017 09:19:20 +0300 Message-Id: <20170815061921.31596-2-el13635@mail.ntua.gr> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170815061921.31596-1-el13635@mail.ntua.gr> References: <20170815061921.31596-1-el13635@mail.ntua.gr> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2001:648:2000:de::183 Subject: [Qemu-devel] [PATCH 1/2] block: use internal filter node in backup X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Alberto Garcia , Stefan Hajnoczi , qemu-block Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" block/backup.c currently uses before write notifiers on the targeted node. We can create a filter node instead to intercept write requests for the backup job on the BDS level, instead of the BlockBackend level. Signed-off-by: Manos Pitsidianakis --- block.c | 89 +++++++++++++++++-- block/backup.c | 207 ++++++++++++++++++++++++++++++++++++++++-= ---- block/io.c | 10 +-- block/mirror.c | 4 +- blockdev.c | 2 +- include/block/block.h | 8 +- tests/qemu-iotests/141.out | 2 +- 7 files changed, 277 insertions(+), 45 deletions(-) diff --git a/block.c b/block.c index 2de1c29eb3..81bd51b670 100644 --- a/block.c +++ b/block.c @@ -2088,6 +2088,38 @@ static void bdrv_parent_cb_resize(BlockDriverState *= bs) } =20 /* + * Sets the file link of a BDS. A new reference is created; callers + * which don't need their own reference any more must call bdrv_unref(). + */ +void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs, + Error **errp) +{ + if (file_bs) { + bdrv_ref(file_bs); + } + + if (bs->file) { + bdrv_unref_child(bs, bs->file); + } + + if (!file_bs) { + bs->file =3D NULL; + goto out; + } + + bs->file =3D bdrv_attach_child(bs, file_bs, "file", &child_file, + errp); + if (!bs->file) { + bdrv_unref(file_bs); + } + + bdrv_refresh_filename(bs); + +out: + bdrv_refresh_limits(bs, NULL); +} + +/* * Sets the backing file link of a BDS. A new reference is created; callers * which don't need their own reference any more must call bdrv_unref(). */ @@ -2355,12 +2387,12 @@ static BlockDriverState *bdrv_append_temp_snapshot(= BlockDriverState *bs, goto out; } =20 - /* bdrv_append() consumes a strong reference to bs_snapshot + /* bdrv_append_backing() consumes a strong reference to bs_snapshot * (i.e. it will call bdrv_unref() on it) even on error, so in * order to be able to return one, we have to increase * bs_snapshot's refcount here */ bdrv_ref(bs_snapshot); - bdrv_append(bs_snapshot, bs, &local_err); + bdrv_append_backing(bs_snapshot, bs, &local_err); if (local_err) { error_propagate(errp, local_err); bs_snapshot =3D NULL; @@ -3142,7 +3174,7 @@ static bool should_update_child(BdrvChild *c, BlockDr= iverState *to) return false; } =20 - if (c->role =3D=3D &child_backing) { + if (c->role =3D=3D &child_backing || c->role =3D=3D &child_file) { /* If @from is a backing file of @to, ignore the child to avoid * creating a loop. We only want to change the pointer of other * parents. */ @@ -3213,6 +3245,45 @@ out: } =20 /* + * Add new bs node at the top of a BDS chain while the chain is + * live, while keeping required fields on the top layer. + * + * This will modify the BlockDriverState fields, and swap contents + * between bs_new and bs_top. Both bs_new and bs_top are modified. + * + * bs_new must not be attached to a BlockBackend. + * + * bdrv_append_file() takes ownership of a bs_new reference and unrefs it + * because that's what the callers commonly need. bs_new will be reference= d by + * the old parents of bs_top after bdrv_append_file() returns. If the call= er + * needs to keep a reference of its own, it must call bdrv_ref(). + */ +void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top, + Error **errp) +{ + Error *local_err =3D NULL; + + bdrv_ref(bs_top); + bdrv_set_file(bs_new, bs_top, &local_err); + if (local_err) { + error_propagate(errp, local_err); + bdrv_set_file(bs_new, NULL, &error_abort); + goto out; + } + bdrv_replace_node(bs_top, bs_new, &local_err); + if (local_err) { + error_propagate(errp, local_err); + goto out; + } + + /* bs_new is now referenced by its new parents, we don't need the + * additional reference any more. */ +out: + bdrv_unref(bs_top); + bdrv_unref(bs_new); +} + +/* * Add new bs contents at the top of an image chain while the chain is * live, while keeping required fields on the top layer. * @@ -3223,13 +3294,13 @@ out: * * This function does not create any image files. * - * bdrv_append() takes ownership of a bs_new reference and unrefs it becau= se - * that's what the callers commonly need. bs_new will be referenced by the= old - * parents of bs_top after bdrv_append() returns. If the caller needs to k= eep a - * reference of its own, it must call bdrv_ref(). + * bdrv_append_backing() takes ownership of a bs_new reference and unrefs = it + * because that's what the callers commonly need. bs_new will be reference= d by + * the old parents of bs_top after bdrv_append_backing() returns. If the c= aller + * needs to keep a reference of its own, it must call bdrv_ref(). */ -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top, - Error **errp) +void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_to= p, + Error **errp) { Error *local_err =3D NULL; =20 diff --git a/block/backup.c b/block/backup.c index 504a089150..0828d522b6 100644 --- a/block/backup.c +++ b/block/backup.c @@ -43,7 +43,7 @@ typedef struct BackupBlockJob { unsigned long *done_bitmap; int64_t cluster_size; bool compress; - NotifierWithReturn before_write; + BlockDriverState *filter; QLIST_HEAD(, CowRequest) inflight_reqs; } BackupBlockJob; =20 @@ -174,20 +174,6 @@ out: return ret; } =20 -static int coroutine_fn backup_before_write_notify( - NotifierWithReturn *notifier, - void *opaque) -{ - BackupBlockJob *job =3D container_of(notifier, BackupBlockJob, before_= write); - BdrvTrackedRequest *req =3D opaque; - - assert(req->bs =3D=3D blk_bs(job->common.blk)); - assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE)); - assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE)); - - return backup_do_cow(job, req->offset, req->bytes, NULL, true); -} - static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp) { BackupBlockJob *s =3D container_of(job, BackupBlockJob, common); @@ -202,7 +188,8 @@ static void backup_set_speed(BlockJob *job, int64_t spe= ed, Error **errp) static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret) { BdrvDirtyBitmap *bm; - BlockDriverState *bs =3D blk_bs(job->common.blk); + BlockDriverState *bs =3D child_bs(blk_bs(job->common.blk)); + assert(bs); =20 if (ret < 0 || block_job_is_cancelled(&job->common)) { /* Merge the successor back into the parent, delete nothing. */ @@ -234,9 +221,31 @@ static void backup_abort(BlockJob *job) static void backup_clean(BlockJob *job) { BackupBlockJob *s =3D container_of(job, BackupBlockJob, common); + BlockDriverState *bs =3D child_bs(s->filter), + *filter =3D s->filter; + assert(s->target); blk_unref(s->target); s->target =3D NULL; + + /* make sure nothing goes away while removing filter */ + bdrv_ref(filter); + bdrv_ref(bs); + bdrv_drained_begin(bs); + + block_job_remove_all_bdrv(job); + bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL, + &error_abort); + bdrv_replace_node(filter, bs, &error_abort); + + blk_remove_bs(job->blk); + blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort); + blk_insert_bs(job->blk, filter, &error_abort); + + bdrv_drained_end(bs); + bdrv_unref(filter); + bdrv_unref(bs); + s->filter =3D NULL; } =20 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_con= text) @@ -421,6 +430,18 @@ out: return ret; } =20 +static void backup_top_enable(BackupBlockJob *job) +{ + BlockDriverState *bs =3D job->filter; + bs->opaque =3D job; +} + +static void backup_top_disable(BackupBlockJob *job) +{ + BlockDriverState *bs =3D job->filter; + bs->opaque =3D NULL; +} + static void coroutine_fn backup_run(void *opaque) { BackupBlockJob *job =3D opaque; @@ -435,13 +456,12 @@ static void coroutine_fn backup_run(void *opaque) job->done_bitmap =3D bitmap_new(DIV_ROUND_UP(job->common.len, job->cluster_size)); =20 - job->before_write.notify =3D backup_before_write_notify; - bdrv_add_before_write_notifier(bs, &job->before_write); + backup_top_enable(job); =20 if (job->sync_mode =3D=3D MIRROR_SYNC_MODE_NONE) { while (!block_job_is_cancelled(&job->common)) { - /* Yield until the job is cancelled. We just let our before_w= rite - * notify callback service CoW requests. */ + /* Yield until the job is cancelled. We just let our backup_t= op + * filter service CoW requests. */ block_job_yield(&job->common); } } else if (job->sync_mode =3D=3D MIRROR_SYNC_MODE_INCREMENTAL) { @@ -507,8 +527,7 @@ static void coroutine_fn backup_run(void *opaque) } } } - - notifier_with_return_remove(&job->before_write); + backup_top_disable(job); =20 /* wait until pending backup_do_cow() calls have completed */ qemu_co_rwlock_wrlock(&job->flush_rwlock); @@ -532,6 +551,124 @@ static const BlockJobDriver backup_job_driver =3D { .drain =3D backup_drain, }; =20 +static int coroutine_fn backup_top_co_preadv(BlockDriverState *bs, + uint64_t offset, + uint64_t bytes, + QEMUIOVector *qiov, int flags) +{ + return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags); +} + +static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs, + uint64_t offset, + uint64_t bytes, + QEMUIOVector *qiov, int flag= s) +{ + int ret =3D 0; + BackupBlockJob *job =3D bs->opaque; + if (job) { + assert(bs =3D=3D blk_bs(job->common.blk)); + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)); + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)); + ret =3D backup_do_cow(job, offset, bytes, NULL, true); + } + + return ret < 0 ? ret : bdrv_co_pwritev(bs->file, offset, bytes, + qiov, flags); +} + +static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs, + int64_t offset, + int bytes, + BdrvRequestFlags flags) +{ + int ret =3D 0; + BackupBlockJob *job =3D bs->opaque; + if (job) { + assert(bs =3D=3D blk_bs(job->common.blk)); + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)); + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)); + ret =3D backup_do_cow(job, offset, bytes, NULL, true); + } + + return ret < 0 ? ret : bdrv_co_pwrite_zeroes(bs->file, offset, bytes, + flags); +} + +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs, + int64_t offset, int bytes) +{ + int ret =3D 0; + BackupBlockJob *job =3D bs->opaque; + if (job) { + assert(bs =3D=3D blk_bs(job->common.blk)); + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)); + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)); + ret =3D backup_do_cow(job, offset, bytes, NULL, true); + } + + return ret < 0 ? ret : bdrv_co_pdiscard(bs->file->bs, offset, bytes); +} + +static int backup_top_co_flush(BlockDriverState *bs) +{ + return bdrv_co_flush(bs->file->bs); +} + +static int backup_top_open(BlockDriverState *bs, QDict *options, + int flags, Error **errp) +{ + return 0; +} + +static void backup_top_close(BlockDriverState *bs) +{ +} + +static int64_t backup_top_getlength(BlockDriverState *bs) +{ + return bs->file ? bdrv_getlength(bs->file->bs) : 0; +} + +static bool backup_recurse_is_first_non_filter(BlockDriverState *bs, + BlockDriverState *candidate) +{ + return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate); +} + +static int64_t coroutine_fn backup_co_get_block_status(BlockDriverState *b= s, + int64_t sector_num, + int nb_sectors, + int *pnum, + BlockDriverState **= file) +{ + assert(bs->file && bs->file->bs); + *pnum =3D nb_sectors; + *file =3D bs->file->bs; + return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | + (sector_num << BDRV_SECTOR_BITS); +} +static BlockDriver backup_top =3D { + .format_name =3D "backup-top", + .instance_size =3D sizeof(BackupBlockJob *), + + .bdrv_open =3D backup_top_open, + .bdrv_close =3D backup_top_close, + + .bdrv_co_flush =3D backup_top_co_flush, + .bdrv_co_preadv =3D backup_top_co_preadv, + .bdrv_co_pwritev =3D backup_top_co_pwritev, + .bdrv_co_pwrite_zeroes =3D backup_top_co_pwrite_zeroes, + .bdrv_co_pdiscard =3D backup_top_co_pdiscard, + + .bdrv_getlength =3D backup_top_getlength, + .bdrv_child_perm =3D bdrv_filter_default_perms, + .bdrv_recurse_is_first_non_filter =3D backup_recurse_is_first_non_= filter, + .bdrv_co_get_block_status =3D backup_co_get_block_status, + + .is_filter =3D true, +}; + BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, BlockDriverState *target, int64_t speed, MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap, @@ -545,6 +682,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDr= iverState *bs, int64_t len; BlockDriverInfo bdi; BackupBlockJob *job =3D NULL; + BlockDriverState *filter =3D NULL; int ret; =20 assert(bs); @@ -606,17 +744,33 @@ BlockJob *backup_job_create(const char *job_id, Block= DriverState *bs, bdrv_get_device_name(bs)); goto error; } + /* Setup before write filter */ + filter =3D + bdrv_new_open_driver(&backup_top, + NULL, bdrv_get_flags(bs), NULL, &error_abort); + filter->implicit =3D true; + filter->total_sectors =3D bs->total_sectors; + bdrv_set_aio_context(filter, bdrv_get_aio_context(bs)); + + /* Insert before write notifier in the BDS chain */ + bdrv_ref(filter); + bdrv_drained_begin(bs); + bdrv_append_file(filter, bs, &error_abort); + bdrv_drained_end(bs); =20 /* job->common.len is fixed, so we can't allow resize */ - job =3D block_job_create(job_id, &backup_job_driver, bs, + job =3D block_job_create(job_id, &backup_job_driver, filter, BLK_PERM_CONSISTENT_READ, BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD, speed, creation_flags, cb, opaque, errp); + bdrv_unref(filter); if (!job) { goto error; } =20 + job->filter =3D filter; + /* The target must match the source in size, so no resize here either = */ job->target =3D blk_new(BLK_PERM_WRITE, BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE | @@ -675,6 +829,13 @@ BlockJob *backup_job_create(const char *job_id, BlockD= riverState *bs, if (job) { backup_clean(&job->common); block_job_early_fail(&job->common); + } else { + /* don't leak filter if job creation failed */ + if (filter) { + bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL, + &error_abort); + bdrv_replace_node(filter, bs, &error_abort); + } } =20 return NULL; diff --git a/block/io.c b/block/io.c index 26003814eb..56aa67105a 100644 --- a/block/io.c +++ b/block/io.c @@ -1332,7 +1332,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChil= d *child, BlockDriverState *bs =3D child->bs; BlockDriver *drv =3D bs->drv; bool waited; - int ret; + int ret =3D 0; =20 int64_t start_sector =3D offset >> BDRV_SECTOR_BITS; int64_t end_sector =3D DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); @@ -1359,9 +1359,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChil= d *child, assert(child->perm & BLK_PERM_WRITE); assert(end_sector <=3D bs->total_sectors || child->perm & BLK_PERM_RES= IZE); =20 - ret =3D notifier_with_return_list_notify(&bs->before_write_notifiers, = req); - - if (!ret && bs->detect_zeroes !=3D BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF = && + if (bs->detect_zeroes !=3D BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF && !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) { flags |=3D BDRV_REQ_ZERO_WRITE; @@ -1370,9 +1368,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChil= d *child, } } =20 - if (ret < 0) { - /* Do nothing, write notifier decided to fail this request */ - } else if (flags & BDRV_REQ_ZERO_WRITE) { + if (flags & BDRV_REQ_ZERO_WRITE) { bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO); ret =3D bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags); } else if (flags & BDRV_REQ_WRITE_COMPRESSED) { diff --git a/block/mirror.c b/block/mirror.c index e1a160e6ea..af7771e93a 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1174,11 +1174,11 @@ static void mirror_start_job(const char *job_id, Bl= ockDriverState *bs, mirror_top_bs->total_sectors =3D bs->total_sectors; bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs)); =20 - /* bdrv_append takes ownership of the mirror_top_bs reference, need to= keep + /* bdrv_append_backing() takes ownership of the mirror_top_bs referenc= e, need to keep * it alive until block_job_create() succeeds even if bs has no parent= . */ bdrv_ref(mirror_top_bs); bdrv_drained_begin(bs); - bdrv_append(mirror_top_bs, bs, &local_err); + bdrv_append_backing(mirror_top_bs, bs, &local_err); bdrv_drained_end(bs); =20 if (local_err) { diff --git a/blockdev.c b/blockdev.c index 5c11c245b0..8e2fc6e64c 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1788,7 +1788,7 @@ static void external_snapshot_prepare(BlkActionState = *common, * can fail, so we need to do it in .prepare; undoing it for abort is * always possible. */ bdrv_ref(state->new_bs); - bdrv_append(state->new_bs, state->old_bs, &local_err); + bdrv_append_backing(state->new_bs, state->old_bs, &local_err); if (local_err) { error_propagate(errp, local_err); return; diff --git a/include/block/block.h b/include/block/block.h index d1f03cb48b..744b50e734 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -244,8 +244,10 @@ int bdrv_create(BlockDriver *drv, const char* filename, QemuOpts *opts, Error **errp); int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp); BlockDriverState *bdrv_new(void); -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top, - Error **errp); +void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top, + Error **errp); +void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_to= p, + Error **errp); void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to, Error **errp); =20 @@ -256,6 +258,8 @@ BdrvChild *bdrv_open_child(const char *filename, BlockDriverState* parent, const BdrvChildRole *child_role, bool allow_none, Error **errp); +void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs, + Error **errp); void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_h= d, Error **errp); int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options, diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out index 82e763b68d..cc653c317a 100644 --- a/tests/qemu-iotests/141.out +++ b/tests/qemu-iotests/141.out @@ -9,7 +9,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=3DIMGFMT size=3D1048576= backing_file=3DTEST_DIR/m. {"return": {}} Formatting 'TEST_DIR/o.IMGFMT', fmt=3DIMGFMT size=3D1048576 backing_file= =3DTEST_DIR/t.IMGFMT backing_fmt=3DIMGFMT {"return": {}} -{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}} +{"error": {"class": "GenericError", "desc": "Block device drv0 is in use"}} {"return": {}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event"= : "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset= ": 0, "speed": 0, "type": "backup"}} {"return": {}} --=20 2.11.0 From nobody Sun May 5 16:33:48 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1502778275841598.230123603339; Mon, 14 Aug 2017 23:24:35 -0700 (PDT) Received: from localhost ([::1]:41776 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhVHG-0002FD-Fr for importer@patchew.org; Tue, 15 Aug 2017 02:24:34 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49147) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhVEg-0000V4-4Z for qemu-devel@nongnu.org; Tue, 15 Aug 2017 02:21:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dhVEd-0004DX-O5 for qemu-devel@nongnu.org; Tue, 15 Aug 2017 02:21:54 -0400 Received: from smtp1.ntua.gr ([2001:648:2000:de::183]:48386) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dhVEV-00047t-Ia; Tue, 15 Aug 2017 02:21:44 -0400 Received: from mail.ntua.gr (internet4-5-144-200-220.pat.nym.cosmote.net [5.144.200.220]) (authenticated bits=0) by smtp1.ntua.gr (8.15.2/8.15.2) with ESMTPSA id v7F6KjJB005255 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 15 Aug 2017 09:21:00 +0300 (EEST) (envelope-from el13635@mail.ntua.gr) X-Authentication-Warning: smtp1.ntua.gr: Host internet4-5-144-200-220.pat.nym.cosmote.net [5.144.200.220] claimed to be mail.ntua.gr From: Manos Pitsidianakis To: qemu-devel Date: Tue, 15 Aug 2017 09:19:21 +0300 Message-Id: <20170815061921.31596-3-el13635@mail.ntua.gr> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170815061921.31596-1-el13635@mail.ntua.gr> References: <20170815061921.31596-1-el13635@mail.ntua.gr> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2001:648:2000:de::183 Subject: [Qemu-devel] [PATCH 2/2] block: add filter driver to block/write-threshold.c X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Alberto Garcia , Stefan Hajnoczi , qemu-block Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" With runtime insertion and removal of filters, write-threshold.c can provide more flexible deliveries of BLOCK_WRITE_THRESHOLD events. After the event trigger, the filter nodes are no longer useful and must be removed. The existing write-threshold cannot be easily converted to using the filter driver, so it is not affected. Signed-off-by: Manos Pitsidianakis --- block/qapi.c | 2 +- block/write-threshold.c | 264 +++++++++++++++++++++++++++++++++++-= ---- include/block/write-threshold.h | 22 ++-- qapi/block-core.json | 19 ++- tests/test-write-threshold.c | 40 +++--- 5 files changed, 281 insertions(+), 66 deletions(-) diff --git a/block/qapi.c b/block/qapi.c index 2be44a6758..fe6cf2eae5 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -122,7 +122,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *b= lk, info->group =3D g_strdup(throttle_group_get_name(tgm)); } =20 - info->write_threshold =3D bdrv_write_threshold_get(bs); + info->write_threshold =3D bdrv_write_threshold_get_legacy(bs); =20 bs0 =3D bs; p_image_info =3D &info->image; diff --git a/block/write-threshold.c b/block/write-threshold.c index 0bd1a01c86..4a67188ea3 100644 --- a/block/write-threshold.c +++ b/block/write-threshold.c @@ -2,9 +2,11 @@ * QEMU System Emulator block write threshold notification * * Copyright Red Hat, Inc. 2014 + * Copyright 2017 Manos Pitsidianakis * * Authors: * Francesco Romani + * Manos Pitsidianakis * * This work is licensed under the terms of the GNU LGPL, version 2 or lat= er. * See the COPYING.LIB file in the top-level directory. @@ -19,46 +21,35 @@ #include "qmp-commands.h" =20 =20 -uint64_t bdrv_write_threshold_get(const BlockDriverState *bs) +uint64_t bdrv_write_threshold_get_legacy(const BlockDriverState *bs) { return bs->write_threshold_offset; } =20 -bool bdrv_write_threshold_is_set(const BlockDriverState *bs) +bool bdrv_write_threshold_is_set_legacy(const BlockDriverState *bs) { return bs->write_threshold_offset > 0; } =20 -static void write_threshold_disable(BlockDriverState *bs) +static void write_threshold_disable_legacy(BlockDriverState *bs) { - if (bdrv_write_threshold_is_set(bs)) { + if (bdrv_write_threshold_is_set_legacy(bs)) { notifier_with_return_remove(&bs->write_threshold_notifier); bs->write_threshold_offset =3D 0; } } =20 -uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs, - const BdrvTrackedRequest *req) -{ - if (bdrv_write_threshold_is_set(bs)) { - if (req->offset > bs->write_threshold_offset) { - return (req->offset - bs->write_threshold_offset) + req->bytes; - } - if ((req->offset + req->bytes) > bs->write_threshold_offset) { - return (req->offset + req->bytes) - bs->write_threshold_offset; - } - } - return 0; -} - static int coroutine_fn before_write_notify(NotifierWithReturn *notifier, void *opaque) { BdrvTrackedRequest *req =3D opaque; BlockDriverState *bs =3D req->bs; uint64_t amount =3D 0; + uint64_t threshold =3D bdrv_write_threshold_get_legacy(bs); + uint64_t offset =3D req->offset; + uint64_t bytes =3D req->bytes; =20 - amount =3D bdrv_write_threshold_exceeded(bs, req); + amount =3D bdrv_write_threshold_exceeded(threshold, offset, bytes); if (amount > 0) { qapi_event_send_block_write_threshold( bs->node_name, @@ -67,7 +58,7 @@ static int coroutine_fn before_write_notify(NotifierWithR= eturn *notifier, &error_abort); =20 /* autodisable to avoid flooding the monitor */ - write_threshold_disable(bs); + write_threshold_disable_legacy(bs); } =20 return 0; /* should always let other notifiers run */ @@ -79,25 +70,26 @@ static void write_threshold_register_notifier(BlockDriv= erState *bs) bdrv_add_before_write_notifier(bs, &bs->write_threshold_notifier); } =20 -static void write_threshold_update(BlockDriverState *bs, - int64_t threshold_bytes) +static void write_threshold_update_legacy(BlockDriverState *bs, + int64_t threshold_bytes) { bs->write_threshold_offset =3D threshold_bytes; } =20 -void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_byt= es) +void bdrv_write_threshold_set_legacy(BlockDriverState *bs, + uint64_t threshold_bytes) { - if (bdrv_write_threshold_is_set(bs)) { + if (bdrv_write_threshold_is_set_legacy(bs)) { if (threshold_bytes > 0) { - write_threshold_update(bs, threshold_bytes); + write_threshold_update_legacy(bs, threshold_bytes); } else { - write_threshold_disable(bs); + write_threshold_disable_legacy(bs); } } else { if (threshold_bytes > 0) { /* avoid multiple registration */ write_threshold_register_notifier(bs); - write_threshold_update(bs, threshold_bytes); + write_threshold_update_legacy(bs, threshold_bytes); } /* discard bogus disable request */ } @@ -119,7 +111,223 @@ void qmp_block_set_write_threshold(const char *node_n= ame, aio_context =3D bdrv_get_aio_context(bs); aio_context_acquire(aio_context); =20 - bdrv_write_threshold_set(bs, threshold_bytes); + bdrv_write_threshold_set_legacy(bs, threshold_bytes); =20 aio_context_release(aio_context); } + + +/* The write-threshold filter drivers delivers a one-time BLOCK_WRITE_THRE= SHOLD + * event when a passing write request exceeds the configured write thresho= ld + * offset of the filter. + * + * This is useful to transparently resize thin-provisioned drives without + * the guest OS noticing. + */ + +#define QEMU_OPT_WRITE_THRESHOLD "write-threshold" +static BlockDriver write_threshold; +static QemuOptsList write_threshold_opts =3D { + .name =3D "write-threshold", + .head =3D QTAILQ_HEAD_INITIALIZER(write_threshold_opts.head), + .desc =3D { + { + .name =3D QEMU_OPT_WRITE_THRESHOLD, + .type =3D QEMU_OPT_NUMBER, + .help =3D "configured threshold for the block device, bytes. U= se 0" + "to disable the threshold", + }, + { /* end of list */ } + }, +}; + +static bool bdrv_write_threshold_is_set(const BlockDriverState *bs) +{ + uint64_t threshold =3D *(uint64_t *)bs->opaque; + return threshold > 0; +} + +static void bdrv_write_threshold_disable(BlockDriverState *bs) +{ + uint64_t *threshold =3D (uint64_t *)bs->opaque; + if (bdrv_write_threshold_is_set(bs)) { + *threshold =3D 0; + } +} + +uint64_t bdrv_write_threshold_exceeded(uint64_t threshold, uint64_t offset, + uint64_t bytes) +{ + if (threshold) { + if (offset > threshold) { + return (offset - threshold) + bytes; + } + if ((offset + bytes) > threshold) { + return (offset + bytes) - threshold; + } + } + return 0; +} + + +static void bdrv_write_threshold_update(BlockDriverState *bs, + int64_t threshold_bytes) +{ + uint64_t *threshold =3D (uint64_t *)bs->opaque; + *threshold =3D threshold_bytes; +} + +static void bdrv_write_threshold_check_amount(BlockDriverState *bs, + uint64_t offset, + uint64_t bytes) +{ + uint64_t threshold =3D *(uint64_t *)bs->opaque; + uint64_t amount =3D 0; + + amount =3D bdrv_write_threshold_exceeded(threshold, offset, bytes); + if (amount > 0) { + qapi_event_send_block_write_threshold(child_bs(bs)->node_name, + amount, + threshold, + &error_abort); + /* autodisable to avoid flooding the monitor */ + bdrv_write_threshold_disable(bs); + } +} + +/* Filter driver methods */ + +static int coroutine_fn write_threshold_co_preadv(BlockDriverState *bs, + uint64_t offset, + uint64_t bytes, + QEMUIOVector *qiov, + int flags) +{ + return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags); +} + +static int coroutine_fn write_threshold_co_pwritev(BlockDriverState *bs, + uint64_t offset, + uint64_t bytes, + QEMUIOVector *qiov, + int flags) +{ + bdrv_write_threshold_check_amount(bs, offset, bytes); + return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags); +} + +static int coroutine_fn write_threshold_co_pwrite_zeroes( + BlockDriverState *= bs, + int64_t offset, + int bytes, + BdrvRequestFlags f= lags) +{ + bdrv_write_threshold_check_amount(bs, offset, bytes); + return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags); +} + +static int coroutine_fn write_threshold_co_pdiscard(BlockDriverState *bs, + int64_t offset, int by= tes) +{ + bdrv_write_threshold_check_amount(bs, offset, bytes); + return bdrv_co_pdiscard(bs->file->bs, offset, bytes); +} + + +static int64_t write_threshold_getlength(BlockDriverState *bs) +{ + return bdrv_getlength(bs->file->bs); +} + +static int write_threshold_open(BlockDriverState *bs, QDict *options, + int flags, Error **errp) +{ + Error *local_err =3D NULL; + int ret =3D 0; + QemuOpts *opts =3D NULL; + uint64_t threshold =3D 0; + + bs->file =3D bdrv_open_child(NULL, options, "file", bs, &child_file, + false, errp); + if (!bs->file) { + return -EINVAL; + } + + bs->supported_write_flags =3D bs->file->bs->supported_write_flags; + bs->supported_zero_flags =3D bs->file->bs->supported_zero_flags; + + opts =3D qemu_opts_create(&write_threshold_opts, NULL, 0, &error_abort= ); + + qemu_opts_absorb_qdict(opts, options, &local_err); + if (local_err) { + error_propagate(errp, local_err); + ret =3D -EINVAL; + goto ret; + } + + threshold =3D qemu_opt_get_number(opts, QEMU_OPT_WRITE_THRESHOLD, 0); + bdrv_write_threshold_update(bs, threshold); + +ret: + qemu_opts_del(opts); + return ret; +} + +static void write_threshold_close(BlockDriverState *bs) +{ +} + +static int write_threshold_co_flush(BlockDriverState *bs) +{ + return bdrv_co_flush(bs->file->bs); +} + +static int64_t coroutine_fn write_threshold_co_get_block_status( + BlockDriverState *b= s, + int64_t sector_num, + int nb_sectors, + int *pnum, + BlockDriverState **= file) +{ + assert(child_bs(bs)); + *pnum =3D nb_sectors; + *file =3D child_bs(bs); + return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | + (sector_num << BDRV_SECTOR_BITS); +} + +static bool write_threshold_recurse_is_first_non_filter( + BlockDriverState *bs, + BlockDriverState *candi= date) +{ + return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate); +} + +static BlockDriver write_threshold =3D { + .format_name =3D "write-threshold", + .instance_size =3D sizeof(uint64_t), + + .bdrv_open =3D write_threshold_open, + .bdrv_close =3D write_threshold_close, + + .bdrv_co_flush =3D write_threshold_co_flush, + .bdrv_co_preadv =3D write_threshold_co_preadv, + .bdrv_co_pwritev =3D write_threshold_co_pwritev, + .bdrv_co_pwrite_zeroes =3D write_threshold_co_pwrite_zeroes, + .bdrv_co_pdiscard =3D write_threshold_co_pdiscard, + + .bdrv_getlength =3D write_threshold_getlength, + .bdrv_child_perm =3D bdrv_filter_default_perms, + .bdrv_co_get_block_status =3D write_threshold_co_get_block_sta= tus, + .bdrv_recurse_is_first_non_filter =3D + write_threshold_recurse_is_first_non_fi= lter, + + .is_filter =3D true, +}; + +static void bdrv_write_threshold_init(void) +{ + bdrv_register(&write_threshold); +} + +block_init(bdrv_write_threshold_init); diff --git a/include/block/write-threshold.h b/include/block/write-threshol= d.h index 234d2193e0..5cf378564d 100644 --- a/include/block/write-threshold.h +++ b/include/block/write-threshold.h @@ -15,7 +15,7 @@ #include "qemu-common.h" =20 /* - * bdrv_write_threshold_set: + * bdrv_write_threshold_set_legacy: * * Set the write threshold for block devices, in bytes. * Notify when a write exceeds the threshold, meaning the device @@ -24,22 +24,25 @@ * * Use threshold_bytes =3D=3D 0 to disable. */ -void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_byt= es); +void bdrv_write_threshold_set_legacy(BlockDriverState *bs, + uint64_t threshold_bytes); + =20 /* - * bdrv_write_threshold_get + * bdrv_write_threshold_get_legacy * * Get the configured write threshold, in bytes. * Zero means no threshold configured. + * */ -uint64_t bdrv_write_threshold_get(const BlockDriverState *bs); +uint64_t bdrv_write_threshold_get_legacy(const BlockDriverState *bs); =20 /* - * bdrv_write_threshold_is_set + * bdrv_write_threshold_is_set_legacy * * Tell if a write threshold is set for a given BDS. */ -bool bdrv_write_threshold_is_set(const BlockDriverState *bs); +bool bdrv_write_threshold_is_set_legacy(const BlockDriverState *bs); =20 /* * bdrv_write_threshold_exceeded @@ -51,11 +54,10 @@ bool bdrv_write_threshold_is_set(const BlockDriverState= *bs); * NOTE: here we assume the following holds for each request this code * deals with: * - * assert((req->offset + req->bytes) <=3D UINT64_MAX) + * assert((offset + bytes) <=3D UINT64_MAX) * * Please not there is *not* an actual C assert(). */ -uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs, - const BdrvTrackedRequest *req); - +uint64_t bdrv_write_threshold_exceeded(uint64_t threshold, uint64_t offset, + uint64_t bytes); #endif diff --git a/qapi/block-core.json b/qapi/block-core.json index 12fd749a94..4d6ba1baef 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2232,7 +2232,8 @@ 'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh', - 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] } + 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs', + 'write-threshold'] } =20 ## # @BlockdevOptionsFile: @@ -3113,6 +3114,21 @@ 'file' : 'BlockdevRef', '*limits' : 'ThrottleLimits' } } + +## +# @BlockdevOptionsWriteThreshold: +# +# Driver specific block device options for the write-threshold driver +# +# @file: reference to or definition of the data source block d= evice +# @write-threshold: threshold in bytes +# Since: 2.11 +## +{ 'struct': 'BlockdevOptionsWriteThreshold', + 'data': { '*file' : 'BlockdevRef', + 'write-threshold' : 'int' + } } + ## # @BlockdevOptions: # @@ -3175,6 +3191,7 @@ 'sheepdog': 'BlockdevOptionsSheepdog', 'ssh': 'BlockdevOptionsSsh', 'throttle': 'BlockdevOptionsThrottle', + 'write-threshold': 'BlockdevOptionsWriteThreshold', 'vdi': 'BlockdevOptionsGenericFormat', 'vhdx': 'BlockdevOptionsGenericFormat', 'vmdk': 'BlockdevOptionsGenericCOWFormat', diff --git a/tests/test-write-threshold.c b/tests/test-write-threshold.c index 97ca12f710..1c802d2de4 100644 --- a/tests/test-write-threshold.c +++ b/tests/test-write-threshold.c @@ -17,9 +17,9 @@ static void test_threshold_not_set_on_init(void) BlockDriverState bs; memset(&bs, 0, sizeof(bs)); =20 - g_assert(!bdrv_write_threshold_is_set(&bs)); + g_assert(!bdrv_write_threshold_is_set_legacy(&bs)); =20 - res =3D bdrv_write_threshold_get(&bs); + res =3D bdrv_write_threshold_get_legacy(&bs); g_assert_cmpint(res, =3D=3D, 0); } =20 @@ -30,11 +30,11 @@ static void test_threshold_set_get(void) BlockDriverState bs; memset(&bs, 0, sizeof(bs)); =20 - bdrv_write_threshold_set(&bs, threshold); + bdrv_write_threshold_set_legacy(&bs, threshold); =20 - g_assert(bdrv_write_threshold_is_set(&bs)); + g_assert(bdrv_write_threshold_is_set_legacy(&bs)); =20 - res =3D bdrv_write_threshold_get(&bs); + res =3D bdrv_write_threshold_get_legacy(&bs); g_assert_cmpint(res, =3D=3D, threshold); } =20 @@ -46,9 +46,9 @@ static void test_threshold_multi_set_get(void) BlockDriverState bs; memset(&bs, 0, sizeof(bs)); =20 - bdrv_write_threshold_set(&bs, threshold1); - bdrv_write_threshold_set(&bs, threshold2); - res =3D bdrv_write_threshold_get(&bs); + bdrv_write_threshold_set_legacy(&bs, threshold1); + bdrv_write_threshold_set_legacy(&bs, threshold2); + res =3D bdrv_write_threshold_get_legacy(&bs); g_assert_cmpint(res, =3D=3D, threshold2); } =20 @@ -56,16 +56,10 @@ static void test_threshold_not_trigger(void) { uint64_t amount =3D 0; uint64_t threshold =3D 4 * 1024 * 1024; - BlockDriverState bs; - BdrvTrackedRequest req; + uint64_t offset =3D 1024; + uint64_t bytes =3D 1024; =20 - memset(&bs, 0, sizeof(bs)); - memset(&req, 0, sizeof(req)); - req.offset =3D 1024; - req.bytes =3D 1024; - - bdrv_write_threshold_set(&bs, threshold); - amount =3D bdrv_write_threshold_exceeded(&bs, &req); + amount =3D bdrv_write_threshold_exceeded(threshold, offset, bytes); g_assert_cmpuint(amount, =3D=3D, 0); } =20 @@ -74,16 +68,10 @@ static void test_threshold_trigger(void) { uint64_t amount =3D 0; uint64_t threshold =3D 4 * 1024 * 1024; - BlockDriverState bs; - BdrvTrackedRequest req; + uint64_t offset =3D (4 * 1024 * 1024) - 1024; + uint64_t bytes =3D 2 * 1024; =20 - memset(&bs, 0, sizeof(bs)); - memset(&req, 0, sizeof(req)); - req.offset =3D (4 * 1024 * 1024) - 1024; - req.bytes =3D 2 * 1024; - - bdrv_write_threshold_set(&bs, threshold); - amount =3D bdrv_write_threshold_exceeded(&bs, &req); + amount =3D bdrv_write_threshold_exceeded(threshold, offset, bytes); g_assert_cmpuint(amount, >=3D, 1024); } =20 --=20 2.11.0