From: Eric Blake
To: qemu-devel@nongnu.org
Date: Thu, 21 Jun 2018 09:57:46 -0500
Message-Id: <20180621145749.191944-7-eblake@redhat.com>
In-Reply-To: <20180621145749.191944-1-eblake@redhat.com>
References: <20180621145749.191944-1-eblake@redhat.com>
Subject: [Qemu-devel] [PULL v2 6/9] nbd/server: implement dirty bitmap export
Cc: Kevin Wolf, Paolo Bonzini, Vladimir Sementsov-Ogievskiy,
 "open list:Network Block Dev...", Max Reitz

From: Vladimir Sementsov-Ogievskiy

Handle a new NBD meta namespace: "qemu", and corresponding queries:
"qemu:dirty-bitmap:".

With the new metadata context negotiated, BLOCK_STATUS query will reply
with dirty-bitmap data, converted to extents. The new public function
nbd_export_bitmap selects which bitmap to export. For now, only one bitmap
may be exported.

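A minimal sketch of the "converted to extents" step described above, for
readers new to the idea; it is not part of the patch. It run-length encodes
a plain array of per-granule dirty flags into (length, flags) pairs. The
names sketch_extent, SKETCH_STATE_DIRTY and sketch_bitmap_to_extents are
made up for this illustration, and the extents stay in host byte order,
whereas the patch stores them big-endian and reads a real BdrvDirtyBitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an NBD extent (host byte order here). */
struct sketch_extent {
    uint32_t length; /* bytes covered by this run */
    uint32_t flags;  /* bit 0 set means "dirty" */
};

#define SKETCH_STATE_DIRTY (1u << 0)

/*
 * Run-length encode granules [first, first + count) of @dirty into @out,
 * writing at most @max extents. Each granule stands for @granularity
 * bytes. Returns the number of extents written; if @max is reached, the
 * tail of the range is simply not described.
 */
size_t sketch_bitmap_to_extents(const bool *dirty, size_t first,
                                size_t count, uint32_t granularity,
                                struct sketch_extent *out, size_t max)
{
    size_t n = 0;
    size_t i = first;
    size_t end = first + count;

    assert(count > 0 && max > 0 && granularity > 0);
    while (i < end && n < max) {
        bool state = dirty[i];
        uint64_t run = 0;

        /* Extend the run while the dirty state does not change. */
        while (i < end && dirty[i] == state) {
            run++;
            i++;
        }
        /* Assumption: a run fits in 32 bits; no capping is done here. */
        assert(run * granularity <= UINT32_MAX);
        out[n].length = (uint32_t)(run * granularity);
        out[n].flags = state ? SKETCH_STATE_DIRTY : 0;
        n++;
    }
    return n;
}
```

With dirty flags {0, 0, 1, 1, 1, 0} and a 64 KiB granularity, the sketch
yields three extents: 128 KiB clean, 192 KiB dirty, 64 KiB clean.
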
Signed-off-by: Vladimir Sementsov-Ogievskiy
Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake
[eblake: wording tweaks, minor cleanups, additional tracing]
Signed-off-by: Eric Blake
---
 include/block/nbd.h |   8 +-
 nbd/server.c        | 278 +++++++++++++++++++++++++++++++++++++++++++++++-----
 nbd/trace-events    |   1 +
 3 files changed, 262 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fcdcd545023..8bb9606c39b 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -229,11 +229,13 @@ enum {
 #define NBD_REPLY_TYPE_ERROR         NBD_REPLY_ERR(1)
 #define NBD_REPLY_TYPE_ERROR_OFFSET  NBD_REPLY_ERR(2)
 
-/* Flags for extents (NBDExtent.flags) of NBD_REPLY_TYPE_BLOCK_STATUS,
- * for base:allocation meta context */
+/* Extent flags for base:allocation in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_HOLE (1 << 0)
 #define NBD_STATE_ZERO (1 << 1)
 
+/* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DIRTY (1 << 0)
+
 static inline bool nbd_reply_type_is_error(int type)
 {
     return type & (1 << 15);
@@ -315,6 +317,8 @@ void nbd_client_put(NBDClient *client);
 
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       Error **errp);
+void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
+                       const char *bitmap_export_name, Error **errp);
 
 /* nbd_read
  * Reads @size bytes from @ioc. Returns 0 on success.
diff --git a/nbd/server.c b/nbd/server.c
index 9171cd41680..2c2d62c6361 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -23,6 +23,13 @@
 #include "nbd-internal.h"
 
 #define NBD_META_ID_BASE_ALLOCATION 0
+#define NBD_META_ID_DIRTY_BITMAP 1
+
+/* NBD_MAX_BITMAP_EXTENTS: 1 mb of extents data. An empirical
+ * constant. If an increase is needed, note that the NBD protocol
+ * recommends no larger than 32 mb, so that the client won't consider
+ * the reply as a denial of service attack. */
+#define NBD_MAX_BITMAP_EXTENTS (0x100000 / 8)
 
 static int system_errno_to_nbd_errno(int err)
 {
@@ -80,6 +87,9 @@ struct NBDExport {
 
     BlockBackend *eject_notifier_blk;
     Notifier eject_notifier;
+
+    BdrvDirtyBitmap *export_bitmap;
+    char *export_bitmap_context;
 };
 
 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
@@ -92,6 +102,7 @@ typedef struct NBDExportMetaContexts {
     bool valid; /* means that negotiation of the option finished without
                    errors */
     bool base_allocation; /* export base:allocation context (block status) */
+    bool bitmap; /* export qemu:dirty-bitmap: */
 } NBDExportMetaContexts;
 
 struct NBDClient {
@@ -814,6 +825,56 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
                                      &meta->base_allocation, errp);
 }
 
+/* nbd_meta_qemu_query
+ *
+ * Handle query to 'qemu:' namespace.
+ * @len is the amount of text remaining to be read from the current name, after
+ * the 'qemu:' portion has been stripped.
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
+                               uint32_t len, Error **errp)
+{
+    bool dirty_bitmap = false;
+    size_t dirty_bitmap_len = strlen("dirty-bitmap:");
+    int ret;
+
+    if (!meta->exp->export_bitmap) {
+        trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
+        return nbd_opt_skip(client, len, errp);
+    }
+
+    if (len == 0) {
+        if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+            meta->bitmap = true;
+        }
+        trace_nbd_negotiate_meta_query_parse("empty");
+        return 1;
+    }
+
+    if (len < dirty_bitmap_len) {
+        trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+        return nbd_opt_skip(client, len, errp);
+    }
+
+    len -= dirty_bitmap_len;
+    ret = nbd_meta_pattern(client, "dirty-bitmap:", &dirty_bitmap, errp);
+    if (ret <= 0) {
+        return ret;
+    }
+    if (!dirty_bitmap) {
+        trace_nbd_negotiate_meta_query_skip("not dirty-bitmap:");
+        return nbd_opt_skip(client, len, errp);
+    }
+
+    trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
+
+    return nbd_meta_empty_or_pattern(
+            client, meta->exp->export_bitmap_context +
+            strlen("qemu:dirty-bitmap:"), len, &meta->bitmap, errp);
+}
+
 /* nbd_negotiate_meta_query
  *
  * Parse namespace name and call corresponding function to parse body of the
@@ -829,9 +890,14 @@ static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
 static int nbd_negotiate_meta_query(NBDClient *client,
                                     NBDExportMetaContexts *meta, Error **errp)
 {
+    /*
+     * Both 'qemu' and 'base' namespaces have length = 5 including a
+     * colon. If another length namespace is later introduced, this
+     * should certainly be refactored.
+     */
     int ret;
-    char query[sizeof("base:") - 1];
-    size_t baselen = strlen("base:");
+    size_t ns_len = 5;
+    char ns[5];
     uint32_t len;
 
     ret = nbd_opt_read(client, &len, sizeof(len), errp);
@@ -840,25 +906,27 @@ static int nbd_negotiate_meta_query(NBDClient *client,
     }
     cpu_to_be32s(&len);
 
-    /* The only supported namespace for now is 'base'. So query should start
-     * with 'base:'. Otherwise, we can ignore it and skip the remainder. */
-    if (len < baselen) {
+    if (len < ns_len) {
         trace_nbd_negotiate_meta_query_skip("length too short");
         return nbd_opt_skip(client, len, errp);
     }
 
-    len -= baselen;
-    ret = nbd_opt_read(client, query, baselen, errp);
+    len -= ns_len;
+    ret = nbd_opt_read(client, ns, ns_len, errp);
     if (ret <= 0) {
         return ret;
     }
-    if (strncmp(query, "base:", baselen) != 0) {
-        trace_nbd_negotiate_meta_query_skip("not for base: namespace");
-        return nbd_opt_skip(client, len, errp);
+
+    if (!strncmp(ns, "base:", ns_len)) {
+        trace_nbd_negotiate_meta_query_parse("base:");
+        return nbd_meta_base_query(client, meta, len, errp);
+    } else if (!strncmp(ns, "qemu:", ns_len)) {
+        trace_nbd_negotiate_meta_query_parse("qemu:");
+        return nbd_meta_qemu_query(client, meta, len, errp);
     }
 
-    trace_nbd_negotiate_meta_query_parse("base:");
-    return nbd_meta_base_query(client, meta, len, errp);
+    trace_nbd_negotiate_meta_query_skip("unknown namespace");
+    return nbd_opt_skip(client, len, errp);
 }
 
 /* nbd_negotiate_meta_queries
@@ -928,6 +996,16 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
         }
     }
 
+    if (meta->bitmap) {
+        ret = nbd_negotiate_send_meta_context(client,
+                                              meta->exp->export_bitmap_context,
+                                              NBD_META_ID_DIRTY_BITMAP,
+                                              errp);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
     if (ret == 0) {
         meta->valid = true;
@@ -1556,6 +1634,11 @@ void nbd_export_put(NBDExport *exp)
             exp->blk = NULL;
         }
 
+        if (exp->export_bitmap) {
+            bdrv_dirty_bitmap_set_qmp_locked(exp->export_bitmap, false);
+            g_free(exp->export_bitmap_context);
+        }
+
         g_free(exp);
     }
 }
@@ -1797,9 +1880,15 @@ static int blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
 }
 
 /* nbd_co_send_extents
- * @extents should be in big-endian */
+ *
+ * @length is only for tracing purposes (and may be smaller or larger
+ * than the client's original request). @last controls whether
+ * NBD_REPLY_FLAG_DONE is sent. @extents should already be in
+ * big-endian format.
+ */
 static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
-                               NBDExtent *extents, unsigned nb_extents,
+                               NBDExtent *extents, unsigned int nb_extents,
+                               uint64_t length, bool last,
                                uint32_t context_id, Error **errp)
 {
     NBDStructuredMeta chunk;
@@ -1809,7 +1898,9 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
         {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
     };
 
-    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_BLOCK_STATUS,
+    trace_nbd_co_send_extents(handle, nb_extents, context_id, length, last);
+    set_be_chunk(&chunk.h, last ? NBD_REPLY_FLAG_DONE : 0,
+                 NBD_REPLY_TYPE_BLOCK_STATUS,
                  handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
     stl_be_p(&chunk.context_id, context_id);
 
@@ -1819,8 +1910,8 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
 /* Get block status from the exported device and send it to the client */
 static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
                                     BlockDriverState *bs, uint64_t offset,
-                                    uint64_t length, uint32_t context_id,
-                                    Error **errp)
+                                    uint64_t length, bool last,
+                                    uint32_t context_id, Error **errp)
 {
     int ret;
     NBDExtent extent;
@@ -1831,7 +1922,84 @@ static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
                 client, handle, -ret, "can't get block status", errp);
     }
 
-    return nbd_co_send_extents(client, handle, &extent, 1, context_id, errp);
+    return nbd_co_send_extents(client, handle, &extent, 1, length, last,
+                               context_id, errp);
+}
+
+/*
+ * Populate @extents from a dirty bitmap. Unless @dont_fragment, the
+ * final extent may exceed the original @length. Store in @length the
+ * byte length encoded (which may be smaller or larger than the
+ * original), and return the number of extents used.
+ */
+static unsigned int bitmap_to_extents(BdrvDirtyBitmap *bitmap, uint64_t offset,
+                                      uint64_t *length, NBDExtent *extents,
+                                      unsigned int nb_extents,
+                                      bool dont_fragment)
+{
+    uint64_t begin = offset, end;
+    uint64_t overall_end = offset + *length;
+    unsigned int i = 0;
+    BdrvDirtyBitmapIter *it;
+    bool dirty;
+
+    bdrv_dirty_bitmap_lock(bitmap);
+
+    it = bdrv_dirty_iter_new(bitmap);
+    dirty = bdrv_get_dirty_locked(NULL, bitmap, offset);
+
+    assert(begin < overall_end && nb_extents);
+    while (begin < overall_end && i < nb_extents) {
+        if (dirty) {
+            end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
+        } else {
+            bdrv_set_dirty_iter(it, begin);
+            end = bdrv_dirty_iter_next(it);
+        }
+        if (end == -1 || end - begin > UINT32_MAX) {
+            /* Cap to an aligned value < 4G beyond begin. */
+            end = MIN(bdrv_dirty_bitmap_size(bitmap),
+                      begin + UINT32_MAX + 1 -
+                      bdrv_dirty_bitmap_granularity(bitmap));
+        }
+        if (dont_fragment && end > overall_end) {
+            end = overall_end;
+        }
+
+        extents[i].length = cpu_to_be32(end - begin);
+        extents[i].flags = cpu_to_be32(dirty ? NBD_STATE_DIRTY : 0);
+        i++;
+        begin = end;
+        dirty = !dirty;
+    }
+
+    bdrv_dirty_iter_free(it);
+
+    bdrv_dirty_bitmap_unlock(bitmap);
+
+    *length = end - offset;
+    return i;
+}
+
+static int nbd_co_send_bitmap(NBDClient *client, uint64_t handle,
+                              BdrvDirtyBitmap *bitmap, uint64_t offset,
+                              uint32_t length, bool dont_fragment, bool last,
+                              uint32_t context_id, Error **errp)
+{
+    int ret;
+    unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BITMAP_EXTENTS;
+    NBDExtent *extents = g_new(NBDExtent, nb_extents);
+    uint64_t final_length = length;
+
+    nb_extents = bitmap_to_extents(bitmap, offset, &final_length, extents,
+                                   nb_extents, dont_fragment);
+
+    ret = nbd_co_send_extents(client, handle, extents, nb_extents,
+                              final_length, last, context_id, errp);
+
+    g_free(extents);
+
+    return ret;
 }
 
 /* nbd_co_receive_request
@@ -2051,11 +2219,34 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "need non-zero length", errp);
         }
-        if (client->export_meta.valid && client->export_meta.base_allocation) {
-            return nbd_co_send_block_status(client, request->handle,
-                                            blk_bs(exp->blk), request->from,
-                                            request->len,
-                                            NBD_META_ID_BASE_ALLOCATION, errp);
+        if (client->export_meta.valid &&
+            (client->export_meta.base_allocation ||
+             client->export_meta.bitmap))
+        {
+            if (client->export_meta.base_allocation) {
+                ret = nbd_co_send_block_status(client, request->handle,
+                                               blk_bs(exp->blk), request->from,
+                                               request->len,
+                                               !client->export_meta.bitmap,
+                                               NBD_META_ID_BASE_ALLOCATION,
+                                               errp);
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+
+            if (client->export_meta.bitmap) {
+                ret = nbd_co_send_bitmap(client, request->handle,
+                                         client->exp->export_bitmap,
+                                         request->from, request->len,
+                                         request->flags & NBD_CMD_FLAG_REQ_ONE,
+                                         true, NBD_META_ID_DIRTY_BITMAP, errp);
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+
+            return ret;
         } else {
             return nbd_send_generic_reply(client, request->handle, -EINVAL,
                                           "CMD_BLOCK_STATUS not negotiated",
@@ -2207,3 +2398,44 @@ void nbd_client_new(NBDExport *exp,
     co = qemu_coroutine_create(nbd_co_client_start, client);
     qemu_coroutine_enter(co);
 }
+
+void nbd_export_bitmap(NBDExport *exp, const char *bitmap,
+                       const char *bitmap_export_name, Error **errp)
+{
+    BdrvDirtyBitmap *bm = NULL;
+    BlockDriverState *bs = blk_bs(exp->blk);
+
+    if (exp->export_bitmap) {
+        error_setg(errp, "Export bitmap is already set");
+        return;
+    }
+
+    while (true) {
+        bm = bdrv_find_dirty_bitmap(bs, bitmap);
+        if (bm != NULL || bs->backing == NULL) {
+            break;
+        }
+
+        bs = bs->backing->bs;
+    }
+
+    if (bm == NULL) {
+        error_setg(errp, "Bitmap '%s' is not found", bitmap);
+        return;
+    }
+
+    if (bdrv_dirty_bitmap_enabled(bm)) {
+        error_setg(errp, "Bitmap '%s' is enabled", bitmap);
+        return;
+    }
+
+    if (bdrv_dirty_bitmap_qmp_locked(bm)) {
+        error_setg(errp, "Bitmap '%s' is locked", bitmap);
+        return;
+    }
+
+    bdrv_dirty_bitmap_set_qmp_locked(bm, true);
+    exp->export_bitmap = bm;
+    exp->export_bitmap_context =
+        g_strdup_printf("qemu:dirty-bitmap:%s", bitmap_export_name);
+}
diff --git a/nbd/trace-events b/nbd/trace-events
index dee081e7758..5e1d4afe8e6 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -64,6 +64,7 @@ nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, i
 nbd_co_send_structured_done(uint64_t handle) "Send structured reply done: handle = %" PRIu64
 nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, size_t size) "Send structured read data reply: handle = %" PRIu64 ", offset = %" PRIu64 ", data = %p, len = %zu"
 nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size) "Send structured read hole reply: handle = %" PRIu64 ", offset = %" PRIu64 ", len = %zu"
+nbd_co_send_extents(uint64_t handle, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: handle = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
-- 
2.14.4