From: Eric Blake
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, vsementsov@virtuozzo.com, den@virtuozzo.com,
 qemu-block@nongnu.org, Max Reitz, marcandre.lureau@redhat.com,
 pbonzini@redhat.com
Subject: [Qemu-devel] [PATCH v5 9/9] nbd: Implement NBD_INFO_BLOCK_SIZE on client
Date: Fri, 7 Jul 2017 15:30:49 -0500
Message-Id: <20170707203049.534-10-eblake@redhat.com>
In-Reply-To: <20170707203049.534-1-eblake@redhat.com>
References: <20170707203049.534-1-eblake@redhat.com>

The upstream NBD Protocol has defined a new extension to allow
the server to advertise block sizes to the client, as well as
a way for the client to inform the server whether it intends to
obey block sizes.

When using the block layer as the client, we will obey block
sizes; but when used as 'qemu-nbd -c' to hand off to the
kernel nbd module as the client, we are still waiting for the
kernel to implement a way for us to learn if it will honor
block sizes (perhaps by an addition to sysfs, rather than an
ioctl), as well as any way to tell the kernel what additional
block sizes to obey (NBD_SET_BLKSIZE appears to be accurate
for the minimum size, but preferred and maximum sizes would
probably be new ioctl()s), so until then, we need to make our
request for block sizes conditional.

When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel,
use the minimum block size as the sector size if it is larger
than 512, which also has the nice effect of cooperating with
(non-qemu) servers that don't do read-modify-write when
exposing a block device with 4k sectors; it might also allow
us to visit a file larger than 2T on a 32-bit kernel.

Signed-off-by: Eric Blake

---
v5: rebase to master
v4: new patch
---
 include/block/nbd.h |  6 ++++
 block/nbd-client.c  |  4 +++
 block/nbd.c         | 14 +++++++--
 nbd/client.c        | 81 +++++++++++++++++++++++++++++++++++++++++++++--------
 qemu-nbd.c          |  2 +-
 nbd/trace-events    |  1 +
 6 files changed, 92 insertions(+), 16 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4a22eca..9c3d0a5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -144,8 +144,14 @@ enum {
 
 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
+    /* Set by client before nbd_receive_negotiate() */
+    bool request_sizes;
+    /* Set by server results during nbd_receive_negotiate() */
     uint64_t size;
     uint16_t flags;
+    uint32_t min_block;
+    uint32_t opt_block;
+    uint32_t max_block;
 };
 typedef struct NBDExportInfo NBDExportInfo;
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index aab1e32..25dd284 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -384,6 +384,7 @@ int nbd_client_init(BlockDriverState *bs,
     logout("session init %s\n", export);
     qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
 
+    client->info.request_sizes = true;
     ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
                                 tlscreds, hostname,
                                 &client->ioc, &client->info, errp);
@@ -398,6 +399,9 @@ int nbd_client_init(BlockDriverState *bs,
     if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
         bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
     }
+    if (client->info.min_block > bs->bl.request_alignment) {
+        bs->bl.request_alignment = client->info.min_block;
+    }
 
     qemu_co_mutex_init(&client->send_mutex);
     qemu_co_queue_init(&client->free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index 4a9048c..a50d24b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -472,9 +472,17 @@ static int nbd_co_flush(BlockDriverState *bs)
 
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
-    bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;
-    bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
+    NBDClientSession *s = nbd_get_client_session(bs);
+    uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);
+
+    bs->bl.max_pdiscard = max;
+    bs->bl.max_pwrite_zeroes = max;
+    bs->bl.max_transfer = max;
+
+    if (s->info.opt_block &&
+        s->info.opt_block > bs->bl.opt_transfer) {
+        bs->bl.opt_transfer = s->info.opt_block;
+    }
 }
 
 static void nbd_close(BlockDriverState *bs)
diff --git a/nbd/client.c b/nbd/client.c
index cb55f3d..af2b46d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -369,12 +369,17 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
     info->flags = 0;
 
     trace_nbd_opt_go_start(wantname);
-    buf = g_malloc(4 + len + 2 + 1);
+    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
     stl_be_p(buf, len);
     memcpy(buf + 4, wantname, len);
-    /* No requests, live with whatever server sends */
-    stw_be_p(buf + 4 + len, 0);
-    if (nbd_send_option_request(ioc, NBD_OPT_GO, len + 6, buf, errp) < 0) {
+    /* At most one request, everything else up to server */
+    stw_be_p(buf + 4 + len, info->request_sizes);
+    if (info->request_sizes) {
+        stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
+    }
+    if (nbd_send_option_request(ioc, NBD_OPT_GO,
+                                4 + len + 2 + 2 * info->request_sizes, buf,
+                                errp) < 0) {
         return -1;
     }
 
@@ -405,8 +410,9 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
             return 1;
         }
         if (reply.type != NBD_REP_INFO) {
-            error_setg(errp, "unexpected reply type %" PRIx32 ", expected %x",
-                       reply.type, NBD_REP_INFO);
+            error_setg(errp, "unexpected reply type %" PRIx32
+                       " (%s), expected %x",
+                       reply.type, nbd_rep_lookup(reply.type), NBD_REP_INFO);
             nbd_send_opt_abort(ioc);
             return -1;
         }
@@ -446,6 +452,51 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
             trace_nbd_receive_negotiate_size_flags(info->size, info->flags);
             break;
 
+        case NBD_INFO_BLOCK_SIZE:
+            if (len != sizeof(info->min_block) * 3) {
+                error_setg(errp, "remaining export info len %" PRIu32
+                           " is unexpected size", len);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (nbd_read(ioc, &info->min_block, sizeof(info->min_block),
+                         errp) < 0) {
+                error_prepend(errp, "failed to read info minimum block size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->min_block);
+            if (!is_power_of_2(info->min_block)) {
+                error_setg(errp, "server minimum block size %" PRId32
+                           "is not a power of two", info->min_block);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (nbd_read(ioc, &info->opt_block, sizeof(info->opt_block),
+                         errp) < 0) {
+                error_prepend(errp, "failed to read info preferred block size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->opt_block);
+            if (!is_power_of_2(info->opt_block) ||
+                info->opt_block < info->min_block) {
+                error_setg(errp, "server preferred block size %" PRId32
+                           "is not valid", info->opt_block);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            if (nbd_read(ioc, &info->max_block, sizeof(info->max_block),
+                         errp) < 0) {
+                error_prepend(errp, "failed to read info maximum block size");
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }
+            be32_to_cpus(&info->max_block);
+            trace_nbd_opt_go_info_block_size(info->min_block, info->opt_block,
+                                             info->max_block);
+            break;
+
         default:
             trace_nbd_opt_go_info_unknown(type, nbd_info_lookup(type));
             if (nbd_drop(ioc, len, errp) < 0) {
@@ -729,8 +780,14 @@ fail:
 int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp)
 {
-    unsigned long sectors = info->size / BDRV_SECTOR_SIZE;
-    if (info->size / BDRV_SECTOR_SIZE != sectors) {
+    unsigned long sector_size = MAX(BDRV_SECTOR_SIZE, info->min_block);
+    unsigned long sectors = info->size / sector_size;
+
+    /* FIXME: Once the kernel module is patched to honor block sizes,
+     * and to advertise that fact to user space, we should update the
+     * hand-off to the kernel to use any block sizes we learned. */
+    assert(!info->request_sizes);
+    if (info->size / sector_size != sectors) {
         error_setg(errp, "Export size %" PRId64 " too large for 32-bit kernel",
                    info->size);
         return -E2BIG;
@@ -744,17 +801,17 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
         return -serrno;
     }
 
-    trace_nbd_init_set_block_size(BDRV_SECTOR_SIZE);
+    trace_nbd_init_set_block_size(sector_size);
 
-    if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
+    if (ioctl(fd, NBD_SET_BLKSIZE, sector_size) < 0) {
         int serrno = errno;
         error_setg(errp, "Failed setting NBD block size");
         return -serrno;
     }
 
     trace_nbd_init_set_size(sectors);
-    if (info->size % BDRV_SECTOR_SIZE) {
-        trace_nbd_init_trailing_bytes(info->size % BDRV_SECTOR_SIZE);
+    if (info->size % sector_size) {
+        trace_nbd_init_trailing_bytes(info->size % sector_size);
     }
 
     if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
diff --git a/qemu-nbd.c b/qemu-nbd.c
index c8bd47f..78d05be 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -255,7 +255,7 @@ static void *show_parts(void *arg)
 static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
-    NBDExportInfo info;
+    NBDExportInfo info = { .request_sizes = false, };
     QIOChannelSocket *sioc;
     int fd;
     int ret;
diff --git a/nbd/trace-events b/nbd/trace-events
index a3ba4bc..b18d051 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -6,6 +6,7 @@ nbd_reply_err_unsup(uint32_t option, const char *name) "server doesn't understan
 nbd_opt_go_start(const char *name) "Attempting NBD_OPT_GO for export '%s'"
 nbd_opt_go_success(void) "Export is good to go"
 nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%s)"
+nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
 nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
 nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
 nbd_receive_starttls_request(void) "Requesting TLS from server"
-- 
2.9.4
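
For readers who want to try the NBD_INFO_BLOCK_SIZE checks outside of
QEMU, here is a minimal standalone sketch of the same validation. It
assumes the 12-byte payload (minimum, preferred and maximum block size,
each a big-endian 32-bit value) has already been read from the socket.
The names struct block_sizes, load_be32(), is_pow2() and
parse_block_sizes() are hypothetical; they are not part of QEMU or of
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three advertised sizes
 * (a stand-in for the new NBDExportInfo fields, not QEMU's type). */
struct block_sizes {
    uint32_t min_block;
    uint32_t opt_block;
    uint32_t max_block;
};

/* Read a big-endian 32-bit value from a possibly unaligned buffer. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* True for 1, 2, 4, ...; false for 0 and for anything else. */
static bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Parse the bytes that follow the 16-bit info type in an
 * NBD_INFO_BLOCK_SIZE reply.  Returns 0 on success and fills *out;
 * returns -1 (leaving *out untouched) if the length is not 12, if the
 * minimum is not a power of two, or if the preferred size is not a
 * power of two at least as large as the minimum.  The maximum is
 * stored without further checks, mirroring the patch.
 */
static int parse_block_sizes(const uint8_t *payload, size_t len,
                             struct block_sizes *out)
{
    uint32_t min, opt, max;

    assert(payload != NULL && out != NULL);
    if (len != 3 * sizeof(uint32_t)) {
        return -1;
    }
    min = load_be32(payload);
    opt = load_be32(payload + 4);
    max = load_be32(payload + 8);
    if (!is_pow2(min)) {
        return -1;
    }
    if (!is_pow2(opt) || opt < min) {
        return -1;
    }
    out->min_block = min;
    out->opt_block = opt;
    out->max_block = max;
    return 0;
}
```

A caller would hand in the payload bytes and treat -1 as a reason to
abort negotiation, in the same way the patch calls nbd_send_opt_abort()
before returning an error.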