From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Thomas Huth, Daniel P. Berrangé, Raphael Norwitz, Julia Suvorova,
 Cornelia Huck, Eric Blake, Paolo Bonzini, Aarushi Mehta, Stefan Hajnoczi,
 Richard Henderson, Kevin Wolf, Markus Armbruster, Stefano Garzarella,
 qemu-block@nongnu.org, Hanna Reitz, Fam Zheng, Marc-André Lureau,
 "Michael S.
 Tsirkin", kvm@vger.kernel.org, Philippe Mathieu-Daudé, Sam Li,
 Hannes Reinecke, Dmitry Fomichev
Subject: [PULL 03/17] block/block-backend: add block layer APIs resembling
 Linux ZonedBlockDevice ioctls
Date: Fri, 28 Apr 2023 08:39:40 -0400
Message-Id: <20230428123954.179035-4-stefanha@redhat.com>
In-Reply-To: <20230428123954.179035-1-stefanha@redhat.com>
References: <20230428123954.179035-1-stefanha@redhat.com>

From: Sam Li

Add zoned device option to host_device BlockDriver. It will be presented
only for zoned host block devices. By adding zone management operations to
the host_device BlockDriver, users can use the new block layer APIs,
including Report Zone and four zone management operations (open, close,
finish, reset).

Qemu-io uses the new APIs to perform zoned storage commands on the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).

For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 \
    -c "zrp offset nr_zones"

Signed-off-by: Sam Li
Reviewed-by: Hannes Reinecke
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Dmitry Fomichev
Acked-by: Kevin Wolf
Signed-off-by: Stefan Hajnoczi
Message-id: 20230427172019.3345-4-faithilikerun@gmail.com
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé and
remove spurious ret = -errno in raw_co_zone_mgmt().
--Stefan]
Signed-off-by: Stefan Hajnoczi
---
 meson.build                       |   4 +
 include/block/block-io.h          |   9 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h           |   6 +-
 include/sysemu/block-backend-io.h |  18 ++
 block/block-backend.c             | 137 +++++++++++++
 block/file-posix.c                | 313 +++++++++++++++++++++++++++++-
 block/io.c                        |  41 ++++
 qemu-io-cmds.c                    | 149 ++++++++++++++
 9 files changed, 695 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index c44d05a13f..a1bc310114 100644
--- a/meson.build
+++ b/meson.build
@@ -1966,6 +1966,7 @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
 # has_header
 config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
 config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
 config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
 config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
 config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
@@ -2052,6 +2053,9 @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
 config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
                      cc.has_member('struct stat', 'st_atim',
                                    prefix: '#include '))
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
+                     cc.has_member('struct blk_zone', 'capacity',
+                                   prefix: '#include '))
 
 # has_type
 config_host_data.set('CONFIG_IOVEC',
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 5dab88521d..58f415ab64 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -111,6 +111,15 @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
 int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
                                                int64_t bytes);
 
+/* Report zone information of zone block device. */
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
+                                                  int64_t offset,
+                                                  unsigned int *nr_zones,
+                                                  BlockZoneDescriptor *zones);
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
+                                                BlockZoneOp op,
+                                                int64_t offset, int64_t len);
+
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
                       int64_t bytes, int64_t *pnum, int64_t *map,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 150dc6f68f..997d539890 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -712,6 +712,12 @@ struct BlockDriver {
     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
         BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 
+    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
+            int64_t offset, unsigned int *nr_zones,
+            BlockZoneDescriptor *zones);
+    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
+            int64_t offset, int64_t len);
+
     /* removable device specific */
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
         BlockDriverState *bs);
@@ -864,6 +870,21 @@ typedef struct BlockLimits {
 
     /* device zone model */
     BlockZoneModel zoned;
+
+    /* zone size expressed in bytes */
+    uint32_t zone_size;
+
+    /* total number of zones */
+    uint32_t nr_zones;
+
+    /* maximum sectors of a zone append write operation */
+    int64_t max_append_sectors;
+
+    /* maximum number of open zones */
+    int64_t max_open_zones;
+
+    /* maximum number of active zones */
+    int64_t max_active_zones;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index e46a29c3f0..afb9bdf51b 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -28,6 +28,8 @@
 #define QEMU_AIO_WRITE_ZEROES 0x0020
 #define QEMU_AIO_COPY_RANGE 0x0040
 #define QEMU_AIO_TRUNCATE 0x0080
+#define QEMU_AIO_ZONE_REPORT 0x0100
+#define QEMU_AIO_ZONE_MGMT 0x0200
 #define QEMU_AIO_TYPE_MASK \
         (QEMU_AIO_READ | \
          QEMU_AIO_WRITE | \
@@ -36,7 +38,9 @@
          QEMU_AIO_DISCARD | \
          QEMU_AIO_WRITE_ZEROES | \
          QEMU_AIO_COPY_RANGE | \
-         QEMU_AIO_TRUNCATE)
+         QEMU_AIO_TRUNCATE | \
+         QEMU_AIO_ZONE_REPORT | \
+         QEMU_AIO_ZONE_MGMT)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED 0x1000
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index 851a44de96..eb1c1ebfec 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -46,6 +46,13 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
                             BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
                           BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+                                unsigned int *nr_zones,
+                                BlockZoneDescriptor *zones,
+                                BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                              int64_t offset, int64_t len,
+                              BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
                              BlockCompletionFunc *cb, void *opaque);
 void blk_aio_cancel_async(BlockAIOCB *acb);
@@ -191,6 +198,17 @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
 int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                       int64_t bytes, BdrvRequestFlags flags);
 
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones);
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
+                                     unsigned int *nr_zones,
+                                     BlockZoneDescriptor *zones);
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                  int64_t offset, int64_t len);
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                   int64_t offset, int64_t len);
+
 int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
                                   int64_t bytes);
 int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
diff --git a/block/block-backend.c b/block/block-backend.c
index fc530ded6a..67722eb46d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1845,6 +1845,143 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
     return ret;
 }
 
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+                                   (unsigned int*)(uintptr_t)acb->bytes,
+                                   rwco->iobuf);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+                                unsigned int *nr_zones,
+                                BlockZoneDescriptor *zones,
+                                BlockCompletionFunc *cb, void *opaque)
+{
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk = blk,
+        .offset = offset,
+        .iobuf = zones,
+        .ret = NOT_DONE,
+    };
+    acb->bytes = (int64_t)(uintptr_t)nr_zones,
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_mgmt(rwco->blk,
+                                 (BlockZoneOp)(uintptr_t)rwco->iobuf,
+                                 rwco->offset, acb->bytes);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                              int64_t offset, int64_t len,
+                              BlockCompletionFunc *cb, void *opaque) {
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk = blk,
+        .offset = offset,
+        .iobuf = (void *)(uintptr_t)op,
+        .ret = NOT_DONE,
+    };
+    acb->bytes = len;
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk); /* increase before waiting */
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation;
+ * offset is the byte offset from the start of the zoned device;
+ * len is the maximum number of bytes the command should operate on. It
+ * should be aligned with the device zone size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                  int64_t offset, int64_t len)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+
+    ret = blk_check_byte_request(blk, offset, len);
+    if (ret < 0) {
+        blk_dec_in_flight(blk);
+        return ret;
+    }
+
+    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index ba15b10eee..3b6575d771 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -68,6 +68,9 @@
 #include
 #include
 #include
+#if defined(CONFIG_BLKZONED)
+#include
+#endif
 #include
 #include
 #include
@@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
             PreallocMode prealloc;
             Error **errp;
         } truncate;
+        struct {
+            unsigned int *nr_zones;
+            BlockZoneDescriptor *zones;
+        } zone_report;
+        struct {
+            unsigned long op;
+        } zone_mgmt;
     };
 } RawPosixAIOData;
 
@@ -1236,6 +1246,7 @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
 static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
 {
     g_autofree char *val = NULL;
@@ -1257,6 +1268,7 @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
     }
     return 0;
 }
+#endif /* defined(CONFIG_BLKZONED) */
 
 /*
  * Get a sysfs attribute value as a long integer.
@@ -1302,6 +1314,7 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
                                      Error **errp)
 {
@@ -1315,7 +1328,54 @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
         return;
     }
     bs->bl.zoned = zoned;
+
+    ret = get_sysfs_long_val(st, "max_open_zones");
+    if (ret >= 0) {
+        bs->bl.max_open_zones = ret;
+    }
+
+    ret = get_sysfs_long_val(st, "max_active_zones");
+    if (ret >= 0) {
+        bs->bl.max_active_zones = ret;
+    }
+
+    /*
+     * The zoned device must at least have zone size and nr_zones fields.
+     */
+    ret = get_sysfs_long_val(st, "chunk_sectors");
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
+                                     "sysfs attribute");
+        return;
+    } else if (!ret) {
+        error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
+        return;
+    }
+    bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
+
+    ret = get_sysfs_long_val(st, "nr_zones");
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Unable to read nr_zones "
+                                     "sysfs attribute");
+        return;
+    } else if (!ret) {
+        error_setg(errp, "Read 0 from nr_zones sysfs attribute");
+        return;
+    }
+    bs->bl.nr_zones = ret;
+
+    ret = get_sysfs_long_val(st, "zone_append_max_bytes");
+    if (ret > 0) {
+        bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
+    }
 }
+#else /* !defined(CONFIG_BLKZONED) */
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
+                                     Error **errp)
+{
+    bs->bl.zoned = BLK_Z_NONE;
+}
+#endif /* !defined(CONFIG_BLKZONED) */
 
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
@@ -1383,9 +1443,12 @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
     BDRVRawState *s = bs->opaque;
     int ret;
 
-    /* If DASD, get blocksizes */
+    /* If DASD or zoned devices, get blocksizes */
     if (check_for_dasd(s->fd) < 0) {
-        return -ENOTSUP;
+        /* zoned devices are not DASD */
+        if (bs->bl.zoned == BLK_Z_NONE) {
+            return -ENOTSUP;
+        }
     }
     ret = probe_logical_blocksize(s->fd, &bsz->log);
     if (ret < 0) {
@@ -1853,6 +1916,147 @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
 }
 #endif
 
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+#if defined(CONFIG_BLKZONED)
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
+                             const struct blk_zone *blkz) {
+    zone->start = blkz->start << BDRV_SECTOR_BITS;
+    zone->length = blkz->len << BDRV_SECTOR_BITS;
+    zone->wp = blkz->wp << BDRV_SECTOR_BITS;
+
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
+    zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
+#else
+    zone->cap = blkz->len << BDRV_SECTOR_BITS;
+#endif
+
+    switch (blkz->type) {
+    case BLK_ZONE_TYPE_SEQWRITE_REQ:
+        zone->type = BLK_ZT_SWR;
+        break;
+    case BLK_ZONE_TYPE_SEQWRITE_PREF:
+        zone->type = BLK_ZT_SWP;
+        break;
+    case BLK_ZONE_TYPE_CONVENTIONAL:
+        zone->type = BLK_ZT_CONV;
+        break;
+    default:
+        error_report("Unsupported zone type: 0x%x", blkz->type);
+        return -ENOTSUP;
+    }
+
+    switch (blkz->cond) {
+    case BLK_ZONE_COND_NOT_WP:
+        zone->state = BLK_ZS_NOT_WP;
+        break;
+    case BLK_ZONE_COND_EMPTY:
+        zone->state = BLK_ZS_EMPTY;
+        break;
+    case BLK_ZONE_COND_IMP_OPEN:
+        zone->state = BLK_ZS_IOPEN;
+        break;
+    case BLK_ZONE_COND_EXP_OPEN:
+        zone->state = BLK_ZS_EOPEN;
+        break;
+    case BLK_ZONE_COND_CLOSED:
+        zone->state = BLK_ZS_CLOSED;
+        break;
+    case BLK_ZONE_COND_READONLY:
+        zone->state = BLK_ZS_RDONLY;
+        break;
+    case BLK_ZONE_COND_FULL:
+        zone->state = BLK_ZS_FULL;
+        break;
+    case BLK_ZONE_COND_OFFLINE:
+        zone->state = BLK_ZS_OFFLINE;
+        break;
+    default:
+        error_report("Unsupported zone state: 0x%x", blkz->cond);
+        return -ENOTSUP;
+    }
+    return 0;
+}
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_report(void *opaque)
+{
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
+    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
+    /* zoned block devices use 512-byte sectors */
+    uint64_t sector = aiocb->aio_offset / 512;
+
+    struct blk_zone *blkz;
+    size_t rep_size;
+    unsigned int nrz;
+    int ret;
+    unsigned int n = 0, i = 0;
+
+    nrz = *nr_zones;
+    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+    g_autofree struct blk_zone_report *rep = NULL;
+    rep = g_malloc(rep_size);
+
+    blkz = (struct blk_zone *)(rep + 1);
+    while (n < nrz) {
+        memset(rep, 0, rep_size);
+        rep->sector = sector;
+        rep->nr_zones = nrz - n;
+
+        do {
+            ret = ioctl(fd, BLKREPORTZONE, rep);
+        } while (ret != 0 && errno == EINTR);
+        if (ret != 0) {
+            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+                         fd, sector, errno);
+            return -errno;
+        }
+
+        if (!rep->nr_zones) {
+            break;
+        }
+
+        for (i = 0; i < rep->nr_zones; i++, n++) {
+            ret = parse_zone(&zones[n], &blkz[i]);
+            if (ret != 0) {
+                return ret;
+            }
+
+            /* The next report should start after the last zone reported */
+            sector = blkz[i].start + blkz[i].len;
+        }
+    }
+
+    *nr_zones = n;
+    return 0;
+}
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_mgmt(void *opaque)
+{
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    uint64_t sector = aiocb->aio_offset / 512;
+    int64_t nr_sectors = aiocb->aio_nbytes / 512;
+    struct blk_zone_range range;
+    int ret;
+
+    /* Execute the operation */
+    range.sector = sector;
+    range.nr_sectors = nr_sectors;
+    do {
+        ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
+    } while (ret != 0 && errno == EINTR);
+
+    return ret;
+}
+#endif
+
 static int handle_aiocb_copy_range(void *opaque)
 {
     RawPosixAIOData *aiocb = opaque;
@@ -3032,6 +3236,104 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
     }
 }
 
+/*
+ * zone report - Get a zone block device's information in the form
+ * of an array of zone descriptors.
+ * zones is an array of zone descriptors to hold zone information on reply;
+ * offset can be any byte within the entire size of the device;
+ * nr_zones is the maximum number of sectors the command should operate on.
+ */
+#if defined(CONFIG_BLKZONED)
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                           unsigned int *nr_zones,
+                                           BlockZoneDescriptor *zones) {
+    BDRVRawState *s = bs->opaque;
+    RawPosixAIOData acb = (RawPosixAIOData) {
+        .bs = bs,
+        .aio_fildes = s->fd,
+        .aio_type = QEMU_AIO_ZONE_REPORT,
+        .aio_offset = offset,
+        .zone_report = {
+            .nr_zones = nr_zones,
+            .zones = zones,
+        },
+    };
+
+    return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
+}
+#endif
+
+/*
+ * zone management operations - Execute an operation on a zone
+ */
+#if defined(CONFIG_BLKZONED)
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+                                         int64_t offset, int64_t len) {
+    BDRVRawState *s = bs->opaque;
+    RawPosixAIOData acb;
+    int64_t zone_size, zone_size_mask;
+    const char *op_name;
+    unsigned long zo;
+    int ret;
+    int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+
+    zone_size = bs->bl.zone_size;
+    zone_size_mask = zone_size - 1;
+    if (offset & zone_size_mask) {
+        error_report("sector offset %" PRId64 " is not aligned to zone size "
+                     "%" PRId64 "", offset / 512, zone_size / 512);
+        return -EINVAL;
+    }
+
+    if (((offset + len) < capacity && len & zone_size_mask) ||
+        offset + len > capacity) {
+        error_report("number of sectors %" PRId64 " is not aligned to zone size"
+                     " %" PRId64 "", len / 512, zone_size / 512);
+        return -EINVAL;
+    }
+
+    switch (op) {
+    case BLK_ZO_OPEN:
+        op_name = "BLKOPENZONE";
+        zo = BLKOPENZONE;
+        break;
+    case BLK_ZO_CLOSE:
+        op_name = "BLKCLOSEZONE";
+        zo = BLKCLOSEZONE;
+        break;
+    case BLK_ZO_FINISH:
+        op_name = "BLKFINISHZONE";
+        zo = BLKFINISHZONE;
+        break;
+    case BLK_ZO_RESET:
+        op_name = "BLKRESETZONE";
+        zo = BLKRESETZONE;
+        break;
+    default:
+        error_report("Unsupported zone op: 0x%x", op);
+        return -ENOTSUP;
+    }
+
+    acb = (RawPosixAIOData) {
+        .bs = bs,
+        .aio_fildes = s->fd,
+        .aio_type = QEMU_AIO_ZONE_MGMT,
+        .aio_offset = offset,
+        .aio_nbytes = len,
+        .zone_mgmt = {
+            .op = zo,
+        },
+    };
+
+    ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
+    if (ret != 0) {
+        error_report("ioctl %s failed %d", op_name, ret);
+    }
+
+    return ret;
+}
+#endif
+
 static coroutine_fn int
 raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
                 bool blkdev)
@@ -3787,6 +4089,13 @@ static BlockDriver bdrv_host_device = {
 #ifdef __linux__
     .bdrv_co_ioctl = hdev_co_ioctl,
 #endif
+
+    /* zoned device */
+#if defined(CONFIG_BLKZONED)
+    /* zone management operations */
+    .bdrv_co_zone_report = raw_co_zone_report,
+    .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
+#endif
 };
 
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
diff --git a/block/io.c b/block/io.c
index 6fa1993374..74bab69b0f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3115,6 +3115,47 @@ out:
     return co.ret;
 }
 
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
+                                     unsigned int *nr_zones,
+                                     BlockZoneDescriptor *zones)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+        .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+    co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+                                   int64_t offset, int64_t len)
+{
+    BlockDriver *drv = bs->drv;
+    CoroutineIOCompletion co = {
+        .coroutine = qemu_coroutine_self(),
+    };
+    IO_CODE();
+
+    bdrv_inc_in_flight(bs);
+    if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
+        co.ret = -ENOTSUP;
+        goto out;
+    }
+    co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
+out:
+    bdrv_dec_in_flight(bs);
+    return co.ret;
+}
+
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
     IO_CODE();
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e7a02f5b99..f35ea627d7 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1730,6 +1730,150 @@ static const cmdinfo_t flush_cmd = {
     .oneline = "flush all in-core file state to disk",
 };
 
+static inline int64_t tosector(int64_t bytes)
+{
+    return bytes >> BDRV_SECTOR_BITS;
+}
+
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset;
+    unsigned int nr_zones;
+
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    nr_zones = cvtnum(argv[optind]);
+
+    g_autofree BlockZoneDescriptor *zones = NULL;
+    zones = g_new(BlockZoneDescriptor, nr_zones);
+    ret = blk_zone_report(blk, offset, &nr_zones, zones);
+    if (ret < 0) {
+        printf("zone report failed: %s\n", strerror(-ret));
+    } else {
+        for (int i = 0; i < nr_zones; ++i) {
+            printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
+                   "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
+                   "zcond:%u, [type: %u]\n",
+                   tosector(zones[i].start), tosector(zones[i].length),
+                   tosector(zones[i].cap), tosector(zones[i].wp),
+                   zones[i].state, zones[i].type);
+        }
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_report_cmd = {
+    .name = "zone_report",
+    .altname = "zrp",
+    .cfunc = zone_report_f,
+    .argmin = 2,
+    .argmax = 2,
+    .args = "offset number",
+    .oneline = "report zone information",
+};
+
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
+    if (ret < 0) {
+        printf("zone open failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_open_cmd = {
+    .name = "zone_open",
+    .altname = "zo",
+    .cfunc = zone_open_f,
+    .argmin = 2,
+    .argmax = 2,
+    .args = "offset len",
+    .oneline = "explicit open a range of zones in zone block device",
+};
+
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
+    if (ret < 0) {
+        printf("zone close failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_close_cmd = {
+    .name = "zone_close",
+    .altname = "zc",
+    .cfunc = zone_close_f,
+    .argmin = 2,
+    .argmax = 2,
+    .args = "offset len",
+    .oneline = "close a range of zones in zone block device",
+};
+
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
+    if (ret < 0) {
+        printf("zone finish failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_finish_cmd = {
+    .name = "zone_finish",
+    .altname = "zf",
+    .cfunc = zone_finish_f,
+    .argmin = 2,
+    .argmax = 2,
+    .args = "offset len",
+    .oneline = "finish a range of zones in zone block device",
+};
+
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
+{
+    int ret;
+    int64_t offset, len;
+    ++optind;
+    offset = cvtnum(argv[optind]);
+    ++optind;
+    len = cvtnum(argv[optind]);
+    ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
+    if (ret < 0) {
+        printf("zone reset failed: %s\n", strerror(-ret));
+    }
+    return ret;
+}
+
+static const cmdinfo_t zone_reset_cmd = {
+    .name = "zone_reset",
+    .altname = "zrs",
+    .cfunc = zone_reset_f,
+    .argmin = 2,
+    .argmax = 2,
+    .args = "offset len",
+    .oneline = "reset a zone write pointer in zone block device",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
     .name = "truncate",
@@ -2523,6 +2667,11 @@ static void __attribute((constructor)) init_qemuio_commands(void)
     qemuio_add_command(&aio_write_cmd);
     qemuio_add_command(&aio_flush_cmd);
     qemuio_add_command(&flush_cmd);
+    qemuio_add_command(&zone_report_cmd);
+    qemuio_add_command(&zone_open_cmd);
+    qemuio_add_command(&zone_close_cmd);
+    qemuio_add_command(&zone_finish_cmd);
+    qemuio_add_command(&zone_reset_cmd);
     qemuio_add_command(&truncate_cmd);
     qemuio_add_command(&length_cmd);
     qemuio_add_command(&info_cmd);
-- 
2.40.0