From nobody Thu Apr 25 21:54:03 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=yadro.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1669305602256504.62438523012963; Thu, 24 Nov 2022 08:00:02 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oyEd9-0002WF-C8; Thu, 24 Nov 2022 10:59:15 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd3-0002NP-Kw; Thu, 24 Nov 2022 10:59:09 -0500 Received: from mta-02.yadro.com ([89.207.88.252] helo=mta-01.yadro.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd1-0005IO-5I; Thu, 24 Nov 2022 10:59:09 -0500 Received: from localhost (unknown [127.0.0.1]) by mta-01.yadro.com (Postfix) with ESMTP id E32BE411FD; Thu, 24 Nov 2022 15:59:01 +0000 (UTC) Received: from mta-01.yadro.com ([127.0.0.1]) by localhost (mta-01.yadro.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id S9YzQfdMY05j; Thu, 24 Nov 2022 18:59:00 +0300 (MSK) Received: from T-EXCH-02.corp.yadro.com (T-EXCH-02.corp.yadro.com [172.17.10.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mta-01.yadro.com (Postfix) with ESMTPS id 5E4D940311; Thu, 24 Nov 2022 18:59:00 +0300 (MSK) Received: from T-EXCH-09.corp.yadro.com (172.17.11.59) by T-EXCH-02.corp.yadro.com (172.17.10.102) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.669.32; Thu, 24 Nov 2022 18:59:00 +0300 Received: from archlinux.yadro.com (10.178.113.54) by 
T-EXCH-09.corp.yadro.com (172.17.11.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.2.1118.9; Thu, 24 Nov 2022 18:58:59 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yadro.com; h= content-type:content-type:content-transfer-encoding:mime-version :references:in-reply-to:x-mailer:message-id:date:date:subject :subject:from:from:received:received:received:received; s= mta-01; t=1669305540; x=1671119941; bh=JrTtmOCQPXqPEEEM7T5Cp9/Ai O+hRlHM74J3LFX39dc=; b=HH+Rchl+xQjv02KaWIOz8oNeHEJ8dABqzgZ4hp4TA p5W/7vcfKlpA3WPcllxw7JtAkCte6JCVeYMf/uLKMb1twOJD2rohLPi3Yk9lz920 5gKSsBDlctMYDYpIYnudaNo5nwB1kzBt0bqukFUflkhlEdTi/2bzEY0MzTo51FtX qw= X-Virus-Scanned: amavisd-new at yadro.com From: Dmitry Tihov To: CC: , , , , , , Subject: [RFC 1/5] docs/nvme: add new feature summary Date: Thu, 24 Nov 2022 18:58:17 +0300 Message-ID: <20221124155821.1501969-2-d.tihov@yadro.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221124155821.1501969-1-d.tihov@yadro.com> References: <20221124155821.1501969-1-d.tihov@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.178.113.54] X-ClientProxiedBy: T-EXCH-01.corp.yadro.com (172.17.10.101) To T-EXCH-09.corp.yadro.com (172.17.11.59) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=89.207.88.252; envelope-from=d.tihov@yadro.com; helo=mta-01.yadro.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1669305604587100003 Content-Type: text/plain; charset="utf-8"

Describe use of new protection info block-level passthrough nvme feature.

Signed-off-by: Dmitry Tihov
---
 docs/system/devices/nvme.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 30f841ef62..7375379810 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -240,6 +240,21 @@ The virtual namespace device supports DIF- and DIX-based protection information
   metadata. Otherwise, the protection information is transferred as the last
   eight bytes.
 
+Virtual namespace device can also be backed by integrity capable host block
+device. This way protection information is passed through to/from the host block
+device from/to the guest and checked by the host block device itself instead of QEMU.
+
+``pip=BOOL`` (default: ``off``)
+  Set to ``on`` to allow host block device protection information passthrough.
+
+To use this feature nvme-ns backend drive must have Linux io_uring AIO backend
+and host page cache must be avoided. E.g. the following parameters should be used:
+
+.. code-block:: console
+
+   -drive file=/dev/nvme0n1,cache.direct=on,aio=io_uring,format=raw,if=none,id=dif_drive_0
+   -device nvme-ns,logical_block_size=4096,physical_block_size=4096,drive=dif_drive_0,pip=on
+
 Virtualization Enhancements and SR-IOV (Experimental Support)
 -------------------------------------------------------------
 
-- 
2.38.1

From nobody Thu Apr 25 21:54:03 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=yadro.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1669305588774110.82327554316043; Thu, 24 Nov 2022 07:59:48 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oyEdJ-0002hg-Ax; Thu, 24 Nov 2022 10:59:25 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd5-0002No-Ib; Thu, 24 Nov 2022 10:59:13 -0500 Received: from mta-02.yadro.com ([89.207.88.252] helo=mta-01.yadro.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd1-0005Ij-PR; Thu, 24 Nov 2022 10:59:10 -0500 Received: from localhost (unknown [127.0.0.1]) by mta-01.yadro.com (Postfix) with ESMTP id 65B4C41207; Thu, 24 Nov 2022 15:59:04 +0000 (UTC) Received: from mta-01.yadro.com ([127.0.0.1]) by localhost (mta-01.yadro.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ZoD-xndSWQ20; Thu, 24 Nov 2022 18:59:02 +0300 (MSK) Received: from T-EXCH-01.corp.yadro.com (T-EXCH-01.corp.yadro.com [172.17.10.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mta-01.yadro.com
(Postfix) with ESMTPS id 2BAC440311; Thu, 24 Nov 2022 18:59:02 +0300 (MSK) Received: from T-EXCH-09.corp.yadro.com (172.17.11.59) by T-EXCH-01.corp.yadro.com (172.17.10.101) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.669.32; Thu, 24 Nov 2022 18:59:01 +0300 Received: from archlinux.yadro.com (10.178.113.54) by T-EXCH-09.corp.yadro.com (172.17.11.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.2.1118.9; Thu, 24 Nov 2022 18:59:00 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yadro.com; h= content-type:content-type:content-transfer-encoding:mime-version :references:in-reply-to:x-mailer:message-id:date:date:subject :subject:from:from:received:received:received:received; s= mta-01; t=1669305542; x=1671119943; bh=g2VWrqkDJiHejX+GoKzn/q2HX BXxif6DTEvFXZGExpA=; b=e7qx4vvVK/20YU9E9kIQTa45w06AlQIifKLgnWuE2 MuyZmSf/utLSPLSns9IAl9rEvvERZW3JYjBzkCjdJ6eAQcc/sLySnoPUWnpCHgZA pVh68QUFhoSOGUpu35328Tw1k3DFbJuCsPhGU2jQG/QPVqpVfe5DMsagM1b6B96S nM= X-Virus-Scanned: amavisd-new at yadro.com From: Dmitry Tihov To: CC: , , , , , , Subject: [RFC 2/5] block: add transfer of protection information Date: Thu, 24 Nov 2022 18:58:18 +0300 Message-ID: <20221124155821.1501969-3-d.tihov@yadro.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221124155821.1501969-1-d.tihov@yadro.com> References: <20221124155821.1501969-1-d.tihov@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.178.113.54] X-ClientProxiedBy: T-EXCH-01.corp.yadro.com (172.17.10.101) To T-EXCH-09.corp.yadro.com (172.17.11.59) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=89.207.88.252; envelope-from=d.tihov@yadro.com; helo=mta-01.yadro.com X-Spam_score_int: -20 X-Spam_score: 
-2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1669305589590100001 Content-Type: text/plain; charset="utf-8"

Under linux hosts, T10 protection information can be passed directly from
userspace to integrity capable block devices using io_uring API.
Discover integrity capable block devices and support submitting IO with
integrity payload to such block devices if it is present in request.

Signed-off-by: Dmitry Tihov
---
 block/file-posix.c           | 130 +++++++++++++++++++++++++++++++++--
 block/io_uring.c             | 109 +++++++++++++++++++++++++++--
 include/block/block-common.h |   2 +
 include/block/raw-aio.h      |   3 +-
 include/qemu/iov.h           |   6 ++
 util/iov.c                   |  24 +++++++
 6 files changed, 262 insertions(+), 12 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b9647c5ffc..1eec7dd3cb 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -152,6 +152,10 @@ typedef struct BDRVRawState {
     int perm_change_flags;
     BDRVReopenState *reopen_state;
 
+    /* DIF T10 Protection Information */
+    uint8_t t10_type;
+    uint64_t protection_interval_bytes;
+
     bool has_discard:1;
     bool has_write_zeroes:1;
     bool use_linux_aio:1;
@@ -2094,8 +2098,9 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 #ifdef CONFIG_LINUX_IO_URING
     } else if (s->use_linux_io_uring) {
         LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+        bool is_pi = (s->t10_type && qiov->dif.iov_len);
         assert(qiov->size == bytes);
-        return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
+        return luring_co_submit(bs, aio, s->fd, offset, qiov, type, is_pi);
 #endif
 #ifdef CONFIG_LINUX_AIO
     } else if (s->use_linux_aio) {
@@ -2190,7 +2195,7 @@ static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 #ifdef CONFIG_LINUX_IO_URING
     if (s->use_linux_io_uring) {
         LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
-        return luring_co_submit(bs, aio, s->fd, 0, NULL, QEMU_AIO_FLUSH);
+        return luring_co_submit(bs, aio, s->fd, 0, NULL, QEMU_AIO_FLUSH, false);
     }
 #endif
     return raw_thread_pool_submit(bs, handle_aiocb_flush, &acb);
@@ -3516,6 +3521,110 @@ static bool hdev_is_sg(BlockDriverState *bs)
     return false;
 }
 
+#if defined(CONFIG_LINUX_IO_URING)
+
+static int fill_pi_info(BlockDriverState *bs, Error **errp)
+{
+    BDRVRawState *s = bs->opaque;
+    int ret = 0, bytes;
+    uint64_t is_integrity_capable;
+    g_autofree char *sysfs_int_cap = NULL;
+    g_autofree char *sysfs_fmt = NULL;
+    g_autofree char *sysfs_bytes = NULL;
+    const char *str_int_cap;
+    const char *str_bytes;
+    int fd_fmt = -1, fd_bytes = -1, fd_int_cap = -1;
+    char buf[24] = {0};
+    g_autofree char *dev_name = g_path_get_basename(bs->filename);
+
+    str_int_cap = "/sys/class/block/%s/integrity/device_is_integrity_capable";
+    sysfs_int_cap = g_strdup_printf(str_int_cap, dev_name);
+    sysfs_fmt = g_strdup_printf("/sys/class/block/%s/integrity/format",
+                                dev_name);
+    str_bytes = "/sys/class/block/%s/integrity/protection_interval_bytes";
+    sysfs_bytes = g_strdup_printf(str_bytes, dev_name);
+
+    if (!(bs->open_flags & BDRV_O_NOCACHE)) {
+        goto out;
+    }
+
+    fd_int_cap = open(sysfs_int_cap, O_RDONLY);
+    if (fd_int_cap == -1) {
+        error_setg_errno(errp, errno, "Can not open %s integrity capability"
+                         " sysfs entry", dev_name);
+        ret = -errno;
+        goto out;
+    }
+    bytes = read(fd_int_cap, buf, sizeof(buf));
+    if (bytes < 0) {
+        error_setg_errno(errp, errno, "Can not read %s integrity capability"
+                         " sysfs entry", dev_name);
+        ret = -errno;
+        goto out;
+    }
+    is_integrity_capable = g_ascii_strtoull(buf, NULL, 10);
+    if (!is_integrity_capable) {
+        goto out;
+    }
+    memset(buf, 0, sizeof(buf));
+
+    fd_fmt = open(sysfs_fmt, O_RDONLY);
+    if (fd_fmt == -1) {
+        error_setg_errno(errp, errno, "Can not open %s integrity format"
+                         " sysfs entry", dev_name);
+        ret = -errno;
+        goto out;
+    }
+    bytes = read(fd_fmt, buf, sizeof(buf));
+    if (bytes < 0) {
+        error_setg_errno(errp, errno, "Can not read %s integrity format"
+                         " sysfs entry", dev_name);
+        ret = -errno;
+        goto out;
+    }
+    if (bytes > 0 && buf[bytes - 1] == '\n') {
+        buf[bytes - 1] = 0;
+    }
+    if (strcmp(buf, "T10-DIF-TYPE1-CRC") == 0) {
+        s->t10_type = 1;
+    } else if (strcmp(buf, "T10-DIF-TYPE3-CRC") == 0) {
+        s->t10_type = 3;
+    } else {
+        s->t10_type = 0;
+    }
+    memset(buf, 0, sizeof(buf));
+
+    fd_bytes = open(sysfs_bytes, O_RDONLY);
+    if (fd_bytes == -1) {
+        error_setg_errno(errp, errno, "Can not open %s protection interval"
+                         " bytes sysfs entry", dev_name);
+        ret = -errno;
+        goto out;
+    }
+    if (read(fd_bytes, buf, sizeof(buf)) < 0) {
+        error_setg_errno(errp, errno, "Can not read %s protection interval"
+                         " bytes sysfs entry", dev_name);
+        ret = -errno;
+        goto out;
+    }
+    s->protection_interval_bytes = g_ascii_strtoull(buf, NULL, 10);
+
+out:
+    if (fd_fmt != -1) {
+        close(fd_fmt);
+    }
+    if (fd_bytes != -1) {
+        close(fd_bytes);
+    }
+    if (fd_int_cap != -1) {
+        close(fd_int_cap);
+    }
+
+    return ret;
+}
+
+#endif
+
 static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
                      Error **errp)
 {
@@ -3601,6 +3710,11 @@ hdev_open_Mac_error:
     /* Since this does ioctl the device must be already opened */
     bs->sg = hdev_is_sg(bs);
 
+#if defined(CONFIG_LINUX_IO_URING)
+    if (s->use_linux_io_uring) {
+        ret = fill_pi_info(bs, errp);
+    }
+#endif
     return ret;
 }
 
@@ -3668,6 +3782,14 @@ static coroutine_fn int hdev_co_pwrite_zeroes(BlockDriverState *bs,
     return raw_do_pwrite_zeroes(bs, offset, bytes, flags, true);
 }
 
+static int hdev_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVRawState *s = bs->opaque;
+    bdi->protection_interval = s->protection_interval_bytes;
+    bdi->protection_type = s->t10_type;
+    return 0;
+}
+
 static BlockDriver bdrv_host_device = {
     .format_name = "host_device",
     .protocol_name = "host_device",
@@ -3698,8 +3820,8 @@ static BlockDriver bdrv_host_device = {
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate = raw_co_truncate,
-    .bdrv_getlength = raw_getlength,
-    .bdrv_get_info = raw_get_info,
+    .bdrv_getlength = raw_getlength,
+    .bdrv_get_info = hdev_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
     .bdrv_get_specific_stats = hdev_get_specific_stats,
diff --git a/block/io_uring.c b/block/io_uring.c
index 973e15d876..ba9fec1145 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -21,6 +21,84 @@
 /* io_uring ring size */
 #define MAX_ENTRIES 128
 
+#define IORING_OP_READV_PI (48)
+#define IORING_OP_WRITEV_PI (49)
+
+#pragma pack(push, 1)
+
+struct __io_uring_sqe {
+    __u8 opcode; /* type of operation for this sqe */
+    __u8 flags; /* IOSQE_ flags */
+    __u16 ioprio; /* ioprio for the request */
+    __s32 fd; /* file descriptor to do IO on */
+    union {
+        __u64 off; /* offset into file */
+        __u64 addr2;
+    };
+    union {
+        __u64 addr; /* pointer to buffer or iovecs */
+        __u64 splice_off_in;
+    };
+    __u32 len; /* buffer size or number of iovecs */
+    union {
+        __kernel_rwf_t rw_flags;
+        __u32 fsync_flags;
+        __u16 poll_events; /* compatibility */
+        __u32 poll32_events; /* word-reversed for BE */
+        __u32 sync_range_flags;
+        __u32 msg_flags;
+        __u32 timeout_flags;
+        __u32 accept_flags;
+        __u32 cancel_flags;
+        __u32 open_flags;
+        __u32 statx_flags;
+        __u32 fadvise_advice;
+        __u32 splice_flags;
+        __u32 rename_flags;
+        __u32 unlink_flags;
+        __u32 hardlink_flags;
+    };
+    __u64 user_data; /* data to be passed back at completion time */
+    /* pack this to avoid bogus arm OABI complaints */
+    union {
+        /* index into fixed buffers, if used */
+        __u16 buf_index;
+        /* for grouped buffer selection */
+        __u16 buf_group;
+    } __attribute__((packed));
+    /* personality to use, if used */
+    __u16 personality;
+    union {
+        __s32 splice_fd_in;
+        __u32 file_index;
+    };
+    __u64 pi_addr;
+    __u32 pi_len;
+    __u32 __pad2[1];
+};
+
+#pragma pack(pop)
+
+static inline void __io_uring_prep_writev_pi(uint8_t op,
+    struct io_uring_sqe *sqe, int fd, const struct iovec *iovecs,
+    unsigned nr_vecs, const struct iovec *pi_iovec, unsigned nr_pi_vecs,
+    off_t offset)
+{
+    io_uring_prep_rw(op, sqe, fd, iovecs, nr_vecs, offset);
+    ((struct __io_uring_sqe *)sqe)->pi_addr = (__u64)pi_iovec;
+    ((struct __io_uring_sqe *)sqe)->pi_len = nr_pi_vecs;
+}
+
+static inline void __io_uring_prep_readv_pi(uint8_t op,
+    struct io_uring_sqe *sqe, int fd, const struct iovec *iovecs,
+    unsigned nr_vecs, const struct iovec *pi_iovec, unsigned nr_pi_vecs,
+    off_t offset)
+{
+    io_uring_prep_rw(op, sqe, fd, iovecs, nr_vecs, offset);
+    ((struct __io_uring_sqe *)sqe)->pi_addr = (__u64)pi_iovec;
+    ((struct __io_uring_sqe *)sqe)->pi_len = nr_pi_vecs;
+}
+
 typedef struct LuringAIOCB {
     Coroutine *co;
     struct io_uring_sqe sqeq;
@@ -330,24 +408,39 @@ void luring_io_unplug(BlockDriverState *bs, LuringState *s)
  * @s: AIO state
  * @offset: offset for request
  * @type: type of request
+ * @is_pi: is protection information attached
  *
  * Fetches sqes from ring, adds to pending queue and preps them
  *
  */
 static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
-                            uint64_t offset, int type)
+                            uint64_t offset, int type, bool is_pi)
 {
     int ret;
     struct io_uring_sqe *sqes = &luringcb->sqeq;
 
     switch (type) {
     case QEMU_AIO_WRITE:
-        io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
-                             luringcb->qiov->niov, offset);
+        if (is_pi) {
+            __io_uring_prep_writev_pi(IORING_OP_WRITEV_PI, sqes, fd,
+                                      luringcb->qiov->iov,
+                                      luringcb->qiov->niov,
+                                      &luringcb->qiov->dif, 1, offset);
+        } else {
+            io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
+                                 luringcb->qiov->niov, offset);
+        }
         break;
     case QEMU_AIO_READ:
-        io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
-                            luringcb->qiov->niov, offset);
+        if (is_pi) {
+            __io_uring_prep_readv_pi(IORING_OP_READV_PI, sqes, fd,
+                                     luringcb->qiov->iov,
+                                     luringcb->qiov->niov,
+                                     &luringcb->qiov->dif, 1, offset);
+        } else {
+            io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
+                                luringcb->qiov->niov, offset);
+        }
         break;
     case QEMU_AIO_FLUSH:
         io_uring_prep_fsync(sqes, fd, IORING_FSYNC_DATASYNC);
@@ -374,7 +467,8 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 }
 
 int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
-                                  uint64_t offset, QEMUIOVector *qiov, int type)
+                                  uint64_t offset, QEMUIOVector *qiov, int type,
+                                  bool is_pi)
 {
     int ret;
     LuringAIOCB luringcb = {
@@ -383,9 +477,10 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
         .qiov = qiov,
         .is_read = (type == QEMU_AIO_READ),
     };
+
     trace_luring_co_submit(bs, s, &luringcb, fd, offset, qiov ? qiov->size : 0,
                            type);
-    ret = luring_do_submit(fd, &luringcb, s, offset, type);
+    ret = luring_do_submit(fd, &luringcb, s, offset, type, is_pi);
 
     if (ret < 0) {
         return ret;
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 297704c1e9..1f283dbef8 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -59,6 +59,8 @@ typedef struct BlockDriverInfo {
      * True if this block driver only supports compressed writes
      */
     bool needs_compressed_writes;
+    uint8_t protection_type;
+    uint32_t protection_interval;
 } BlockDriverInfo;
 
 typedef struct BlockFragInfo {
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 21fc10c4c9..3f715b4bcc 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -65,7 +65,8 @@ typedef struct LuringState LuringState;
 LuringState *luring_init(Error **errp);
 void luring_cleanup(LuringState *s);
 int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
-                                uint64_t offset, QEMUIOVector *qiov, int type);
+                                uint64_t offset, QEMUIOVector *qiov, int type,
+                                bool is_pi);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
 void luring_io_plug(BlockDriverState *bs, LuringState *s);
diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index 9330746680..58ae2d1f51 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -181,6 +181,9 @@ typedef struct QEMUIOVector {
             size_t size;
         };
     };
+
+    /* T10 data integrity field */
+    struct iovec dif;
 } QEMUIOVector;
 
 QEMU_BUILD_BUG_ON(offsetof(QEMUIOVector, size) !=
@@ -229,6 +232,9 @@ int qemu_iovec_init_extended(
         void *tail_buf, size_t tail_len);
 void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
                            size_t offset, size_t len);
+void qemu_iovec_init_pi(QEMUIOVector *qiov, int alloc_hint,
+                        unsigned int lba_cnt);
+void qemu_iovec_destroy_pi(QEMUIOVector *qiov);
 int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len);
 void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
 void qemu_iovec_concat(QEMUIOVector *dst,
diff --git a/util/iov.c b/util/iov.c
index b4be580022..f0e51d5e66 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -20,6 +20,7 @@
 #include "qemu/iov.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
+#include "qemu/memalign.h"
 
 size_t iov_from_buf_full(const struct iovec *iov, unsigned int iov_cnt,
                          size_t offset, const void *buf, size_t bytes)
@@ -278,6 +279,8 @@ void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint)
     qiov->niov = 0;
     qiov->nalloc = alloc_hint;
     qiov->size = 0;
+    qiov->dif.iov_base = NULL;
+    qiov->dif.iov_len = 0;
 }
 
 void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov)
@@ -292,6 +295,19 @@ void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov)
         qiov->size += iov[i].iov_len;
 }
 
+void qemu_iovec_init_pi(QEMUIOVector *qiov, int alloc_hint,
+                        unsigned int lba_cnt)
+{
+    void *alignd_mem = NULL;
+    qemu_iovec_init(qiov, alloc_hint);
+
+    /* dif size is always 8 bytes */
+    qiov->dif.iov_len = lba_cnt << 3;
+
+    alignd_mem = qemu_memalign(qemu_real_host_page_size(), qiov->dif.iov_len);
+    qiov->dif.iov_base = memset(alignd_mem, 0, qiov->dif.iov_len);
+}
+
 void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len)
 {
     assert(qiov->nalloc != -1);
@@ -530,12 +546,20 @@ void qemu_iovec_destroy(QEMUIOVector *qiov)
     memset(qiov, 0, sizeof(*qiov));
 }
 
+void qemu_iovec_destroy_pi(QEMUIOVector *qiov)
+{
+    g_free(qiov->dif.iov_base);
+
+    qemu_iovec_destroy(qiov);
+}
+
 void qemu_iovec_reset(QEMUIOVector *qiov)
 {
     assert(qiov->nalloc != -1);
 
     qiov->niov = 0;
     qiov->size = 0;
+    qiov->dif.iov_len = 0;
 }
 
 size_t qemu_iovec_to_buf(QEMUIOVector *qiov, size_t offset,
-- 
2.38.1

From nobody Thu Apr 25 21:54:03 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass
(zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=yadro.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 166930560772854.722664627682775; Thu, 24 Nov 2022 08:00:07 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oyEdH-0002f5-Au; Thu, 24 Nov 2022 10:59:23 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd5-0002Ns-Qq; Thu, 24 Nov 2022 10:59:13 -0500 Received: from mta-02.yadro.com ([89.207.88.252] helo=mta-01.yadro.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd3-0005Ip-Le; Thu, 24 Nov 2022 10:59:11 -0500 Received: from localhost (unknown [127.0.0.1]) by mta-01.yadro.com (Postfix) with ESMTP id 35D4140311; Thu, 24 Nov 2022 15:59:05 +0000 (UTC) Received: from mta-01.yadro.com ([127.0.0.1]) by localhost (mta-01.yadro.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xSVwTlhnNwLv; Thu, 24 Nov 2022 18:59:02 +0300 (MSK) Received: from T-EXCH-02.corp.yadro.com (T-EXCH-02.corp.yadro.com [172.17.10.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mta-01.yadro.com (Postfix) with ESMTPS id E51DE4014D; Thu, 24 Nov 2022 18:59:02 +0300 (MSK) Received: from T-EXCH-09.corp.yadro.com (172.17.11.59) by T-EXCH-02.corp.yadro.com (172.17.10.102) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.669.32; Thu, 24 Nov 2022 18:59:02 +0300 Received: from archlinux.yadro.com (10.178.113.54) by T-EXCH-09.corp.yadro.com (172.17.11.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) 
id 15.2.1118.9; Thu, 24 Nov 2022 18:59:01 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yadro.com; h= content-type:content-type:content-transfer-encoding:mime-version :references:in-reply-to:x-mailer:message-id:date:date:subject :subject:from:from:received:received:received:received; s= mta-01; t=1669305542; x=1671119943; bh=TyDsjRjeQub7YIZdTGQR9xxCB bJwRxPEqnZOuMpHouQ=; b=JP5GyAyP/aqpRpQMTudINdRjfsfDVLX0fseNp/gBh XGekNELD4Q0BS1qCGQCMyUAK15tZjsywOnA5qZJVyycXmulzsULht3QQcRbAFJnb WlOgzw6YX61ZsqZ6qouZG2k/hWBrhYusYoFX5gTIhzeIH1xQ0APcv6USLKEIuyxy ZE= X-Virus-Scanned: amavisd-new at yadro.com From: Dmitry Tihov To: CC: , , , , , , Subject: [RFC 3/5] hw/nvme: add protection information pass parameter Date: Thu, 24 Nov 2022 18:58:19 +0300 Message-ID: <20221124155821.1501969-4-d.tihov@yadro.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221124155821.1501969-1-d.tihov@yadro.com> References: <20221124155821.1501969-1-d.tihov@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.178.113.54] X-ClientProxiedBy: T-EXCH-01.corp.yadro.com (172.17.10.101) To T-EXCH-09.corp.yadro.com (172.17.11.59) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=89.207.88.252; envelope-from=d.tihov@yadro.com; helo=mta-01.yadro.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: 
qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1669305608644100003 Content-Type: text/plain; charset="utf-8"

Allow namespace to enable pass-through of protection information between
guest and integrity capable BlockBackend.

Signed-off-by: Dmitry Tihov
---
 hw/nvme/ns.c   | 59 +++++++++++++++++++++++++++++++++++++++++++++-----
 hw/nvme/nvme.h |  2 ++
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..da0cff71f8 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -35,7 +35,11 @@ void nvme_ns_init_format(NvmeNamespace *ns)
     ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
     ns->lbasz = 1 << ns->lbaf.ds;
 
-    nlbas = ns->size / (ns->lbasz + ns->lbaf.ms);
+    if (ns->pip) {
+        nlbas = ns->size / (ns->lbasz);
+    } else {
+        nlbas = ns->size / (ns->lbasz + ns->lbaf.ms);
+    }
 
     id_ns->nsze = cpu_to_le64(nlbas);
 
@@ -60,17 +64,22 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     static uint64_t ns_count;
     NvmeIdNs *id_ns = &ns->id_ns;
     NvmeIdNsNvm *id_ns_nvm = &ns->id_ns_nvm;
+    BlockDriverInfo bdi;
     uint8_t ds;
     uint16_t ms;
-    int i;
+    int i, ret;
+
+    ns->pip = ns->params.pip;
 
     ns->csi = NVME_CSI_NVM;
     ns->status = 0x0;
 
     ns->id_ns.dlfeat = 0x1;
 
-    /* support DULBE and I/O optimization fields */
-    id_ns->nsfeat |= (0x4 | 0x10);
+    if (!ns->pip) {
+        /* support DULBE and I/O optimization fields */
+        id_ns->nsfeat |= (0x4 | 0x10);
+    }
 
     if (ns->params.shared) {
         id_ns->nmic |= NVME_NMIC_NS_SHARED;
@@ -89,7 +98,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     id_ns->eui64 = cpu_to_be64(ns->params.eui64);
 
     ds = 31 - clz32(ns->blkconf.logical_block_size);
-    ms = ns->params.ms;
+    if (ns->pip) {
+        ms = 8;
+    } else {
+        ms = ns->params.ms;
+    }
 
     id_ns->mc = NVME_ID_NS_MC_EXTENDED | NVME_ID_NS_MC_SEPARATE;
 
@@ -105,6 +118,14 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
     ns->pif = ns->params.pif;
 
+    if (ns->pip) {
+        ret = bdrv_get_info(blk_bs(ns->blkconf.blk), &bdi);
+        if (ret >= 0) {
+            id_ns->dps = bdi.protection_type;
+            ns->pif = NVME_PI_GUARD_16;
+        }
+    }
+
     static const NvmeLBAF lbaf[16] = {
         [0] = { .ds = 9 },
         [1] = { .ds = 9, .ms = 8 },
@@ -380,13 +401,38 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)
 static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 {
     unsigned int pi_size;
+    BlockDriverInfo bdi;
+    int ret;
 
     if (!ns->blkconf.blk) {
         error_setg(errp, "block backend not configured");
         return -1;
     }
 
-    if (ns->params.pi) {
+    if (ns->params.pip) {
+        if (ns->params.mset) {
+            error_setg(errp, "invalid mset parameter, metadata must be "
+                "stored in a separate buffer to use integrity passthrough");
+            return -1;
+        }
+        ret = bdrv_get_info(blk_bs(ns->blkconf.blk), &bdi);
+        if (ret < 0) {
+            error_setg(errp, "could not determine host block device"
+                       " integrity information");
+            return -1;
+        }
+        if (!bdi.protection_type) {
+            error_setg(errp, "nvme-ns backend block device does not"
+                       " support integrity passthrough");
+            return -1;
+        }
+        if (bdi.protection_interval != ns->blkconf.logical_block_size) {
+            error_setg(errp, "logical block size parameter (%u bytes) must be"
+                " equal to protection information interval (%u bytes)",
+                ns->blkconf.logical_block_size, bdi.protection_interval);
+            return -1;
+        }
+    } else if (ns->params.pi) {
         if (ns->params.pi > NVME_ID_NS_DPS_TYPE_3) {
             error_setg(errp, "invalid 'pi' value");
             return -1;
@@ -623,6 +669,7 @@ static Property nvme_ns_props[] = {
     DEFINE_PROP_UINT8("pi", NvmeNamespace, params.pi, 0),
     DEFINE_PROP_UINT8("pil", NvmeNamespace, params.pil, 0),
     DEFINE_PROP_UINT8("pif", NvmeNamespace, params.pif, 0),
+    DEFINE_PROP_BOOL("pip", NvmeNamespace, params.pip, false),
     DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
     DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
     DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..4876670d26 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -109,6 +109,7 @@ typedef struct NvmeNamespaceParams {
     uint8_t pi;
     uint8_t pil;
     uint8_t pif;
+    bool pip;
 
     uint16_t mssrl;
     uint32_t mcl;
@@ -143,6 +144,7 @@ typedef struct NvmeNamespace {
     uint16_t status;
     int attached;
     uint8_t pif;
+    bool pip;
 
     struct {
         uint16_t zrwas;
-- 
2.38.1

From nobody Thu Apr 25 21:54:03 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=yadro.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 166930563869794.83896159823144; Thu, 24 Nov 2022 08:00:38 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oyEdK-0002im-EA; Thu, 24 Nov 2022 10:59:26 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd7-0002Sa-IE; Thu, 24 Nov 2022 10:59:13 -0500 Received: from mta-02.yadro.com ([89.207.88.252] helo=mta-01.yadro.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyEd3-0005Iu-Mj; Thu, 24 Nov 2022 10:59:13 -0500 Received: from localhost (unknown [127.0.0.1]) by mta-01.yadro.com (Postfix) with ESMTP id 69BE341209; Thu, 24 Nov 2022 15:59:06 +0000 (UTC) Received: from mta-01.yadro.com ([127.0.0.1]) by localhost (mta-01.yadro.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id S8f3lHSOos6H; Thu, 24 Nov 2022 18:59:04 +0300 (MSK) Received: from T-EXCH-01.corp.yadro.com (T-EXCH-01.corp.yadro.com [172.17.10.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by
mta-01.yadro.com (Postfix) with ESMTPS id 20C4041203; Thu, 24 Nov 2022 18:59:04 +0300 (MSK) Received: from T-EXCH-09.corp.yadro.com (172.17.11.59) by T-EXCH-01.corp.yadro.com (172.17.10.101) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.669.32; Thu, 24 Nov 2022 18:59:03 +0300 Received: from archlinux.yadro.com (10.178.113.54) by T-EXCH-09.corp.yadro.com (172.17.11.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.2.1118.9; Thu, 24 Nov 2022 18:59:02 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=yadro.com; h= content-type:content-type:content-transfer-encoding:mime-version :references:in-reply-to:x-mailer:message-id:date:date:subject :subject:from:from:received:received:received:received; s= mta-01; t=1669305544; x=1671119945; bh=PeK7GEWSt4ex34y1OeCRvDLCe Au0hEUvqmhyVLMM5mU=; b=oFaAXMnzbYdlNx4Ox9NLE9QuY/1iSET2cAlkJ32Cy LHRUxknlANMTm4o2Ca5tUozhLcz76xKoqRAupXk6se/0wfFH44QtAdfZxXliSsOT wT8RAQcYEAr2Kr4UY5GH4C+UkfdI+aBBcd1cTC18xIlRNWg89W/jILguG+QZQHRB E0= X-Virus-Scanned: amavisd-new at yadro.com From: Dmitry Tihov To: CC: , , , , , , Subject: [RFC 4/5] hw/nvme: implement pi pass read/write/wrz commands Date: Thu, 24 Nov 2022 18:58:20 +0300 Message-ID: <20221124155821.1501969-5-d.tihov@yadro.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221124155821.1501969-1-d.tihov@yadro.com> References: <20221124155821.1501969-1-d.tihov@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.178.113.54] X-ClientProxiedBy: T-EXCH-01.corp.yadro.com (172.17.10.101) To T-EXCH-09.corp.yadro.com (172.17.11.59) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=89.207.88.252; envelope-from=d.tihov@yadro.com; helo=mta-01.yadro.com 
X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1669305640841100003 Content-Type: text/plain; charset="utf-8" This patch adds ability for read, write and write zeroes commands to submit single request with data and integrity directly to underlying device using block-level transfer of protection information. Block level supports only type1/type3 protection types and for the type1 protection type guard/reftag are always checked, while for the type3 protection type guardtag is always checked. This way NVME PRCHK field can not be used to disable checking of guard/reftag properly, so error from block level is caught and reported for cases of unset 02/00 bits in PRCHK and invalid guard/reftag. Also, because apptag is never checked by block devices, check it explicitly in case of set 01 bit in PRCHK. 
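
The PRCHK handling described above can be sketched in isolation as
follows. This is a minimal illustration, not the code from this patch:
the names PiResult, classify_backend_error and check_apptag are
hypothetical, and the result codes stand in for the real NVMe status
values. It assumes the caller has already recomputed which tags
mismatch after the block level reported an integrity error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PRCHK bits as described in the commit message. */
#define PRCHK_REF   (1u << 0)  /* bit 0: check reference tag */
#define PRCHK_APP   (1u << 1)  /* bit 1: check application tag */
#define PRCHK_GUARD (1u << 2)  /* bit 2: check guard tag */

/* Hypothetical result codes for this sketch; not NVMe status values. */
typedef enum {
    PI_OK,
    PI_GUARD_ERROR,    /* guest asked for the guard check, and it failed */
    PI_REF_ERROR,      /* guest asked for the reftag check, and it failed */
    PI_APP_ERROR,      /* guest asked for the apptag check, and it failed */
    PI_BACKEND_LIMIT,  /* backend enforced a check the guest disabled */
} PiResult;

/*
 * Classify an integrity error reported by the block level for one
 * logical block. guard_ok and ref_ok say whether the recomputed guard
 * and the expected reference tag match the stored tuple.
 */
static PiResult classify_backend_error(uint8_t prchk, int pi_type,
                                       bool guard_ok, bool ref_ok)
{
    /* The block level supports only type 1 and type 3. */
    assert(pi_type == 1 || pi_type == 3);

    if (!guard_ok) {
        /* The guard tag is always checked by the backend. */
        return (prchk & PRCHK_GUARD) ? PI_GUARD_ERROR : PI_BACKEND_LIMIT;
    }
    if (!ref_ok) {
        if (prchk & PRCHK_REF) {
            return PI_REF_ERROR;
        }
        /* Only type 1 has its reference tag checked by the backend. */
        if (pi_type == 1) {
            return PI_BACKEND_LIMIT;
        }
    }
    return PI_OK;
}

/* The backend never checks the application tag, so do it here. */
static PiResult check_apptag(uint8_t prchk, uint16_t stored,
                             uint16_t expected, uint16_t mask)
{
    if ((prchk & PRCHK_APP) && (stored & mask) != (expected & mask)) {
        return PI_APP_ERROR;
    }
    return PI_OK;
}
```

PI_BACKEND_LIMIT marks the case where the guest cleared a PRCHK bit
but the backend rejected the request anyway; the patch reports that
case as an internal device error.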

Signed-off-by: Dmitry Tihov
---
 hw/nvme/ctrl.c       |  13 +-
 hw/nvme/dif.c        | 303 +++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/dif.h        |  18 +++
 hw/nvme/trace-events |   4 +
 4 files changed, 335 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..c646345bcc 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -2045,7 +2045,7 @@ static void nvme_rw_cb(void *opaque, int ret)
         goto out;
     }
 
-    if (ns->lbaf.ms) {
+    if (ns->lbaf.ms && !ns->pip) {
         NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
         uint64_t slba = le64_to_cpu(rw->slba);
         uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
@@ -3349,7 +3349,9 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
         }
     }
 
-    if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+    if (ns->pip) {
+        return nvme_dif_pass_rw(n, req);
+    } else if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
         return nvme_dif_rw(n, req);
     }
 
@@ -3379,6 +3381,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
     uint16_t ctrl = le16_to_cpu(rw->control);
     uint8_t prinfo = NVME_RW_PRINFO(ctrl);
+    bool pract = !!(prinfo & NVME_PRINFO_PRACT);
     uint64_t data_size = nvme_l2b(ns, nlb);
     uint64_t mapped_size = data_size;
     uint64_t data_offset;
@@ -3483,7 +3486,11 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
 
     data_offset = nvme_l2b(ns, slba);
 
-    if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+    if (ns->pip) {
+        if (!wrz || pract) {
+            return nvme_dif_pass_rw(n, req);
+        }
+    } else if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
         return nvme_dif_rw(n, req);
     }
 
diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index 63c44c86ab..0b562cf45a 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -11,6 +11,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
+#include "qemu/memalign.h"
 
 #include "nvme.h"
 #include "dif.h"
@@ -714,3 +715,305 @@ err:
 
     return status;
 }
+
+void nvme_dif_pass_dump(NvmeNamespace *ns, uint8_t *mdata_buf, size_t mdata_len)
+{
+    size_t i;
+    uint8_t *end = mdata_buf + mdata_len;
+    for (i = 1; mdata_buf < end; ++i, mdata_buf += ns->lbaf.ms) {
+        NvmeDifTuple *mdata = (NvmeDifTuple *) mdata_buf;
+        trace_pci_nvme_dif_dump_pass_pi(i, be16_to_cpu(mdata->g16.guard),
+                                        be16_to_cpu(mdata->g16.apptag),
+                                        be32_to_cpu(mdata->g16.reftag));
+    }
+}
+
+static void nvme_dif_pass_read_cb(void *opaque, int ret)
+{
+    NvmeRequest *req = opaque;
+    NvmeCtrl *n = nvme_ctrl(req);
+    NvmeDifPassContext *ctx = req->opaque;
+    NvmeNamespace *ns = req->ns;
+    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+    uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
+    bool pract = !!(prinfo & NVME_PRINFO_PRACT);
+    uint16_t apptag = le16_to_cpu(rw->apptag);
+    uint16_t appmask = le16_to_cpu(rw->appmask);
+    uint32_t reftag = le32_to_cpu(rw->reftag);
+    uint64_t slba = le64_to_cpu(rw->slba);
+    uint16_t status;
+
+    trace_pci_nvme_dif_pass_read_cb(nvme_cid(req), ctx->iov.dif.iov_len >> 3);
+    if (trace_event_get_state_backends(TRACE_PCI_NVME_DIF_DUMP_PASS_PI)) {
+        nvme_dif_pass_dump(ns, ctx->iov.dif.iov_base, ctx->iov.dif.iov_len);
+    }
+
+    /* block layer returns EILSEQ in case of integrity check failure */
+    /* determine exact pi error and return status accordingly */
+    if (unlikely(ret == -EILSEQ)) {
+        req->status = nvme_dif_pass_check(ns, ctx->data.bounce, ctx->data.len,
+                          ctx->iov.dif.iov_base, prinfo, slba, reftag);
+        if (req->status) {
+            /* zero out ret to allow req->status passthrough */
+            ret = 0;
+        }
+        goto out;
+    }
+
+    if (ret) {
+        goto out;
+    }
+
+    status = nvme_dif_pass_apptag_check(ns, ctx->iov.dif.iov_base,
+                 ctx->iov.dif.iov_len, prinfo, apptag, appmask);
+    if (status) {
+        req->status = status;
+        goto out;
+    }
+
+    status = nvme_bounce_data(n, ctx->data.bounce, ctx->data.len,
+                              NVME_TX_DIRECTION_FROM_DEVICE, req);
+    if (status) {
+        req->status = status;
+        goto out;
+    }
+
+    if (!pract) {
+        status = nvme_bounce_mdata(n, ctx->iov.dif.iov_base,
+                     ctx->iov.dif.iov_len, NVME_TX_DIRECTION_FROM_DEVICE, req);
+        if (status) {
+            req->status = status;
+        }
+    }
+
+out:
+    qemu_iovec_destroy_pi(&ctx->iov);
+    g_free(ctx->data.bounce);
+    g_free(ctx);
+
+    nvme_rw_complete_cb(req, ret);
+}
+
+static void nvme_diff_pass_write_cb(void *opaque, int ret)
+{
+    NvmeRequest *req = opaque;
+    NvmeDifPassContext *ctx = req->opaque;
+    NvmeNamespace *ns = req->ns;
+    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+    uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
+    uint32_t reftag = le32_to_cpu(rw->reftag);
+    uint64_t slba = le64_to_cpu(rw->slba);
+
+    trace_pci_nvme_dif_pass_write_cb(nvme_cid(req), ctx->iov.dif.iov_len >> 3);
+    if (trace_event_get_state_backends(TRACE_PCI_NVME_DIF_DUMP_PASS_PI)) {
+        nvme_dif_pass_dump(ns, ctx->iov.dif.iov_base, ctx->iov.dif.iov_len);
+    }
+
+    /* block layer returns EILSEQ in case of integrity check failure */
+    /* determine exact pi error and return status accordingly */
+    if (unlikely(ret == -EILSEQ)) {
+        req->status = nvme_dif_pass_check(ns, ctx->data.bounce, ctx->data.len,
+                          ctx->iov.dif.iov_base, prinfo, slba, reftag);
+        if (req->status) {
+            /* zero out ret to allow req->status passthrough */
+            ret = 0;
+        }
+    }
+
+    qemu_iovec_destroy_pi(&ctx->iov);
+    g_free(ctx->data.bounce);
+    g_free(ctx);
+
+    nvme_rw_complete_cb(req, ret);
+}
+
+uint16_t nvme_dif_pass_rw(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+    uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
+    uint16_t apptag = le16_to_cpu(rw->apptag);
+    uint16_t appmask = le16_to_cpu(rw->appmask);
+    uint64_t reftag = le32_to_cpu(rw->reftag);
+    bool pract = !!(prinfo & NVME_PRINFO_PRACT);
+    NvmeNamespace *ns = req->ns;
+    BlockBackend *blk = ns->blkconf.blk;
+    bool wrz = rw->opcode == NVME_CMD_WRITE_ZEROES;
+    uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
+    uint64_t slba = le64_to_cpu(rw->slba);
+    size_t len = nvme_l2b(ns, nlb);
+    int64_t offset = nvme_l2b(ns, slba);
+    NvmeDifPassContext *ctx;
+    uint16_t status;
+
+    trace_pci_nvme_dif_pass_rw(nvme_cid(req),
+        NVME_ID_NS_DPS_TYPE(ns->id_ns.dps), prinfo, apptag, appmask, reftag);
+
+    ctx = g_new0(NvmeDifPassContext, 1);
+    qemu_iovec_init_pi(&ctx->iov, 1, nlb);
+    ctx->data.len = len;
+    ctx->data.bounce = qemu_memalign(qemu_real_host_page_size(),
+                                     ctx->data.len);
+    qemu_iovec_add(&ctx->iov, ctx->data.bounce, ctx->data.len);
+
+    req->opaque = ctx;
+
+    status = nvme_check_prinfo(ns, prinfo, slba, reftag);
+    if (status) {
+        goto err;
+    }
+    status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
+    if (status) {
+        goto err;
+    }
+
+    if (req->cmd.opcode == NVME_CMD_READ) {
+        block_acct_start(blk_get_stats(blk), &req->acct, ctx->iov.size,
+                         BLOCK_ACCT_READ);
+
+        req->aiocb = blk_aio_preadv(ns->blkconf.blk, offset, &ctx->iov, 0,
+                                    nvme_dif_pass_read_cb, req);
+
+        return NVME_NO_COMPLETE;
+    }
+
+    if (wrz) {
+
+        assert(pract);
+
+        if (prinfo & NVME_PRINFO_PRCHK_MASK) {
+            status = NVME_INVALID_PROT_INFO | NVME_DNR;
+            goto err;
+        }
+        uint8_t *mbuf, *end;
+
+        mbuf = ctx->iov.dif.iov_base;
+        end = mbuf + ctx->iov.dif.iov_len;
+
+        for (; mbuf < end; mbuf += ns->lbaf.ms) {
+            NvmeDifTuple *dif = (NvmeDifTuple *)(mbuf);
+
+            dif->g16.apptag = cpu_to_be16(apptag);
+            dif->g16.reftag = cpu_to_be32(reftag);
+
+            switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+            case NVME_ID_NS_DPS_TYPE_1:
+            case NVME_ID_NS_DPS_TYPE_2:
+                reftag++;
+            }
+        }
+        memset(ctx->data.bounce, 0, ctx->data.len);
+
+        req->aiocb = blk_aio_pwritev(ns->blkconf.blk, offset, &ctx->iov, 0,
+                                     nvme_diff_pass_write_cb, req);
+
+    } else {
+
+        status = nvme_bounce_data(n, ctx->data.bounce, ctx->data.len,
+                                  NVME_TX_DIRECTION_TO_DEVICE, req);
+        if (status) {
+            goto err;
+        }
+        if (pract) {
+            nvme_dif_pract_generate_dif(ns, ctx->data.bounce,
+                                        ctx->data.len, ctx->iov.dif.iov_base,
+                                        ctx->iov.dif.iov_len, apptag, &reftag);
+        } else {
+            status = nvme_bounce_mdata(n, ctx->iov.dif.iov_base,
+                         ctx->iov.dif.iov_len, NVME_TX_DIRECTION_TO_DEVICE,
+                         req);
+            if (status) {
+                goto err;
+            }
+            status = nvme_dif_pass_apptag_check(ns, ctx->iov.dif.iov_base,
+                         ctx->iov.dif.iov_len, prinfo, apptag, appmask);
+            if (status) {
+                goto err;
+            }
+        }
+
+        block_acct_start(blk_get_stats(blk), &req->acct, ctx->iov.size,
+                         BLOCK_ACCT_WRITE);
+
+        req->aiocb = blk_aio_pwritev(ns->blkconf.blk, offset, &ctx->iov, 0,
+                                     nvme_diff_pass_write_cb, req);
+
+    }
+
+    return NVME_NO_COMPLETE;
+
+err:
+    qemu_iovec_destroy_pi(&ctx->iov);
+    g_free(ctx->data.bounce);
+    g_free(ctx);
+
+    return status;
+}
+
+uint16_t nvme_dif_pass_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
+                             uint8_t *mbuf, uint8_t prinfo, uint64_t slba,
+                             uint32_t reftag)
+{
+    Error *local_err = NULL;
+    uint16_t status;
+
+    status = nvme_check_prinfo(ns, prinfo, slba, reftag);
+    if (status) {
+        return status;
+    }
+
+    uint8_t *end = buf + len;
+
+    for (uint8_t *bufp = buf, *mbufp = mbuf; bufp < end; bufp += ns->lbasz,
+         mbufp += ns->lbaf.ms) {
+        NvmeDifTuple *dif = (NvmeDifTuple *)mbufp;
+
+        if (be16_to_cpu(dif->g16.guard) != crc16_t10dif(0x0, bufp, ns->lbasz)) {
+            if (prinfo & NVME_PRINFO_PRCHK_GUARD) {
+                return NVME_E2E_GUARD_ERROR;
+            } else {
+                error_setg(&local_err, "Nvme namespace %u, backed by %s"
+                           " drive, can not pass custom guard tag",
+                           nvme_nsid(ns), blk_name(ns->blkconf.blk));
+                error_report_err(local_err);
+                return NVME_INTERNAL_DEV_ERROR;
+            }
+        }
+
+        if (be32_to_cpu(dif->g16.reftag) != reftag) {
+            if (prinfo & NVME_PRINFO_PRCHK_REF) {
+                return NVME_E2E_REF_ERROR;
+            } else if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) !=
+                       NVME_ID_NS_DPS_TYPE_3) {
+                error_setg(&local_err, "Nvme namespace %u, backed by %s"
+                           " drive can not pass custom ref tag",
+                           nvme_nsid(ns), blk_name(ns->blkconf.blk));
+                error_report_err(local_err);
+                return NVME_INTERNAL_DEV_ERROR;
+            }
+        }
+
+        if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) != NVME_ID_NS_DPS_TYPE_3) {
+            reftag++;
+        }
+    }
+
+    return NVME_SUCCESS;
+}
+
+uint16_t nvme_dif_pass_apptag_check(NvmeNamespace *ns, uint8_t *mbuf,
+                                    size_t mlen, uint8_t prinfo,
+                                    uint16_t apptag, uint16_t appmask)
+{
+    if (prinfo & NVME_PRINFO_PRCHK_APP) {
+        uint8_t *end = mbuf + mlen;
+        for (uint8_t *mbufp = mbuf; mbufp < end; mbufp += ns->lbaf.ms) {
+            NvmeDifTuple *dif = (NvmeDifTuple *)mbufp;
+            if ((be16_to_cpu(dif->g16.apptag) & appmask) !=
+                (apptag & appmask)) {
+                return NVME_E2E_APP_ERROR;
+            }
+        }
+    }
+
+    return NVME_SUCCESS;
+}
diff --git a/hw/nvme/dif.h b/hw/nvme/dif.h
index f12e312250..08e3630461 100644
--- a/hw/nvme/dif.h
+++ b/hw/nvme/dif.h
@@ -188,4 +188,22 @@ uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
                         uint16_t appmask, uint64_t *reftag);
 uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req);
 
+typedef struct NvmeDifPassContext {
+    struct {
+        uint8_t *bounce;
+        size_t len;
+    } data;
+    QEMUIOVector iov;
+} NvmeDifPassContext;
+
+uint16_t nvme_dif_pass_rw(NvmeCtrl *n, NvmeRequest *req);
+void nvme_dif_pass_dump(NvmeNamespace *ns, uint8_t *mdata_buf,
+                        size_t mdata_len);
+uint16_t nvme_dif_pass_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
+                             uint8_t *mbuf, uint8_t prinfo, uint64_t slba,
+                             uint32_t reftag);
+uint16_t nvme_dif_pass_apptag_check(NvmeNamespace *ns, uint8_t *mbuf,
+                                    size_t mlen, uint8_t prinfo,
+                                    uint16_t apptag, uint16_t appmask);
+
 #endif /* HW_NVME_DIF_H */
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..259fa8ffa2 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -17,6 +17,10 @@ pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint
 pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
 pci_nvme_misc_cb(uint16_t cid) "cid %"PRIu16""
 pci_nvme_dif_rw(uint8_t pract, uint8_t prinfo) "pract 0x%"PRIx8" prinfo 0x%"PRIx8""
+pci_nvme_dif_pass_rw(uint16_t cid, uint8_t type, uint8_t prinfo, uint16_t apptag, uint16_t appmask, uint32_t reftag) "cid %"PRIu16" type %"PRIu8" prinfo 0x%"PRIx8" apptag 0x%"PRIx16" appmask 0x%"PRIx16" reftag 0x%"PRIx32""
+pci_nvme_dif_pass_read_cb(uint16_t cid, size_t count) "cid %"PRIu16" number of DIF elements %zu"
+pci_nvme_dif_pass_write_cb(uint16_t cid, size_t count) "cid %"PRIu16" number of DIF elements %zu"
+pci_nvme_dif_dump_pass_pi(size_t dif_num, uint16_t guard, uint16_t apptag, uint32_t reftag) "DIF element %zu guard tag 0x%"PRIx16" apptag 0x%"PRIx16" reftag 0x%"PRIx32""
 pci_nvme_dif_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
 pci_nvme_dif_rw_mdata_in_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
 pci_nvme_dif_rw_mdata_out_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
-- 
2.38.1

From nobody Thu Apr 25 21:54:03 2024
From: Dmitry Tihov
Subject: [RFC 5/5] hw/nvme: extend pi pass capable commands
Date: Thu, 24 Nov 2022 18:58:21 +0300
Message-ID: <20221124155821.1501969-6-d.tihov@yadro.com>
In-Reply-To: <20221124155821.1501969-1-d.tihov@yadro.com>
References: <20221124155821.1501969-1-d.tihov@yadro.com>

Add block-level protection information passthrough support to the
compare, dataset management, verify and copy NVMe commands.

Signed-off-by: Dmitry Tihov
---
 hw/nvme/ctrl.c       | 348 +++++++++++++++++++++++++++++++++++++++----
 hw/nvme/trace-events |   2 +
 2 files changed, 325 insertions(+), 25 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index c646345bcc..950d773d59 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -197,6 +197,7 @@
 #include "hw/pci/msix.h"
 #include "hw/pci/pcie_sriov.h"
 #include "migration/vmstate.h"
+#include "qemu/memalign.h"
 
 #include "nvme.h"
 #include "dif.h"
@@ -2168,6 +2169,50 @@ out:
     nvme_verify_cb(ctx, ret);
 }
 
+static void nvme_dif_pass_verify_cb(void *opaque, int ret)
+{
+    NvmeBounceContext *ctx = opaque;
+    NvmeRequest *req = ctx->req;
+    NvmeNamespace *ns = req->ns;
+    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+    uint64_t slba = le64_to_cpu(rw->slba);
+    uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
+    uint16_t apptag = le16_to_cpu(rw->apptag);
+    uint16_t appmask = le16_to_cpu(rw->appmask);
+    uint32_t reftag = le32_to_cpu(rw->reftag);
+
+    trace_pci_nvme_dif_pass_verify_cb(nvme_cid(req));
+    if (trace_event_get_state_backends(TRACE_PCI_NVME_DIF_DUMP_PASS_PI)) {
+        nvme_dif_pass_dump(ns, ctx->data.iov.dif.iov_base,
+                           ctx->data.iov.dif.iov_len);
+    }
+
+    if (unlikely(ret == -EILSEQ)) {
+        req->status = nvme_dif_pass_check(ns, ctx->data.bounce,
+                          ctx->data.iov.size, ctx->data.iov.dif.iov_base,
+                          prinfo, slba, reftag);
+        if (req->status) {
+            /* zero out ret to allow req->status passthrough */
+            ret = 0;
+        }
+        goto out;
+    }
+
+    if (ret) {
+        goto out;
+    }
+
+    req->status = nvme_dif_pass_apptag_check(ns, ctx->data.iov.dif.iov_base,
+                      ctx->data.iov.dif.iov_len, prinfo, apptag, appmask);
+
+out:
+    qemu_iovec_destroy_pi(&ctx->data.iov);
+    g_free(ctx->data.bounce);
+    g_free(ctx);
+
+    nvme_rw_complete_cb(req, ret);
+}
+
 struct nvme_compare_ctx {
     struct {
         QEMUIOVector iov;
@@ -2331,6 +2376,83 @@ out:
     nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+static void nvme_dif_pass_compare_cb(void *opaque, int ret)
+{
+    NvmeRequest *req = opaque;
+    NvmeCtrl *n = nvme_ctrl(req);
+    NvmeNamespace *ns = req->ns;
+    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+    uint64_t slba = le64_to_cpu(rw->slba);
+    uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
+    size_t mlen = nvme_m2b(ns, nlb);
+    uint8_t prinfo = NVME_RW_PRINFO(le16_to_cpu(rw->control));
+    uint16_t apptag = le16_to_cpu(rw->apptag);
+    uint16_t appmask = le16_to_cpu(rw->appmask);
+    uint32_t reftag = le32_to_cpu(rw->reftag);
+    struct nvme_compare_ctx *ctx = req->opaque;
+    g_autofree uint8_t *buf = NULL;
+    uint16_t status;
+
+    trace_pci_nvme_dif_pass_compare_cb(nvme_cid(req));
+    if (trace_event_get_state_backends(TRACE_PCI_NVME_DIF_DUMP_PASS_PI)) {
+        nvme_dif_pass_dump(ns, ctx->data.iov.dif.iov_base,
+                           ctx->data.iov.dif.iov_len);
+    }
+
+    if (unlikely(ret == -EILSEQ)) {
+        status = nvme_dif_pass_check(ns, ctx->data.bounce, ctx->data.iov.size,
+                                     ctx->data.iov.dif.iov_base, prinfo, slba,
+                                     reftag);
+        if (status) {
+            /* zero out ret to allow req->status passthrough */
+            ret = 0;
+            req->status = status;
+        }
+        goto out;
+    }
+
+    if (ret) {
+        goto out;
+    }
+
+    status = nvme_dif_pass_apptag_check(ns, ctx->data.iov.dif.iov_base,
+                 ctx->data.iov.dif.iov_len, prinfo, apptag, appmask);
+    if (status) {
+        req->status = status;
+        goto out;
+    }
+
+    buf = g_malloc(ctx->data.iov.size);
+    status = nvme_bounce_data(n, buf, ctx->data.iov.size,
+                              NVME_TX_DIRECTION_TO_DEVICE, req);
+    if (status) {
+        req->status = status;
+        goto out;
+    }
+    if (memcmp(buf, ctx->data.bounce, ctx->data.iov.size)) {
+        req->status = NVME_CMP_FAILURE;
+        goto out;
+    }
+
+    ctx->mdata.bounce = g_malloc(mlen);
+    status = nvme_bounce_mdata(n, ctx->mdata.bounce, mlen,
+                               NVME_TX_DIRECTION_TO_DEVICE, req);
+    if (status) {
+        req->status = status;
+        goto out;
+    }
+    if (memcmp(ctx->mdata.bounce, ctx->data.iov.dif.iov_base, mlen)) {
+        req->status = NVME_CMP_FAILURE;
+    }
+
+out:
+    qemu_iovec_destroy_pi(&ctx->data.iov);
+    g_free(ctx->data.bounce);
+    g_free(ctx);
+
+    nvme_rw_complete_cb(req, ret);
+}
+
 typedef struct NvmeDSMAIOCB {
     BlockAIOCB common;
     BlockAIOCB *aiocb;
@@ -2395,7 +2517,7 @@ static void nvme_dsm_md_cb(void *opaque, int ret)
         goto done;
     }
 
-    if (!ns->lbaf.ms) {
+    if (!ns->lbaf.ms || ns->pip) {
         nvme_dsm_cb(iocb, 0);
         return;
     }
@@ -2556,19 +2678,35 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest *req)
         }
     }
 
-    ctx = g_new0(NvmeBounceContext, 1);
-    ctx->req = req;
+    if (ns->pip) {
+        ctx = g_new0(NvmeBounceContext, 1);
+        ctx->req = req;
 
-    ctx->data.bounce = g_malloc(len);
+        ctx->data.bounce = qemu_memalign(qemu_real_host_page_size(), len);
 
-    qemu_iovec_init(&ctx->data.iov, 1);
-    qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, len);
+        qemu_iovec_init_pi(&ctx->data.iov, 1, nlb);
+        qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, len);
 
-    block_acct_start(blk_get_stats(blk), &req->acct, ctx->data.iov.size,
-                     BLOCK_ACCT_READ);
+        block_acct_start(blk_get_stats(blk), &req->acct, ctx->data.iov.size,
+                         BLOCK_ACCT_READ);
+
+        req->aiocb = blk_aio_preadv(ns->blkconf.blk, offset, &ctx->data.iov, 0,
+                                    nvme_dif_pass_verify_cb, ctx);
+    } else {
+        ctx = g_new0(NvmeBounceContext, 1);
+        ctx->req = req;
+
+        ctx->data.bounce = g_malloc(len);
+
+        qemu_iovec_init(&ctx->data.iov, 1);
+        qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, len);
+
+        block_acct_start(blk_get_stats(blk), &req->acct, ctx->data.iov.size,
+                         BLOCK_ACCT_READ);
 
-    req->aiocb = blk_aio_preadv(ns->blkconf.blk, offset, &ctx->data.iov, 0,
-                                nvme_verify_mdata_in_cb, ctx);
+        req->aiocb = blk_aio_preadv(ns->blkconf.blk, offset, &ctx->data.iov, 0,
+                                    nvme_verify_mdata_in_cb, ctx);
+    }
     return NVME_NO_COMPLETE;
 }
 
@@ -2625,7 +2763,11 @@ static void nvme_copy_bh(void *opaque)
         req->cqe.result = cpu_to_le32(iocb->idx);
     }
 
-    qemu_iovec_destroy(&iocb->iov);
+    if (ns->pip) {
+        qemu_iovec_destroy_pi(&iocb->iov);
+    } else {
+        qemu_iovec_destroy(&iocb->iov);
+    }
     g_free(iocb->bounce);
 
     qemu_bh_delete(iocb->bh);
@@ -2737,10 +2879,29 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
     NvmeRequest *req = iocb->req;
     NvmeNamespace *ns = req->ns;
     uint32_t nlb;
+    uint16_t status;
 
     nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, NULL,
                                  &nlb, NULL, NULL, NULL);
 
+    if (ns->pip) {
+        if (iocb->iov.dif.iov_len) {
+            NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
+            uint64_t slba = le64_to_cpu(copy->sdlba);
+            uint16_t prinfo = ((copy->control[2] >> 2) & 0xf);
+            size_t len = nvme_l2b(ns, nlb);
+            if (unlikely(ret == -EILSEQ)) {
+                status = nvme_dif_pass_check(ns, iocb->bounce, len,
+                             iocb->iov.dif.iov_base, prinfo, slba,
+                             iocb->reftag);
+                if (status) {
+                    goto invalid;
+                }
+            }
+        }
+
+        iocb->reftag += nlb;
+    }
     if (ret < 0) {
         iocb->ret = ret;
         goto out;
@@ -2754,8 +2915,17 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
 
     iocb->idx++;
     iocb->slba += nlb;
+
 out:
     nvme_copy_cb(iocb, iocb->ret);
+    return;
+
+invalid:
+    req->status = status;
+    iocb->aiocb = NULL;
+    if (iocb->bh) {
+        qemu_bh_schedule(iocb->bh);
+    }
 }
 
 static void nvme_copy_out_cb(void *opaque, int ret)
@@ -2900,6 +3070,99 @@ out:
     nvme_copy_cb(iocb, ret);
 }
 
+static void nvme_dif_pass_copy_cb(void *opaque, int ret)
+{
+    NvmeCopyAIOCB *iocb = opaque;
+    NvmeRequest *req = iocb->req;
+    NvmeNamespace *ns = req->ns;
+    NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
+    uint16_t prinfor = ((copy->control[0] >> 4) & 0xf);
+    uint16_t prinfow = ((copy->control[2] >> 2) & 0xf);
+    uint32_t nlb;
+    size_t len;
+    uint16_t status;
+    uint64_t slba;
+    uint16_t apptag;
+    uint16_t appmask;
+    uint64_t reftag;
+
+    nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, &slba,
+                                 &nlb, &apptag, &appmask, &reftag);
+    len = nvme_l2b(ns, nlb);
+
+    if (unlikely(ret == -EILSEQ)) {
+        status = nvme_dif_pass_check(ns, iocb->bounce, len,
+                                     iocb->iov.dif.iov_base, prinfor, slba,
+                                     reftag);
+        if (status) {
+            goto invalid;
+        }
+    }
+
+    if (ret < 0) {
+        iocb->ret = ret;
+        goto out;
+    } else if (iocb->ret < 0) {
+        goto out;
+    }
+
+    status = nvme_dif_pass_apptag_check(ns, iocb->iov.dif.iov_base,
+                                        nvme_m2b(ns, nlb), prinfor, apptag,
+                                        appmask);
+    if (status) {
+        goto invalid;
+    }
+
+    status = nvme_check_prinfo(ns, prinfow, iocb->slba, iocb->reftag);
+    if (status) {
+        goto invalid;
+    }
+    status = nvme_check_bounds(ns, iocb->slba, nlb);
+    if (status) {
+        goto invalid;
+    }
+
+    if (ns->params.zoned) {
+        status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
+        if (status) {
+            goto invalid;
+        }
+
+        iocb->zone->w_ptr += nlb;
+    }
+
+    if (prinfow & NVME_PRINFO_PRACT) {
+        qemu_iovec_reset(&iocb->iov);
+        qemu_iovec_add(&iocb->iov, iocb->bounce, len);
+    } else {
+        appmask = le16_to_cpu(copy->appmask);
+        apptag = le16_to_cpu(copy->apptag);
+        status = nvme_dif_pass_apptag_check(ns, iocb->iov.dif.iov_base,
+                                            nvme_m2b(ns, nlb), prinfow, apptag,
+                                            appmask);
+        if (status) {
+            goto invalid;
+        }
+    }
+    iocb->aiocb = blk_aio_pwritev(ns->blkconf.blk, nvme_l2b(ns, iocb->slba),
+                                  &iocb->iov, 0, nvme_copy_out_completed_cb,
+                                  iocb);
+
+    return;
+
+invalid:
+    req->status = status;
+    iocb->aiocb = NULL;
+    if (iocb->bh) {
+        qemu_bh_schedule(iocb->bh);
+    }
+
+    return;
+
+out:
+    nvme_copy_cb(iocb, ret);
+}
+
 static void nvme_copy_in_cb(void *opaque, int ret)
 {
     NvmeCopyAIOCB *iocb = opaque;
@@ -2943,6 +3206,7 @@ static void nvme_copy_cb(void *opaque, int ret)
     NvmeNamespace *ns = req->ns;
     uint64_t slba;
     uint32_t nlb;
+    uint64_t reftag;
     size_t len;
     uint16_t status;
 
@@ -2958,7 +3222,7 @@ static void nvme_copy_cb(void *opaque, int ret)
     }
 
     nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, &slba,
-                                 &nlb, NULL, NULL, NULL);
+                                 &nlb, NULL, NULL, &reftag);
     len = nvme_l2b(ns, nlb);
 
     trace_pci_nvme_copy_source_range(slba, nlb);
@@ -2990,8 +3254,21 @@ static void nvme_copy_cb(void *opaque, int ret)
     qemu_iovec_reset(&iocb->iov);
     qemu_iovec_add(&iocb->iov, iocb->bounce, len);
 
-    iocb->aiocb = blk_aio_preadv(ns->blkconf.blk, nvme_l2b(ns, slba),
-                                 &iocb->iov, 0, nvme_copy_in_cb, iocb);
+    if (ns->pip) {
+        NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
+        uint16_t prinfor = ((copy->control[0] >> 4) & 0xf);
+        status = nvme_check_prinfo(ns, prinfor, slba, reftag);
+        if (status) {
+            goto invalid;
+        }
+        iocb->iov.dif.iov_len = nvme_m2b(ns, nlb);
+        iocb->aiocb = blk_aio_preadv(ns->blkconf.blk, nvme_l2b(ns, slba),
+                                     &iocb->iov, 0, nvme_dif_pass_copy_cb,
+                                     iocb);
+    } else {
+        iocb->aiocb = blk_aio_preadv(ns->blkconf.blk, nvme_l2b(ns, slba),
+                                     &iocb->iov, 0, nvme_copy_in_cb, iocb);
+    }
     return;
 
 invalid:
@@ -3078,11 +3355,19 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
     iocb->idx = 0;
     iocb->reftag = le32_to_cpu(copy->reftag);
     iocb->reftag |= (uint64_t)le32_to_cpu(copy->cdw3) << 32;
-    iocb->bounce = g_malloc_n(le16_to_cpu(ns->id_ns.mssrl),
-                              ns->lbasz + ns->lbaf.ms);
 
     qemu_iovec_init(&iocb->iov, 1);
 
+    if (ns->pip) {
+        qemu_iovec_init_pi(&iocb->iov, 1, le16_to_cpu(ns->id_ns.mssrl));
+        iocb->bounce = qemu_memalign(qemu_real_host_page_size(),
+                                     le16_to_cpu(ns->id_ns.mssrl) * ns->lbasz);
+    } else {
+        qemu_iovec_init(&iocb->iov, 1);
+        iocb->bounce = g_malloc_n(le16_to_cpu(ns->id_ns.mssrl),
+                                  ns->lbasz + ns->lbaf.ms);
+    }
+
     block_acct_start(blk_get_stats(ns->blkconf.blk), &iocb->acct.read, 0,
                      BLOCK_ACCT_READ);
     block_acct_start(blk_get_stats(ns->blkconf.blk), &iocb->acct.write, 0,
@@ -3145,18 +3430,31 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
         return status;
     }
 
-    ctx = g_new(struct nvme_compare_ctx, 1);
-    ctx->data.bounce = g_malloc(data_len);
+    if (ns->pip) {
+        ctx = g_new0(struct nvme_compare_ctx, 1);
+        ctx->data.bounce = qemu_memalign(qemu_real_host_page_size(), data_len);
+
+        req->opaque = ctx;
 
-    req->opaque = ctx;
+        qemu_iovec_init_pi(&ctx->data.iov, 1, nlb);
+        qemu_iovec_add(&ctx->data.iov, ctx->data.bounce,
data_len); + block_acct_start(blk_get_stats(blk), &req->acct, data_len, + BLOCK_ACCT_READ); + req->aiocb =3D blk_aio_preadv(blk, offset, &ctx->data.iov, 0, + nvme_dif_pass_compare_cb, req); + } else { + ctx =3D g_new(struct nvme_compare_ctx, 1); + ctx->data.bounce =3D g_malloc(data_len); =20 - qemu_iovec_init(&ctx->data.iov, 1); - qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, data_len); + req->opaque =3D ctx; =20 - block_acct_start(blk_get_stats(blk), &req->acct, data_len, - BLOCK_ACCT_READ); - req->aiocb =3D blk_aio_preadv(blk, offset, &ctx->data.iov, 0, - nvme_compare_data_cb, req); + qemu_iovec_init(&ctx->data.iov, 1); + qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, data_len); + block_acct_start(blk_get_stats(blk), &req->acct, data_len, + BLOCK_ACCT_READ); + req->aiocb =3D blk_aio_preadv(blk, offset, &ctx->data.iov, 0, + nvme_compare_data_cb, req); + } =20 return NVME_NO_COMPLETE; } diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index 259fa8ffa2..42c171ed72 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -41,12 +41,14 @@ pci_nvme_copy_out(uint64_t slba, uint32_t nlb) "slba 0x= %"PRIx64" nlb %"PRIu32"" pci_nvme_verify(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) = "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32"" pci_nvme_verify_mdata_in_cb(uint16_t cid, const char *blkname) "cid %"PRIu= 16" blk '%s'" pci_nvme_verify_cb(uint16_t cid, uint8_t prinfo, uint16_t apptag, uint16_t= appmask, uint32_t reftag) "cid %"PRIu16" prinfo 0x%"PRIx8" apptag 0x%"PRIx= 16" appmask 0x%"PRIx16" reftag 0x%"PRIx32"" +pci_nvme_dif_pass_verify_cb(uint16_t cid) "cid %"PRIu16"" pci_nvme_rw_complete_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" = blk '%s'" pci_nvme_block_status(int64_t offset, int64_t bytes, int64_t pnum, int ret= , bool zeroed) "offset %"PRId64" bytes %"PRId64" pnum %"PRId64" ret 0x%x ze= roed %d" pci_nvme_dsm(uint32_t nr, uint32_t attr) "nr %"PRIu32" attr 0x%"PRIx32"" pci_nvme_dsm_deallocate(uint64_t slba, 
uint32_t nlb) "slba %"PRIu64" nlb %= "PRIu32"" pci_nvme_dsm_single_range_limit_exceeded(uint32_t nlb, uint32_t dmrsl) "nl= b %"PRIu32" dmrsl %"PRIu32"" pci_nvme_compare(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb)= "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32"" +pci_nvme_dif_pass_compare_cb(uint16_t cid) "cid %"PRIu16"" pci_nvme_compare_data_cb(uint16_t cid) "cid %"PRIu16"" pci_nvme_compare_mdata_cb(uint16_t cid) "cid %"PRIu16"" pci_nvme_aio_discard_cb(uint16_t cid) "cid %"PRIu16"" --=20 2.38.1