The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:

  Merge tag 'migration-20230427-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to d3c760be786571d83d5cea01953e543df4d76f51:

  docs/zoned-storage:add zoned emulation use case (2023-04-28 08:34:07 -0400)

----------------------------------------------------------------
Pull request

This pull request contains Sam Li's virtio-blk zoned storage work. These
patches were dropped from my previous block pull request due to CI failures.

----------------------------------------------------------------

Sam Li (17):
  block/block-common: add zoned device structs
  block/file-posix: introduce helper functions for sysfs attributes
  block/block-backend: add block layer APIs resembling Linux
    ZonedBlockDevice ioctls
  block/raw-format: add zone operations to pass through requests
  block: add zoned BlockDriver check to block layer
  iotests: test new zone operations
  block: add some trace events for new block layer APIs
  docs/zoned-storage: add zoned device documentation
  file-posix: add tracking of the zone write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation
  block: add some trace events for zone append
  include: update virtio_blk headers to v6.3-rc1
  virtio-blk: add zoned storage emulation for zoned devices
  block: add accounting for zone append operation
  virtio-blk: add some trace events for zoned emulation
  docs/zoned-storage:add zoned emulation use case

 docs/devel/index-api.rst                     |   1 +
 docs/devel/zoned-storage.rst                 |  62 ++
 qapi/block-core.json                         |  68 +-
 qapi/block.json                              |   4 +
 meson.build                                  |   4 +
 include/block/accounting.h                   |   1 +
 include/block/block-common.h                 |  57 ++
 include/block/block-io.h                     |  13 +
 include/block/block_int-common.h             |  37 +
 include/block/raw-aio.h                      |   8 +-
 include/standard-headers/drm/drm_fourcc.h    |  12 +
 include/standard-headers/linux/ethtool.h     |  48 +-
 include/standard-headers/linux/fuse.h        |  45 +-
 include/standard-headers/linux/pci_regs.h    |   1 +
 include/standard-headers/linux/vhost_types.h |   2 +
 include/standard-headers/linux/virtio_blk.h  | 105 +++
 include/sysemu/block-backend-io.h            |  27 +
 linux-headers/asm-arm64/kvm.h                |   1 +
 linux-headers/asm-x86/kvm.h                  |  34 +-
 linux-headers/linux/kvm.h                    |   9 +
 linux-headers/linux/vfio.h                   |  15 +-
 linux-headers/linux/vhost.h                  |   8 +
 block.c                                      |  19 +
 block/block-backend.c                        | 198 ++++++
 block/file-posix.c                           | 696 +++++++++++++++++--
 block/io.c                                   |  68 ++
 block/io_uring.c                             |   4 +
 block/linux-aio.c                            |   3 +
 block/qapi-sysemu.c                          |  11 +
 block/qapi.c                                 |  18 +
 block/raw-format.c                           |  26 +
 hw/block/virtio-blk-common.c                 |   2 +
 hw/block/virtio-blk.c                        | 405 +++++++++++
 hw/virtio/virtio-qmp.c                       |   2 +
 qemu-io-cmds.c                               | 224 ++++++
 block/trace-events                           |   4 +
 docs/system/qemu-block-drivers.rst.inc       |   6 +
 hw/block/trace-events                        |   7 +
 tests/qemu-iotests/tests/zoned               | 105 +++
 tests/qemu-iotests/tests/zoned.out           |  69 ++
 40 files changed, 2361 insertions(+), 68 deletions(-)
 create mode 100644 docs/devel/zoned-storage.rst
 create mode 100755 tests/qemu-iotests/tests/zoned
 create mode 100644 tests/qemu-iotests/tests/zoned.out

--
2.40.0

The following changes since commit fea445e8fe9acea4f775a832815ee22bdf2b0222:

  Merge tag 'pull-maintainer-final-for-real-this-time-200324-1' of https://gitlab.com/stsquad/qemu into staging (2024-03-21 10:31:56 +0000)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 9352f80cd926fe2dde7c89b93ee33bb0356ff40e:

  coroutine: reserve 5,000 mappings (2024-03-21 13:14:30 -0400)

----------------------------------------------------------------
Pull request

I was too quick in sending the coroutine pool sizing change for -rc0 and still
needed to address feedback from Daniel Berrangé.

----------------------------------------------------------------

Stefan Hajnoczi (1):
  coroutine: reserve 5,000 mappings

 util/qemu-coroutine.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--
2.44.0
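The `coroutine: reserve 5,000 mappings` patch in this pull request changes how get_global_pool_hard_max_size() converts vm.max_map_count into a cap on the global coroutine pool. Below is a minimal standalone sketch of the old and new rules. It leaves out the read of /proc/sys/vm/max_map_count, and old_pool_max() and new_pool_max() are hypothetical names, not QEMU functions.

```c
#include <assert.h>

/*
 * Old rule: leave half of max_map_count for non-coroutine users, then
 * halve again because each coroutine takes up 2 VMAs.
 */
static unsigned int old_pool_max(int max_map_count)
{
    assert(max_map_count >= 0);
    return max_map_count / 4;
}

/*
 * New rule: reserve a fixed 5,000 mappings for non-coroutine users and
 * halve the remainder (2 VMAs per coroutine).
 */
static unsigned int new_pool_max(int max_map_count)
{
    assert(max_map_count >= 0);
    if (max_map_count > 5000) {
        return (max_map_count - 5000) / 2;
    } else {
        /* Global pool disabled; per-thread local pools still exist */
        return 0;
    }
}
```

With the common Linux default of vm.max_map_count = 65530, the old rule caps the global pool at 16382 and the new rule at 30265. At or below 5,000 mappings, the new rule returns 0. That disables the global pool while leaving per-thread pools in place.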
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-2-faithilikerun@gmail.com
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum BlockZoneOp {
+    BLK_ZO_OPEN,
+    BLK_ZO_CLOSE,
+    BLK_ZO_FINISH,
+    BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+    BLK_Z_NONE = 0x0, /* Regular block device */
+    BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+    BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneState {
+    BLK_ZS_NOT_WP = 0x0,
+    BLK_ZS_EMPTY = 0x1,
+    BLK_ZS_IOPEN = 0x2,
+    BLK_ZS_EOPEN = 0x3,
+    BLK_ZS_CLOSED = 0x4,
+    BLK_ZS_RDONLY = 0xD,
+    BLK_ZS_FULL = 0xE,
+    BLK_ZS_OFFLINE = 0xF,
+} BlockZoneState;
+
+typedef enum BlockZoneType {
+    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+    BLK_ZT_SWR = 0x2, /* Sequential writes required */
+    BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+    uint64_t start;
+    uint64_t length;
+    uint64_t cap;
+    uint64_t wp;
+    BlockZoneType type;
+    BlockZoneState state;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
--
2.40.0
From: Sam Li <faithilikerun@gmail.com>

Use get_sysfs_str_val() to get the string value of the device's zoned
model. get_sysfs_zoned_model() then converts it to QEMU's BlockZoneModel
type.

Use get_sysfs_long_val() to get zoned device information as a long
integer value.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-3-faithilikerun@gmail.com
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block_int-common.h |   3 +
 block/file-posix.c               | 139 ++++++++++++++++++++++---------
 2 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
      * an explicit monitor command to load the disk inside the guest).
      */
     bool has_variable_length;
+
+    /* device zone model */
+    BlockZoneModel zoned;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
+/*
+ * Get a sysfs attribute value as character string.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+                             char **val) {
+#ifdef CONFIG_LINUX
+    g_autofree char *sysfspath = NULL;
+    int ret;
+    size_t len;
+
+    if (!S_ISBLK(st->st_mode)) {
+        return -ENOTSUP;
+    }
+
+    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+                                major(st->st_rdev), minor(st->st_rdev),
+                                attribute);
+    ret = g_file_get_contents(sysfspath, val, &len, NULL);
+    if (ret == -1) {
+        return -ENOENT;
+    }
+
+    /* The file is ended with '\n' */
+    char *p;
+    p = *val;
+    if (*(p + len - 1) == '\n') {
+        *(p + len - 1) = '\0';
+    }
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
+{
+    g_autofree char *val = NULL;
+    int ret;
+
+    ret = get_sysfs_str_val(st, "zoned", &val);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (strcmp(val, "host-managed") == 0) {
+        *zoned = BLK_Z_HM;
+    } else if (strcmp(val, "host-aware") == 0) {
+        *zoned = BLK_Z_HA;
+    } else if (strcmp(val, "none") == 0) {
+        *zoned = BLK_Z_NONE;
+    } else {
+        return -ENOTSUP;
+    }
+    return 0;
+}
+
+/*
+ * Get a sysfs attribute value as a long integer.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
 {
 #ifdef CONFIG_LINUX
-    char buf[32];
+    g_autofree char *str = NULL;
     const char *end;
-    char *sysfspath = NULL;
+    long val;
+    int ret;
+
+    ret = get_sysfs_str_val(st, attribute, &str);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* The file is ended with '\n', pass 'end' to accept that. */
+    ret = qemu_strtol(str, &end, 10, &val);
+    if (ret == 0 && end && *end == '\0') {
+        ret = val;
+    }
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
+static int hdev_get_max_segments(int fd, struct stat *st)
+{
+#ifdef CONFIG_LINUX
     int ret;
-    int sysfd = -1;
-    long max_segments;
 
     if (S_ISCHR(st->st_mode)) {
         if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
         }
         return -ENOTSUP;
     }
-
-    if (!S_ISBLK(st->st_mode)) {
-        return -ENOTSUP;
-    }
-
-    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st->st_rdev), minor(st->st_rdev));
-    sysfd = open(sysfspath, O_RDONLY);
-    if (sysfd == -1) {
-        ret = -errno;
-        goto out;
-    }
-    ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
-    if (ret < 0) {
-        ret = -errno;
-        goto out;
-    } else if (ret == 0) {
-        ret = -EIO;
-        goto out;
-    }
-    buf[ret] = 0;
-    /* The file is ended with '\n', pass 'end' to accept that. */
-    ret = qemu_strtol(buf, &end, 10, &max_segments);
-    if (ret == 0 && end && *end == '\n') {
-        ret = max_segments;
-    }
-
-out:
-    if (sysfd != -1) {
-        close(sysfd);
-    }
-    g_free(sysfspath);
-    return ret;
+    return get_sysfs_long_val(st, "max_segments");
 #else
     return -ENOTSUP;
 #endif
 }
 
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
+                                     Error **errp)
+{
+    BlockZoneModel zoned;
+    int ret;
+
+    bs->bl.zoned = BLK_Z_NONE;
+
+    ret = get_sysfs_zoned_model(st, &zoned);
+    if (ret < 0 || zoned == BLK_Z_NONE) {
+        return;
+    }
+    bs->bl.zoned = zoned;
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
             bs->bl.max_hw_iov = ret;
         }
     }
+
+    raw_refresh_zoned_limits(bs, &st, errp);
 }
 
 static int check_for_dasd(int fd)
--
2.40.0
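The helpers added by this patch read a sysfs attribute as a string and strip the trailing newline. The value is then either mapped to a BlockZoneModel or parsed as a long. Below is a minimal sketch of just the parsing step. It works on in-memory strings instead of /sys files so that it builds without GLib. parse_zoned_model() and parse_sysfs_long() are hypothetical names, and plain strtol() stands in for qemu_strtol().

One caveat about the helper in the patch above: g_file_get_contents() returns a gboolean. As a result, the `ret == -1` test in get_sysfs_str_val() never fires, and a failed read falls through to the newline-trimming code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the BlockZoneModel enum introduced earlier in the series */
typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0, /* Regular block device */
    BLK_Z_HM = 0x1,   /* Host-managed zoned block device */
    BLK_Z_HA = 0x2,   /* Host-aware zoned block device */
} BlockZoneModel;

/* Nonzero if the first len bytes of val are exactly word. */
static int word_equals(const char *val, size_t len, const char *word)
{
    return strlen(word) == len && strncmp(val, word, len) == 0;
}

/*
 * Map the contents of a "zoned" sysfs attribute (e.g. "host-managed\n")
 * to a BlockZoneModel. Returns 0 on success or -ENOTSUP.
 */
static int parse_zoned_model(const char *val, BlockZoneModel *zoned)
{
    assert(val != NULL && zoned != NULL);
    size_t len = strcspn(val, "\n"); /* ignore the trailing newline */

    if (word_equals(val, len, "host-managed")) {
        *zoned = BLK_Z_HM;
    } else if (word_equals(val, len, "host-aware")) {
        *zoned = BLK_Z_HA;
    } else if (word_equals(val, len, "none")) {
        *zoned = BLK_Z_NONE;
    } else {
        return -ENOTSUP;
    }
    return 0;
}

/*
 * Parse a numeric sysfs attribute such as "128\n". Returns the value, or a
 * negative errno if the text is not a single decimal number.
 */
static long parse_sysfs_long(const char *str)
{
    char *end;
    long val;

    assert(str != NULL);
    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0) {
        return -errno;
    }
    /* Accept either the end of the string or a lone trailing newline. */
    if (end == str || (*end != '\0' && strcmp(end, "\n") != 0)) {
        return -EINVAL;
    }
    return val;
}
```

Like get_sysfs_long_val(), the sketch folds errors into negative return values. This works because the sysfs attributes involved are never negative.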
Daniel P. Berrangé <berrange@redhat.com> pointed out that the coroutine
pool size heuristic is very conservative. Instead of halving
max_map_count, he suggested reserving 5,000 mappings for non-coroutine
users based on observations of guests he has access to.

Fixes: 86a637e48104 ("coroutine: cap per-thread local pool size")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20240320181232.1464819-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/qemu-coroutine.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -XXX,XX +XXX,XX @@ static unsigned int get_global_pool_hard_max_size(void)
                             NULL) &&
         qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
         /*
-         * This is a conservative upper bound that avoids exceeding
-         * max_map_count. Leave half for non-coroutine users like library
-         * dependencies, vhost-user, etc. Each coroutine takes up 2 VMAs so
-         * halve the amount again.
+         * This is an upper bound that avoids exceeding max_map_count. Leave a
+         * fixed amount for non-coroutine users like library dependencies,
+         * vhost-user, etc. Each coroutine takes up 2 VMAs so halve the
+         * remaining amount.
          */
-        return max_map_count / 4;
+        if (max_map_count > 5000) {
+            return (max_map_count - 5000) / 2;
+        } else {
+            /* Disable the global pool but threads still have local pools */
+            return 0;
+        }
     }
 #endif
 

From: Sam Li <faithilikerun@gmail.com>

Add zoned device option to host_device BlockDriver. It will be presented only
for zoned host block devices. By adding zone management operations to the
host_device BlockDriver, users can use the new block layer APIs,
including Report Zone and the zone management operations
(open, close, finish, reset, reset_all).

qemu-io uses the new APIs to issue zoned storage commands to the device:
zone_report (zrp), zone_open (zo), zone_close (zc), zone_reset (zrs),
zone_finish (zf).

For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 \
    -c "zrp offset nr_zones"

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-4-faithilikerun@gmail.com
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org> and remove spurious ret = -errno in
raw_co_zone_mgmt().
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 meson.build                       |   4 +
 include/block/block-io.h          |   9 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h           |   6 +-
 include/sysemu/block-backend-io.h |  18 ++
 block/block-backend.c             | 137 +++++++++++++
 block/file-posix.c                | 313 +++++++++++++++++++++++++++++-
 block/io.c                        |  41 ++++
 qemu-io-cmds.c                    | 149 ++++++++++++++
 9 files changed, 695 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
 # has_header
 config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
 config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
 config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
 config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
 config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
 config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
                      cc.has_member('struct stat', 'st_atim',
                                    prefix: '#include <sys/stat.h>'))
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
+                     cc.has_member('struct blk_zone', 'capacity',
+                                   prefix: '#include <linux/blkzoned.h>'))
 
 # has_type
 config_host_data.set('CONFIG_IOVEC',
diff --git a/include/block/block-io.h b/include/block/block-io.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
 int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
                                                int64_t bytes);
 
+/* Report zone information of zone block device. */
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
+                                                  int64_t offset,
+                                                  unsigned int *nr_zones,
+                                                  BlockZoneDescriptor *zones);
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
+                                                BlockZoneOp op,
+                                                int64_t offset, int64_t len);
+
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
                       int64_t bytes, int64_t *pnum, int64_t *map,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
         BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 
+    int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
+            int64_t offset, unsigned int *nr_zones,
+            BlockZoneDescriptor *zones);
+    int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
+            int64_t offset, int64_t len);
+
     /* removable device specific */
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
         BlockDriverState *bs);
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
 
     /* device zone model */
     BlockZoneModel zoned;
+
+    /* zone size expressed in bytes */
+    uint32_t zone_size;
+
+    /* total number of zones */
+    uint32_t nr_zones;
+
+    /* maximum sectors of a zone append write operation */
+    int64_t max_append_sectors;
+
+    /* maximum number of open zones */
+    int64_t max_open_zones;
+
+    /* maximum number of active zones */
+    int64_t max_active_zones;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -XXX,XX +XXX,XX @@
 #define QEMU_AIO_WRITE_ZEROES 0x0020
 #define QEMU_AIO_COPY_RANGE 0x0040
 #define QEMU_AIO_TRUNCATE 0x0080
+#define QEMU_AIO_ZONE_REPORT 0x0100
+#define QEMU_AIO_ZONE_MGMT 0x0200
 #define QEMU_AIO_TYPE_MASK \
         (QEMU_AIO_READ | \
          QEMU_AIO_WRITE | \
@@ -XXX,XX +XXX,XX @@
          QEMU_AIO_DISCARD | \
          QEMU_AIO_WRITE_ZEROES | \
          QEMU_AIO_COPY_RANGE | \
-         QEMU_AIO_TRUNCATE)
+         QEMU_AIO_TRUNCATE | \
+         QEMU_AIO_ZONE_REPORT | \
+         QEMU_AIO_ZONE_MGMT)
 
 /* AIO flags */
 #define QEMU_AIO_MISALIGNED 0x1000
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
                             BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
                           BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+                                unsigned int *nr_zones,
+                                BlockZoneDescriptor *zones,
+                                BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                              int64_t offset, int64_t len,
+                              BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
                              BlockCompletionFunc *cb, void *opaque);
 void blk_aio_cancel_async(BlockAIOCB *acb);
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
 int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                       int64_t bytes, BdrvRequestFlags flags);
 
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones);
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
+                                     unsigned int *nr_zones,
+                                     BlockZoneDescriptor *zones);
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                  int64_t offset, int64_t len);
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                   int64_t offset, int64_t len);
+
 int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
                                   int64_t bytes);
 int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
diff --git a/block/block-backend.c b/block/block-backend.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
     return ret;
 }
 
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+                                   (unsigned int*)(uintptr_t)acb->bytes,
+                                   rwco->iobuf);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+                                unsigned int *nr_zones,
+                                BlockZoneDescriptor *zones,
+                                BlockCompletionFunc *cb, void *opaque)
+{
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk = blk,
+        .offset = offset,
+        .iobuf = zones,
+        .ret = NOT_DONE,
+    };
+    acb->bytes = (int64_t)(uintptr_t)nr_zones,
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_mgmt(rwco->blk,
+                                 (BlockZoneOp)(uintptr_t)rwco->iobuf,
+                                 rwco->offset, acb->bytes);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                              int64_t offset, int64_t len,
+                              BlockCompletionFunc *cb, void *opaque) {
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk = blk,
+        .offset = offset,
+        .iobuf = (void *)(uintptr_t)op,
+        .ret = NOT_DONE,
+    };
+    acb->bytes = len;
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk); /* increase before waiting */
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation;
+ * offset is the byte offset from the start of the zoned device;
+ * len is the maximum number of bytes the command should operate on. It
+ * should be aligned with the device zone size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                  int64_t offset, int64_t len)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+
+    ret = blk_check_byte_request(blk, offset, len);
+    if (ret < 0) {
+        blk_dec_in_flight(blk);
+        return ret;
+    }
+
+    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@
 #include <sys/param.h>
 #include <sys/syscall.h>
 #include <sys/vfs.h>
+#if defined(CONFIG_BLKZONED)
+#include <linux/blkzoned.h>
+#endif
 #include <linux/cdrom.h>
 #include <linux/fd.h>
 #include <linux/fs.h>
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
             PreallocMode prealloc;
             Error **errp;
         } truncate;
+        struct {
+            unsigned int *nr_zones;
+            BlockZoneDescriptor *zones;
+        } zone_report;
+        struct {
+            unsigned long op;
+        } zone_mgmt;
     };
 } RawPosixAIOData;
 
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
 static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
 {
     g_autofree char *val = NULL;
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
     }
     return 0;
 }
+#endif /* defined(CONFIG_BLKZONED) */
 
 /*
  * Get a sysfs attribute value as a long integer.
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
                                      Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
         return;
     }
     bs->bl.zoned = zoned;
+
+    ret = get_sysfs_long_val(st, "max_open_zones");
+    if (ret >= 0) {
+        bs->bl.max_open_zones = ret;
+    }
+
+    ret = get_sysfs_long_val(st, "max_active_zones");
+    if (ret >= 0) {
+        bs->bl.max_active_zones = ret;
+    }
+
+    /*
+     * The zoned device must at least have zone size and nr_zones fields.
+     */
+    ret = get_sysfs_long_val(st, "chunk_sectors");
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
+                                     "sysfs attribute");
+        return;
+    } else if (!ret) {
+        error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
+        return;
+    }
+    bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
+
+    ret = get_sysfs_long_val(st, "nr_zones");
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Unable to read nr_zones "
+                                     "sysfs attribute");
+        return;
+    } else if (!ret) {
+        error_setg(errp, "Read 0 from nr_zones sysfs attribute");
+        return;
+    }
+    bs->bl.nr_zones = ret;
+
+    ret = get_sysfs_long_val(st, "zone_append_max_bytes");
+    if (ret > 0) {
+        bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
+    }
 }
+#else /* !defined(CONFIG_BLKZONED) */
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
+                                     Error **errp)
+{
+    bs->bl.zoned = BLK_Z_NONE;
+}
+#endif /* !defined(CONFIG_BLKZONED) */
 
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
     BDRVRawState *s = bs->opaque;
     int ret;
 
-    /* If DASD, get blocksizes */
+    /* If DASD or zoned devices, get blocksizes */
     if (check_for_dasd(s->fd) < 0) {
-        return -ENOTSUP;
+        /* zoned devices are not DASD */
+        if (bs->bl.zoned == BLK_Z_NONE) {
+            return -ENOTSUP;
+        }
     }
     ret = probe_logical_blocksize(s->fd, &bsz->log);
     if (ret < 0) {
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
 }
 #endif
 
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+#if defined(CONFIG_BLKZONED)
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
+                             const struct blk_zone *blkz) {
+    zone->start = blkz->start << BDRV_SECTOR_BITS;
+    zone->length = blkz->len << BDRV_SECTOR_BITS;
+    zone->wp = blkz->wp << BDRV_SECTOR_BITS;
+
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
+    zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
+#else
+    zone->cap = blkz->len << BDRV_SECTOR_BITS;
+#endif
+
+    switch (blkz->type) {
+    case BLK_ZONE_TYPE_SEQWRITE_REQ:
+        zone->type = BLK_ZT_SWR;
+        break;
+    case BLK_ZONE_TYPE_SEQWRITE_PREF:
+        zone->type = BLK_ZT_SWP;
+        break;
+    case BLK_ZONE_TYPE_CONVENTIONAL:
+        zone->type = BLK_ZT_CONV;
+        break;
+    default:
+        error_report("Unsupported zone type: 0x%x", blkz->type);
+        return -ENOTSUP;
+    }
+
+    switch (blkz->cond) {
+    case BLK_ZONE_COND_NOT_WP:
+        zone->state = BLK_ZS_NOT_WP;
+        break;
+    case BLK_ZONE_COND_EMPTY:
+        zone->state = BLK_ZS_EMPTY;
+        break;
+    case BLK_ZONE_COND_IMP_OPEN:
+        zone->state = BLK_ZS_IOPEN;
+        break;
+    case BLK_ZONE_COND_EXP_OPEN:
+        zone->state = BLK_ZS_EOPEN;
+        break;
+    case BLK_ZONE_COND_CLOSED:
+        zone->state = BLK_ZS_CLOSED;
+        break;
+    case BLK_ZONE_COND_READONLY:
+        zone->state = BLK_ZS_RDONLY;
+        break;
+    case BLK_ZONE_COND_FULL:
+        zone->state = BLK_ZS_FULL;
+        break;
+    case BLK_ZONE_COND_OFFLINE:
+        zone->state = BLK_ZS_OFFLINE;
+        break;
+    default:
+        error_report("Unsupported zone state: 0x%x", blkz->cond);
+        return -ENOTSUP;
+    }
+    return 0;
+}
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_report(void *opaque)
+{
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    unsigned int *nr_zones = aiocb->zone_report.nr_zones;
+    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
+    /* zoned block devices use 512-byte sectors */
+    uint64_t sector = aiocb->aio_offset / 512;
+
+    struct blk_zone *blkz;
+    size_t rep_size;
+    unsigned int nrz;
+    int ret;
+    unsigned int n = 0, i = 0;
+
+    nrz = *nr_zones;
+    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+    g_autofree struct blk_zone_report *rep = NULL;
+    rep = g_malloc(rep_size);
+
+    blkz = (struct blk_zone *)(rep + 1);
+    while (n < nrz) {
+        memset(rep, 0, rep_size);
+        rep->sector = sector;
+        rep->nr_zones = nrz - n;
+
+        do {
+            ret = ioctl(fd, BLKREPORTZONE, rep);
+        } while (ret != 0 && errno == EINTR);
+        if (ret != 0) {
+            error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+                    fd, sector, errno);
+            return -errno;
+        }
+
+        if (!rep->nr_zones) {
+            break;
+        }
+
+        for (i = 0; i < rep->nr_zones; i++, n++) {
+            ret = parse_zone(&zones[n], &blkz[i]);
+            if (ret != 0) {
+                return ret;
+            }
+
+            /* The next report should start after the last zone reported */
+            sector = blkz[i].start + blkz[i].len;
+        }
+    }
+
+    *nr_zones = n;
+    return 0;
+}
+#endif
+
+#if defined(CONFIG_BLKZONED)
+static int handle_aiocb_zone_mgmt(void *opaque)
+{
+    RawPosixAIOData *aiocb = opaque;
+    int fd = aiocb->aio_fildes;
+    uint64_t sector = aiocb->aio_offset / 512;
+    int64_t nr_sectors = aiocb->aio_nbytes / 512;
+    struct blk_zone_range range;
585
+ int ret;
586
+
587
+ /* Execute the operation */
588
+ range.sector = sector;
589
+ range.nr_sectors = nr_sectors;
590
+ do {
591
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
592
+ } while (ret != 0 && errno == EINTR);
593
+
594
+ return ret;
595
+}
596
+#endif
597
+
598
static int handle_aiocb_copy_range(void *opaque)
599
{
600
RawPosixAIOData *aiocb = opaque;
601
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
602
}
603
}
604
605
+/*
606
+ * zone report - Get a zone block device's information in the form
607
+ * of an array of zone descriptors.
608
+ * zones is an array of zone descriptors to hold zone information on reply;
609
+ * offset can be any byte within the entire size of the device;
610
+ * nr_zones is the maximum number of zones the command should report.
611
+ */
612
+#if defined(CONFIG_BLKZONED)
613
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
614
+ unsigned int *nr_zones,
615
+ BlockZoneDescriptor *zones) {
616
+ BDRVRawState *s = bs->opaque;
617
+ RawPosixAIOData acb = (RawPosixAIOData) {
618
+ .bs = bs,
619
+ .aio_fildes = s->fd,
620
+ .aio_type = QEMU_AIO_ZONE_REPORT,
621
+ .aio_offset = offset,
622
+ .zone_report = {
623
+ .nr_zones = nr_zones,
624
+ .zones = zones,
625
+ },
626
+ };
627
+
628
+ return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
629
+}
630
+#endif
631
+
632
+/*
633
+ * zone management operations - Execute an operation on a zone
634
+ */
635
+#if defined(CONFIG_BLKZONED)
636
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
637
+ int64_t offset, int64_t len) {
638
+ BDRVRawState *s = bs->opaque;
639
+ RawPosixAIOData acb;
640
+ int64_t zone_size, zone_size_mask;
641
+ const char *op_name;
642
+ unsigned long zo;
643
+ int ret;
644
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
645
+
646
+ zone_size = bs->bl.zone_size;
647
+ zone_size_mask = zone_size - 1;
648
+ if (offset & zone_size_mask) {
649
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
650
+ "%" PRId64 "", offset / 512, zone_size / 512);
651
+ return -EINVAL;
652
+ }
653
+
654
+ if (((offset + len) < capacity && len & zone_size_mask) ||
655
+ offset + len > capacity) {
656
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
657
+ " %" PRId64 "", len / 512, zone_size / 512);
658
+ return -EINVAL;
659
+ }
660
+
661
+ switch (op) {
662
+ case BLK_ZO_OPEN:
663
+ op_name = "BLKOPENZONE";
664
+ zo = BLKOPENZONE;
665
+ break;
666
+ case BLK_ZO_CLOSE:
667
+ op_name = "BLKCLOSEZONE";
668
+ zo = BLKCLOSEZONE;
669
+ break;
670
+ case BLK_ZO_FINISH:
671
+ op_name = "BLKFINISHZONE";
672
+ zo = BLKFINISHZONE;
673
+ break;
674
+ case BLK_ZO_RESET:
675
+ op_name = "BLKRESETZONE";
676
+ zo = BLKRESETZONE;
677
+ break;
678
+ default:
679
+ error_report("Unsupported zone op: 0x%x", op);
680
+ return -ENOTSUP;
681
+ }
682
+
683
+ acb = (RawPosixAIOData) {
684
+ .bs = bs,
685
+ .aio_fildes = s->fd,
686
+ .aio_type = QEMU_AIO_ZONE_MGMT,
687
+ .aio_offset = offset,
688
+ .aio_nbytes = len,
689
+ .zone_mgmt = {
690
+ .op = zo,
691
+ },
692
+ };
693
+
694
+ ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
695
+ if (ret != 0) {
696
+ error_report("ioctl %s failed %d", op_name, ret);
697
+ }
698
+
699
+ return ret;
700
+}
701
+#endif
702
+
703
static coroutine_fn int
704
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
705
bool blkdev)
706
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
707
#ifdef __linux__
708
.bdrv_co_ioctl = hdev_co_ioctl,
709
#endif
710
+
711
+ /* zoned device */
712
+#if defined(CONFIG_BLKZONED)
713
+ /* zone management operations */
714
+ .bdrv_co_zone_report = raw_co_zone_report,
715
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
716
+#endif
717
};
718
719
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
720
diff --git a/block/io.c b/block/io.c
721
index XXXXXXX..XXXXXXX 100644
722
--- a/block/io.c
723
+++ b/block/io.c
724
@@ -XXX,XX +XXX,XX @@ out:
725
return co.ret;
726
}
727
728
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
729
+ unsigned int *nr_zones,
730
+ BlockZoneDescriptor *zones)
731
+{
732
+ BlockDriver *drv = bs->drv;
733
+ CoroutineIOCompletion co = {
734
+ .coroutine = qemu_coroutine_self(),
735
+ };
736
+ IO_CODE();
737
+
738
+ bdrv_inc_in_flight(bs);
739
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
740
+ co.ret = -ENOTSUP;
741
+ goto out;
742
+ }
743
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
744
+out:
745
+ bdrv_dec_in_flight(bs);
746
+ return co.ret;
747
+}
748
+
749
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
750
+ int64_t offset, int64_t len)
751
+{
752
+ BlockDriver *drv = bs->drv;
753
+ CoroutineIOCompletion co = {
754
+ .coroutine = qemu_coroutine_self(),
755
+ };
756
+ IO_CODE();
757
+
758
+ bdrv_inc_in_flight(bs);
759
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
760
+ co.ret = -ENOTSUP;
761
+ goto out;
762
+ }
763
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
764
+out:
765
+ bdrv_dec_in_flight(bs);
766
+ return co.ret;
767
+}
768
+
769
void *qemu_blockalign(BlockDriverState *bs, size_t size)
770
{
771
IO_CODE();
772
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
773
index XXXXXXX..XXXXXXX 100644
774
--- a/qemu-io-cmds.c
775
+++ b/qemu-io-cmds.c
776
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
777
.oneline = "flush all in-core file state to disk",
778
};
779
780
+static inline int64_t tosector(int64_t bytes)
781
+{
782
+ return bytes >> BDRV_SECTOR_BITS;
783
+}
784
+
785
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
786
+{
787
+ int ret;
788
+ int64_t offset;
789
+ unsigned int nr_zones;
790
+
791
+ ++optind;
792
+ offset = cvtnum(argv[optind]);
793
+ ++optind;
794
+ nr_zones = cvtnum(argv[optind]);
795
+
796
+ g_autofree BlockZoneDescriptor *zones = NULL;
797
+ zones = g_new(BlockZoneDescriptor, nr_zones);
798
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
799
+ if (ret < 0) {
800
+ printf("zone report failed: %s\n", strerror(-ret));
801
+ } else {
802
+ for (int i = 0; i < nr_zones; ++i) {
803
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
804
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
805
+ "zcond:%u, [type: %u]\n",
806
+ tosector(zones[i].start), tosector(zones[i].length),
807
+ tosector(zones[i].cap), tosector(zones[i].wp),
808
+ zones[i].state, zones[i].type);
809
+ }
810
+ }
811
+ return ret;
812
+}
813
+
814
+static const cmdinfo_t zone_report_cmd = {
815
+ .name = "zone_report",
816
+ .altname = "zrp",
817
+ .cfunc = zone_report_f,
818
+ .argmin = 2,
819
+ .argmax = 2,
820
+ .args = "offset number",
821
+ .oneline = "report zone information",
822
+};
823
+
824
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
825
+{
826
+ int ret;
827
+ int64_t offset, len;
828
+ ++optind;
829
+ offset = cvtnum(argv[optind]);
830
+ ++optind;
831
+ len = cvtnum(argv[optind]);
832
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
833
+ if (ret < 0) {
834
+ printf("zone open failed: %s\n", strerror(-ret));
835
+ }
836
+ return ret;
837
+}
838
+
839
+static const cmdinfo_t zone_open_cmd = {
840
+ .name = "zone_open",
841
+ .altname = "zo",
842
+ .cfunc = zone_open_f,
843
+ .argmin = 2,
844
+ .argmax = 2,
845
+ .args = "offset len",
846
+ .oneline = "explicitly open a range of zones in zone block device",
847
+};
848
+
849
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
850
+{
851
+ int ret;
852
+ int64_t offset, len;
853
+ ++optind;
854
+ offset = cvtnum(argv[optind]);
855
+ ++optind;
856
+ len = cvtnum(argv[optind]);
857
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
858
+ if (ret < 0) {
859
+ printf("zone close failed: %s\n", strerror(-ret));
860
+ }
861
+ return ret;
862
+}
863
+
864
+static const cmdinfo_t zone_close_cmd = {
865
+ .name = "zone_close",
866
+ .altname = "zc",
867
+ .cfunc = zone_close_f,
868
+ .argmin = 2,
869
+ .argmax = 2,
870
+ .args = "offset len",
871
+ .oneline = "close a range of zones in zone block device",
872
+};
873
+
874
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
875
+{
876
+ int ret;
877
+ int64_t offset, len;
878
+ ++optind;
879
+ offset = cvtnum(argv[optind]);
880
+ ++optind;
881
+ len = cvtnum(argv[optind]);
882
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
883
+ if (ret < 0) {
884
+ printf("zone finish failed: %s\n", strerror(-ret));
885
+ }
886
+ return ret;
887
+}
888
+
889
+static const cmdinfo_t zone_finish_cmd = {
890
+ .name = "zone_finish",
891
+ .altname = "zf",
892
+ .cfunc = zone_finish_f,
893
+ .argmin = 2,
894
+ .argmax = 2,
895
+ .args = "offset len",
896
+ .oneline = "finish a range of zones in zone block device",
897
+};
898
+
899
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
900
+{
901
+ int ret;
902
+ int64_t offset, len;
903
+ ++optind;
904
+ offset = cvtnum(argv[optind]);
905
+ ++optind;
906
+ len = cvtnum(argv[optind]);
907
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
908
+ if (ret < 0) {
909
+ printf("zone reset failed: %s\n", strerror(-ret));
910
+ }
911
+ return ret;
912
+}
913
+
914
+static const cmdinfo_t zone_reset_cmd = {
915
+ .name = "zone_reset",
916
+ .altname = "zrs",
917
+ .cfunc = zone_reset_f,
918
+ .argmin = 2,
919
+ .argmax = 2,
920
+ .args = "offset len",
921
+ .oneline = "reset a zone write pointer in zone block device",
922
+};
923
+
924
static int truncate_f(BlockBackend *blk, int argc, char **argv);
925
static const cmdinfo_t truncate_cmd = {
926
.name = "truncate",
927
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
928
qemuio_add_command(&aio_write_cmd);
929
qemuio_add_command(&aio_flush_cmd);
930
qemuio_add_command(&flush_cmd);
931
+ qemuio_add_command(&zone_report_cmd);
932
+ qemuio_add_command(&zone_open_cmd);
933
+ qemuio_add_command(&zone_close_cmd);
934
+ qemuio_add_command(&zone_finish_cmd);
935
+ qemuio_add_command(&zone_reset_cmd);
936
qemuio_add_command(&truncate_cmd);
937
qemuio_add_command(&length_cmd);
938
qemuio_add_command(&info_cmd);
939
--
42
--
940
2.40.0
43
2.44.0
941
44
942
45
diff view generated by jsdifflib
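The zone report handler in this patch pages through the device. Each BLKREPORTZONE call may return fewer zones than requested, so the loop advances `sector` past the last zone reported and asks again until the caller's buffer is full or no zones remain. The following is a minimal sketch of that loop against a simulated device instead of the ioctl. `struct fake_zone`, `fake_report_zones()` and the geometry constants are hypothetical and are not part of QEMU or the Linux UAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct blk_zone; values are 512-byte sectors. */
struct fake_zone {
    uint64_t start;
    uint64_t len;
};

/* Hypothetical simulated device geometry. */
#define FAKE_NR_ZONES     8
#define FAKE_ZONE_SECTORS 0x80000ULL /* 256 MiB per zone */
#define FAKE_MAX_PER_CALL 3u         /* zones returned per simulated call */

/*
 * Stands in for ioctl(BLKREPORTZONE): fills at most max_zones descriptors,
 * starting with the zone that contains 'sector', and returns the count
 * (0 once 'sector' lies beyond the last zone).
 */
static unsigned int fake_report_zones(uint64_t sector, struct fake_zone *out,
                                      unsigned int max_zones)
{
    uint64_t idx = sector / FAKE_ZONE_SECTORS;
    unsigned int n = 0;

    while (n < max_zones && n < FAKE_MAX_PER_CALL && idx < FAKE_NR_ZONES) {
        out[n].start = idx * FAKE_ZONE_SECTORS;
        out[n].len = FAKE_ZONE_SECTORS;
        n++;
        idx++;
    }
    return n;
}

/*
 * Same paging pattern as handle_aiocb_zone_report(): request the remaining
 * zones, advance past the last zone reported, and stop when the buffer is
 * full or nothing more is reported. Returns the number of zones stored.
 */
static unsigned int report_all_zones(uint64_t offset_bytes,
                                     struct fake_zone *zones,
                                     unsigned int nr_zones)
{
    uint64_t sector = offset_bytes / 512; /* 512-byte sectors */
    unsigned int n = 0;

    while (n < nr_zones) {
        unsigned int got = fake_report_zones(sector, &zones[n], nr_zones - n);

        if (got == 0) {
            break; /* past the last zone */
        }
        n += got;
        /* The next report should start after the last zone reported. */
        sector = zones[n - 1].start + zones[n - 1].len;
    }
    return n;
}
```

With 8 simulated zones and at most 3 zones per call, `report_all_zones(0, zones, 10)` needs three non-empty calls and returns 8, which mirrors how `*nr_zones` is rewritten with the number of zones actually reported.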
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
The raw-format driver usually sits on top of the file-posix driver. It
4
needs to pass through zone command requests.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230427172019.3345-5-faithilikerun@gmail.com
14
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
15
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
<philmd@linaro.org>.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
block/raw-format.c | 17 +++++++++++++++++
21
1 file changed, 17 insertions(+)
22
23
diff --git a/block/raw-format.c b/block/raw-format.c
24
index XXXXXXX..XXXXXXX 100644
25
--- a/block/raw-format.c
26
+++ b/block/raw-format.c
27
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
28
return bdrv_co_pdiscard(bs->file, offset, bytes);
29
}
30
31
+static int coroutine_fn GRAPH_RDLOCK
32
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
33
+ unsigned int *nr_zones,
34
+ BlockZoneDescriptor *zones)
35
+{
36
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
37
+}
38
+
39
+static int coroutine_fn GRAPH_RDLOCK
40
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
41
+ int64_t offset, int64_t len)
42
+{
43
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
44
+}
45
+
46
static int64_t coroutine_fn GRAPH_RDLOCK
47
raw_co_getlength(BlockDriverState *bs)
48
{
49
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
50
.bdrv_co_pwritev = &raw_co_pwritev,
51
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
52
.bdrv_co_pdiscard = &raw_co_pdiscard,
53
+ .bdrv_co_zone_report = &raw_co_zone_report,
54
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
55
.bdrv_co_block_status = &raw_co_block_status,
56
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
57
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
58
--
59
2.40.0
60
61
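The raw-format patch only delegates. Both callbacks forward the request unchanged to `bs->file->bs`, and the generic bdrv_co_zone_*() entry points return -ENOTSUP when a node has no callback or is not zoned. The following is a minimal sketch of that layering that assumes a simplified node type. `struct node`, `node_zone_mgmt()`, `leaf_zone_mgmt()` and `passthrough_zone_mgmt()` are hypothetical names used for illustration only. Coroutines, in-flight accounting and graph locking are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node: a driver callback plus an optional child. */
struct node {
    int (*zone_mgmt)(struct node *n, int op, int64_t offset, int64_t len);
    struct node *child;
    int zoned;   /* nonzero if the node reports a zoned model */
    int last_op; /* op executed by the leaf, recorded for illustration */
};

/* Generic entry point modelled on bdrv_co_zone_mgmt(). */
static int node_zone_mgmt(struct node *n, int op, int64_t offset, int64_t len)
{
    if (!n->zone_mgmt || !n->zoned) {
        return -ENOTSUP;
    }
    return n->zone_mgmt(n, op, offset, len);
}

/* Leaf driver (the file-posix role): performs the operation. */
static int leaf_zone_mgmt(struct node *n, int op, int64_t offset, int64_t len)
{
    (void)offset;
    (void)len;
    n->last_op = op;
    return 0;
}

/* Format driver (the raw-format role): forwards the request unchanged. */
static int passthrough_zone_mgmt(struct node *n, int op, int64_t offset,
                                 int64_t len)
{
    return node_zone_mgmt(n->child, op, offset, len);
}
```

In QEMU, the raw node's zoned model comes from its child through the block limits. In this sketch, the `zoned` flag is set on both nodes by hand.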
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
4
allowed.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230427172019.3345-6-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org> and clarify that the check is about zoned
16
BlockDrivers.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
include/block/block_int-common.h | 5 +++++
21
block.c | 19 +++++++++++++++++++
22
block/file-posix.c | 12 ++++++++++++
23
block/raw-format.c | 1 +
24
4 files changed, 37 insertions(+)
25
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
27
index XXXXXXX..XXXXXXX 100644
28
--- a/include/block/block_int-common.h
29
+++ b/include/block/block_int-common.h
30
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
31
*/
32
bool is_format;
33
34
+ /*
35
+ * Set to true if the BlockDriver supports zoned children.
36
+ */
37
+ bool supports_zoned_children;
38
+
39
/*
40
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
41
* this field set to true, except ones that are defined only by their
42
diff --git a/block.c b/block.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/block.c
45
+++ b/block.c
46
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
47
return;
48
}
49
50
+ /*
51
+ * Non-zoned block drivers do not follow zoned storage constraints
52
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
53
+ * drivers in a graph.
54
+ */
55
+ if (!parent_bs->drv->supports_zoned_children &&
56
+ child_bs->bl.zoned == BLK_Z_HM) {
57
+ /*
58
+ * The host-aware model allows zoned storage constraints and random
59
+ * writes. Allow mixing host-aware and non-zoned drivers, using the
60
+ * host-aware device as a regular device.
61
+ */
62
+ error_setg(errp, "Cannot add a %s child to a %s parent",
63
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
64
+ parent_bs->drv->supports_zoned_children ?
65
+ "support zoned children" : "not support zoned children");
66
+ return;
67
+ }
68
+
69
if (!QLIST_EMPTY(&child_bs->parents)) {
70
error_setg(errp, "The node %s already has a parent",
71
child_bs->node_name);
72
diff --git a/block/file-posix.c b/block/file-posix.c
73
index XXXXXXX..XXXXXXX 100644
74
--- a/block/file-posix.c
75
+++ b/block/file-posix.c
76
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
77
goto fail;
78
}
79
}
80
+#ifdef CONFIG_BLKZONED
81
+ /*
82
+ * The kernel page cache does not reliably work for writes to SWR zones
83
+ * of a zoned block device because it cannot guarantee the order of writes.
84
+ */
85
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
86
+ (!(s->open_flags & O_DIRECT))) {
87
+ error_setg(errp, "The driver supports zoned devices, and it requires "
88
+ "cache.direct=on, which was not specified.");
89
+ return -EINVAL; /* No host kernel page cache */
90
+ }
91
+#endif
92
93
if (S_ISBLK(st.st_mode)) {
94
#ifdef __linux__
95
diff --git a/block/raw-format.c b/block/raw-format.c
96
index XXXXXXX..XXXXXXX 100644
97
--- a/block/raw-format.c
98
+++ b/block/raw-format.c
99
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
100
BlockDriver bdrv_raw = {
101
.format_name = "raw",
102
.instance_size = sizeof(BDRVRawState),
103
+ .supports_zoned_children = true,
104
.bdrv_probe = &raw_probe,
105
.bdrv_reopen_prepare = &raw_reopen_prepare,
106
.bdrv_reopen_commit = &raw_reopen_commit,
107
--
108
2.40.0
109
110
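The check added to bdrv_add_child() rejects exactly one combination: a host-managed (BLK_Z_HM) child under a parent whose driver does not set supports_zoned_children. Host-aware and non-zoned children are accepted anywhere. The following sketch shows that predicate. The enum and its values are illustrative and do not match QEMU's BlockZoneModel. `child_allowed()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative zoned models; numeric values are not QEMU's. */
enum zoned_model { Z_NONE, Z_HM, Z_HA };

/*
 * Returns true if a child with 'child_model' may be attached to a parent
 * whose driver has 'parent_supports_zoned_children' set, following the
 * condition in bdrv_add_child().
 */
static bool child_allowed(bool parent_supports_zoned_children,
                          enum zoned_model child_model)
{
    /* Host-aware devices accept random writes, so they can be used as
     * regular devices; only host-managed children are restricted. */
    return parent_supports_zoned_children || child_model != Z_HM;
}
```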
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
The new block layer APIs of zoned block devices can be tested by:
4
$ tests/qemu-iotests/check zoned
5
The test runs each zone operation on a newly created null_blk device
6
and checks that the reported zone information matches the expected output.
7
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230427172019.3345-7-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org>.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
20
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
21
2 files changed, 142 insertions(+)
22
create mode 100755 tests/qemu-iotests/tests/zoned
23
create mode 100644 tests/qemu-iotests/tests/zoned.out
24
25
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
26
new file mode 100755
27
index XXXXXXX..XXXXXXX
28
--- /dev/null
29
+++ b/tests/qemu-iotests/tests/zoned
30
@@ -XXX,XX +XXX,XX @@
31
+#!/usr/bin/env bash
32
+#
33
+# Test zone management operations.
34
+#
35
+
36
+seq="$(basename $0)"
37
+echo "QA output created by $seq"
38
+status=1 # failure is the default!
39
+
40
+_cleanup()
41
+{
42
+ _cleanup_test_img
43
+ sudo -n rmmod null_blk
44
+}
45
+trap "_cleanup; exit \$status" 0 1 2 3 15
46
+
47
+# get standard environment, filters and checks
48
+. ../common.rc
49
+. ../common.filter
50
+. ../common.qemu
51
+
52
+# This test only runs on Linux hosts with raw image files.
53
+_supported_fmt raw
54
+_supported_proto file
55
+_supported_os Linux
56
+
57
+sudo -n true || \
58
+ _notrun 'Password-less sudo required'
59
+
60
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
61
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
62
+
63
+echo "Testing a null_blk device:"
64
+echo "case 1: if the operations work"
65
+sudo -n modprobe null_blk nr_devices=1 zoned=1
66
+sudo -n chmod 0666 /dev/nullb0
67
+
68
+echo "(1) report the first zone:"
69
+$QEMU_IO $IMG -c "zrp 0 1"
70
+echo
71
+echo "report the first 10 zones"
72
+$QEMU_IO $IMG -c "zrp 0 10"
73
+echo
74
+echo "report the last zone:"
75
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
76
+echo
77
+echo
78
+echo "(2) opening the first zone"
79
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
80
+echo "report after:"
81
+$QEMU_IO $IMG -c "zrp 0 1"
82
+echo
83
+echo "opening the second zone"
84
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
85
+echo "report after:"
86
+$QEMU_IO $IMG -c "zrp 268435456 1"
87
+echo
88
+echo "opening the last zone"
89
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
90
+echo "report after:"
91
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
92
+echo
93
+echo
94
+echo "(3) closing the first zone"
95
+$QEMU_IO $IMG -c "zc 0 268435456"
96
+echo "report after:"
97
+$QEMU_IO $IMG -c "zrp 0 1"
98
+echo
99
+echo "closing the last zone"
100
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
101
+echo "report after:"
102
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
103
+echo
104
+echo
105
+echo "(4) finishing the second zone"
106
+$QEMU_IO $IMG -c "zf 268435456 268435456"
107
+echo "After finishing a zone:"
108
+$QEMU_IO $IMG -c "zrp 268435456 1"
109
+echo
110
+echo
111
+echo "(5) resetting the second zone"
112
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
113
+echo "After resetting a zone:"
114
+$QEMU_IO $IMG -c "zrp 268435456 1"
115
+
116
+# success, all done
117
+echo "*** done"
118
+rm -f $seq.full
119
+status=0
120
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
121
new file mode 100644
122
index XXXXXXX..XXXXXXX
123
--- /dev/null
124
+++ b/tests/qemu-iotests/tests/zoned.out
125
@@ -XXX,XX +XXX,XX @@
126
+QA output created by zoned
127
+Testing a null_blk device:
128
+case 1: if the operations work
129
+(1) report the first zone:
130
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
131
+
132
+report the first 10 zones
133
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
134
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
135
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
136
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
137
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
138
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
139
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
140
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
141
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
142
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
143
+
144
+report the last zone:
145
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
146
+
147
+
148
+(2) opening the first zone
149
+report after:
150
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
151
+
152
+opening the second zone
153
+report after:
154
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
155
+
156
+opening the last zone
157
+report after:
158
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
159
+
160
+
161
+(3) closing the first zone
162
+report after:
163
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
164
+
165
+closing the last zone
166
+report after:
167
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
168
+
169
+
170
+(4) finishing the second zone
171
+After finishing a zone:
172
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
173
+
174
+
175
+(5) resetting the second zone
176
+After resetting a zone:
177
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
178
+*** done
179
--
180
2.40.0
181
182
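The test passes byte offsets to qemu-io, but zone_report prints sectors through tosector(), which shifts by BDRV_SECTOR_BITS. The expected output shows 0x80000-sector (256 MiB) zones with the last zone at sector 0x1f380000, so the null_blk device has 1000 zones. The following sketch reproduces that arithmetic and the alignment rule from raw_co_zone_mgmt(). It also decodes the zcond column, assuming BlockZoneState mirrors Linux's BLK_ZONE_COND_* values. The expected output is consistent with that assumption: 1 is empty, 3 is explicitly open and 14 is full. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BITS 9 /* same value as BDRV_SECTOR_BITS */

/* Bytes to 512-byte sectors, as tosector() does in qemu-io-cmds.c. */
static int64_t to_sector(int64_t bytes)
{
    return bytes >> SECTOR_BITS;
}

/*
 * Alignment rule from raw_co_zone_mgmt(): offset must be zone-aligned and
 * len a multiple of the zone size, unless the range ends exactly at the
 * end of the device; the range may never extend past the end.
 */
static bool zone_range_ok(int64_t offset, int64_t len, int64_t zone_size,
                          int64_t capacity)
{
    int64_t mask = zone_size - 1;

    /* The mask arithmetic only works for power-of-two zone sizes. */
    assert(zone_size > 0 && (zone_size & mask) == 0);
    if (offset & mask) {
        return false;
    }
    if ((offset + len < capacity && (len & mask)) || offset + len > capacity) {
        return false;
    }
    return true;
}

/* Names for the zcond column, assuming BLK_ZONE_COND_*-style values. */
static const char *zone_cond_name(unsigned int cond)
{
    switch (cond) {
    case 0x0: return "not write pointer";
    case 0x1: return "empty";
    case 0x2: return "implicitly open";
    case 0x3: return "explicitly open";
    case 0x4: return "closed";
    case 0xD: return "read-only";
    case 0xE: return "full";
    case 0xF: return "offline";
    default:  return "unknown";
    }
}
```

The last-zone steps in the test pass this check because offset 0x3e70000000 is the start of zone 999 and the 268435456-byte range ends exactly at the device end.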
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20230427172019.3345-8-faithilikerun@gmail.com
9
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
block/file-posix.c | 3 +++
13
block/trace-events | 2 ++
14
2 files changed, 5 insertions(+)
15
16
diff --git a/block/file-posix.c b/block/file-posix.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/file-posix.c
19
+++ b/block/file-posix.c
20
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
21
},
22
};
23
24
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
25
return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
26
}
27
#endif
28
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
29
},
30
};
31
32
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
33
+ len >> BDRV_SECTOR_BITS);
34
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
35
if (ret != 0) {
36
error_report("ioctl %s failed %d", op_name, ret);
37
diff --git a/block/trace-events b/block/trace-events
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/trace-events
40
+++ b/block/trace-events
41
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
42
file_setup_cdrom(const char *partition) "Using %s as optical disc"
43
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
44
file_flush_fdatasync_failed(int err) "errno %d"
45
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
46
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
47
48
# ssh.c
49
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
50
--
51
2.40.0
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Add documentation about zoned device support in the virtio-blk
4
emulation.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230427172019.3345-9-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
14
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
15
any toctree" error and fix pre-formatted command-line indentation.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
docs/devel/index-api.rst | 1 +
20
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
21
docs/system/qemu-block-drivers.rst.inc | 6 ++++
22
3 files changed, 50 insertions(+)
23
create mode 100644 docs/devel/zoned-storage.rst
24
25
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
26
index XXXXXXX..XXXXXXX 100644
27
--- a/docs/devel/index-api.rst
28
+++ b/docs/devel/index-api.rst
29
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
30
memory
31
modules
32
ui
33
+ zoned-storage
34
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
35
new file mode 100644
36
index XXXXXXX..XXXXXXX
37
--- /dev/null
38
+++ b/docs/devel/zoned-storage.rst
39
@@ -XXX,XX +XXX,XX @@
40
+=============
41
+zoned-storage
42
+=============
43
+
44
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
45
+that are larger than the LBA size. They typically allow only sequential writes,
46
+which can reduce write amplification in SSDs and potentially lead to higher
47
+throughput and increased capacity. More details about ZBDs can be found at:
48
+
49
+https://zonedstorage.io/docs/introduction/zoned-storage
50
+
51
+1. Block layer APIs for zoned storage
52
+-------------------------------------
53
+The QEMU block layer supports three zoned storage models:
54
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
55
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
56
+manage the zones of a device.
57
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
58
+zones, making it backward compatible with regular block devices.
59
+- BLK_Z_NONE: The non-zoned model has no zone support. It includes both
60
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
61
+supported.
62
+
63
+The block device information resides inside BlockDriverState. QEMU uses the
64
+BlockLimits struct (BlockDriverState::bl), which is continuously accessed by the
65
+block layer while processing I/O requests. A BlockBackend has a root pointer to
66
+a BlockDriverState graph (for example, raw format on top of file-posix). The
67
+zoned storage information can be propagated from the leaf BlockDriverState all
68
+the way up to the BlockBackend. If the zoned storage model in file-posix is
69
+set to BLK_Z_HM, then block drivers will declare support for zoned host devices.
70
+
71
+The block layer APIs support commands needed for zoned storage devices,
72
+including report zones, four zone operations, and zone append.
73
+
74
+2. Emulating zoned storage controllers
75
+--------------------------------------
76
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
77
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
78
+APIs for zoned storage emulation or testing.
79
+
80
+For example, to test zone_report on a null_blk device using qemu-io::
81
+
82
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
83
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
84
index XXXXXXX..XXXXXXX 100644
85
--- a/docs/system/qemu-block-drivers.rst.inc
86
+++ b/docs/system/qemu-block-drivers.rst.inc
87
@@ -XXX,XX +XXX,XX @@ Hard disks
88
you may corrupt your host data (use the ``-snapshot`` command
89
line option or modify the device permissions accordingly).
90
91
+Zoned block devices
92
+ Zoned block devices can be passed through to the guest if the emulated storage
93
+ controller supports zoned storage. Use
94
+ ``--blockdev host_device,node-name=drive0,filename=/dev/nullb0,cache.direct=on``
95
+ to pass through ``/dev/nullb0`` as ``drive0``.
96
+
97
Windows
98
^^^^^^^
99
100
--
101
2.40.0
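The documentation distinguishes the zoned models by how writes are constrained. Host-managed zones must be written sequentially, while host-aware and non-zoned devices accept random writes. The following is a simplified sketch of that rule for a zone that has a write pointer. It assumes a host-managed write must start at the zone's write pointer and end within the zone capacity. `struct zone`, `write_ok()` and the enum are hypothetical. Real devices enforce additional rules, such as open-zone limits, that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model names; values are not QEMU's BlockZoneModel values. */
enum zoned_model { MODEL_NONE, MODEL_HM, MODEL_HA };

/* Hypothetical zone state, in bytes. */
struct zone {
    uint64_t start; /* first byte of the zone */
    uint64_t cap;   /* writable capacity of the zone */
    uint64_t wp;    /* current write pointer */
};

/*
 * Returns true if a write of len bytes at offset into zone z is acceptable
 * under the given model (simplified).
 */
static bool write_ok(enum zoned_model model, const struct zone *z,
                     uint64_t offset, uint64_t len)
{
    if (model != MODEL_HM) {
        /* Host-aware and non-zoned devices accept random writes. */
        return true;
    }
    /* Host-managed: start exactly at the wp and stay within capacity. */
    return offset == z->wp && offset + len <= z->start + z->cap;
}
```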
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to
emulate zone append using regular writes. To do this, the file-posix
driver tracks the write pointer (wp) location of every zone of the
device in an array of uint64_t. The most significant bit of each wp
entry indicates whether the zone is conventional.

A zone's wp can change as a result of the following operations:
- zone reset: the wp moves to the start offset of the zone
- zone finish: the wp moves to the end of the zone
- write to a zone: the wp advances to the end of the written data
- zone append: the wp advances to the end of the appended data

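A minimal, self-contained C sketch of the bookkeeping described above, for illustration only; it is not the file-posix code. The zone geometry (`ZONE_SIZE`, `NR_ZONES`) and the helpers `zones_init()`, `zone_reset()`, `zone_finish()` and `zone_advance()` are hypothetical. The one-wp-per-zone array and the use of bit 63 to flag a conventional zone follow the description in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry for this sketch; real values come from sysfs. */
#define ZONE_SIZE (256ULL * 1024 * 1024) /* bytes per zone */
#define NR_ZONES  4
#define CONV_BIT  (1ULL << 63)           /* MSB set: conventional zone */

static uint64_t wp[NR_ZONES];            /* one byte-offset wp per zone */

static bool zone_is_conv(uint64_t w)
{
    return (w & CONV_BIT) != 0;
}

/* Mark every zone empty, then flag one zone as conventional. */
static void zones_init(unsigned conv_zone)
{
    assert(conv_zone < NR_ZONES);
    for (unsigned z = 0; z < NR_ZONES; z++) {
        wp[z] = z * ZONE_SIZE;
    }
    wp[conv_zone] = CONV_BIT;
}

/* zone reset: the wp goes back to the start offset of the zone */
static void zone_reset(unsigned z)
{
    assert(z < NR_ZONES);
    if (!zone_is_conv(wp[z])) {
        wp[z] = z * ZONE_SIZE;
    }
}

/* zone finish: the wp moves to the end of the zone */
static void zone_finish(unsigned z)
{
    assert(z < NR_ZONES);
    if (!zone_is_conv(wp[z])) {
        wp[z] = (z + 1) * ZONE_SIZE;
    }
}

/* write or append: advance the wp only if the data ends past it */
static void zone_advance(uint64_t offset, uint64_t bytes)
{
    uint64_t z = offset / ZONE_SIZE;

    assert(z < NR_ZONES);
    if (!zone_is_conv(wp[z]) && offset + bytes > wp[z]) {
        wp[z] = offset + bytes;
    }
}
```

Writes and emulated appends share `zone_advance()` because both end up as regular writes. The wp only moves forward when the data ends beyond it, which mirrors the `offset + bytes > *wp` check that raw_co_prw() performs in this patch.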
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
Message-id: 20230427172339.3709-2-faithilikerun@gmail.com
18
[Fix errno propagation from handle_aiocb_zone_mgmt()
19
--Stefan]
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
---
22
include/block/block-common.h | 14 +++
23
include/block/block_int-common.h | 5 +
24
block/file-posix.c | 178 ++++++++++++++++++++++++++++++-
25
3 files changed, 193 insertions(+), 4 deletions(-)
26
27
diff --git a/include/block/block-common.h b/include/block/block-common.h
28
index XXXXXXX..XXXXXXX 100644
29
--- a/include/block/block-common.h
30
+++ b/include/block/block-common.h
31
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
32
BlockZoneState state;
33
} BlockZoneDescriptor;
34
35
+/*
+ * Track the write pointers of all zones, in bytes.
+ */
38
+typedef struct BlockZoneWps {
39
+ CoMutex colock;
40
+ uint64_t wp[];
41
+} BlockZoneWps;
42
+
43
typedef struct BlockDriverInfo {
44
/* in bytes, 0 if irrelevant */
45
int cluster_size;
46
@@ -XXX,XX +XXX,XX @@ typedef enum {
47
#define BDRV_SECTOR_BITS 9
48
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
49
50
+/*
+ * Test the most significant bit of wp. If it is zero, the zone is a
+ * sequential-write zone (SWR or SWP); if it is set, the zone is conventional.
+ */
+#define BDRV_ZT_IS_CONV(wp) ((wp) & (1ULL << 63))
55
+
56
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
57
INT_MAX >> BDRV_SECTOR_BITS)
58
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
59
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
60
index XXXXXXX..XXXXXXX 100644
61
--- a/include/block/block_int-common.h
62
+++ b/include/block/block_int-common.h
63
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
64
65
/* maximum number of active zones */
66
int64_t max_active_zones;
67
+
68
+ /* write granularity in bytes (file-posix uses the physical block size) */
+ int64_t write_granularity;
69
} BlockLimits;
70
71
typedef struct BdrvOpBlocker BdrvOpBlocker;
72
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
73
CoMutex bsc_modify_lock;
74
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
75
BdrvBlockStatusCache *block_status_cache;
76
+
77
+ /* write pointer locations of all zones of the zoned device */
78
+ BlockZoneWps *wps;
79
};
80
81
struct BlockBackendRootState {
82
diff --git a/block/file-posix.c b/block/file-posix.c
83
index XXXXXXX..XXXXXXX 100644
84
--- a/block/file-posix.c
85
+++ b/block/file-posix.c
86
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
87
}
88
89
#if defined(CONFIG_BLKZONED)
90
+/*
+ * If the reset_all flag is true, reset the wp of every zone that is not
+ * full, read-only or offline to the start sector of the zone.
+ * Otherwise, use the wp reported by the device.
+ */
95
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
96
+ unsigned int nrz, bool reset_all)
97
+{
98
+ struct blk_zone *blkz;
99
+ size_t rep_size;
100
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
101
+ BlockZoneWps *wps = bs->wps;
102
+ unsigned int j = offset / bs->bl.zone_size;
103
+ unsigned int n = 0, i = 0;
104
+ int ret;
105
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
106
+ g_autofree struct blk_zone_report *rep = NULL;
107
+
108
+ rep = g_malloc(rep_size);
109
+ blkz = (struct blk_zone *)(rep + 1);
110
+ while (n < nrz) {
111
+ memset(rep, 0, rep_size);
112
+ rep->sector = sector;
113
+ rep->nr_zones = nrz - n;
114
+
115
+ do {
116
+ ret = ioctl(fd, BLKREPORTZONE, rep);
117
+ } while (ret != 0 && errno == EINTR);
118
+ if (ret != 0) {
119
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
120
+ fd, offset, errno);
121
+ return -errno;
122
+ }
123
+
124
+ if (!rep->nr_zones) {
125
+ break;
126
+ }
127
+
128
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
129
+ /*
130
+ * The wp tracking cares only about sequential writes required and
131
+ * sequential write preferred zones so that the wp can advance to
132
+ * the right location.
133
+ * Use the most significant bit of the wp location to indicate the
134
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
135
+ */
136
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
137
+ wps->wp[j] |= 1ULL << 63;
138
+ } else {
139
+ switch (blkz[i].cond) {
140
+ case BLK_ZONE_COND_FULL:
141
+ case BLK_ZONE_COND_READONLY:
142
+ /* Zone not writable */
143
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
144
+ break;
145
+ case BLK_ZONE_COND_OFFLINE:
146
+ /* Zone not writable nor readable */
147
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
148
+ break;
149
+ default:
150
+ if (reset_all) {
151
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
152
+ } else {
153
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
154
+ }
155
+ break;
156
+ }
157
+ }
158
+ }
159
+ sector = blkz[i - 1].start + blkz[i - 1].len;
160
+ }
161
+
162
+ return 0;
163
+}
164
+
165
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
166
+ unsigned int nrz)
167
+{
168
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
169
+ error_report("update zone wp failed");
170
+ }
171
+}
172
+
173
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
174
Error **errp)
175
{
176
+ BDRVRawState *s = bs->opaque;
177
BlockZoneModel zoned;
178
int ret;
179
180
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
181
if (ret > 0) {
182
bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
183
}
184
+
185
+ ret = get_sysfs_long_val(st, "physical_block_size");
186
+ if (ret >= 0) {
187
+ bs->bl.write_granularity = ret;
188
+ }
189
+
190
+ /* The refresh_limits() function can be called multiple times. */
191
+ g_free(bs->wps);
192
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
193
+ sizeof(int64_t) * bs->bl.nr_zones);
194
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
195
+ if (ret < 0) {
196
+ error_setg_errno(errp, -ret, "report wps failed");
197
+ g_free(bs->wps);
+ bs->wps = NULL;
198
+ return;
199
+ }
200
+ qemu_co_mutex_init(&bs->wps->colock);
201
}
202
#else /* !defined(CONFIG_BLKZONED) */
203
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
204
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_zone_mgmt(void *opaque)
205
ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
206
} while (ret != 0 && errno == EINTR);
207
208
- return ret;
209
+ return ret < 0 ? -errno : ret;
210
}
211
#endif
212
213
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
214
{
215
BDRVRawState *s = bs->opaque;
216
RawPosixAIOData acb;
217
+ int ret;
218
219
if (fd_open(bs) < 0)
220
return -EIO;
221
+#if defined(CONFIG_BLKZONED)
222
+ if (type & QEMU_AIO_WRITE && bs->wps) {
223
+ qemu_co_mutex_lock(&bs->wps->colock);
224
+ }
225
+#endif
226
227
/*
228
* When using O_DIRECT, the request must be aligned to be able to use
229
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
230
#ifdef CONFIG_LINUX_IO_URING
231
} else if (s->use_linux_io_uring) {
232
assert(qiov->size == bytes);
233
- return luring_co_submit(bs, s->fd, offset, qiov, type);
234
+ ret = luring_co_submit(bs, s->fd, offset, qiov, type);
235
+ goto out;
236
#endif
237
#ifdef CONFIG_LINUX_AIO
238
} else if (s->use_linux_aio) {
239
assert(qiov->size == bytes);
240
- return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
241
+ ret = laio_co_submit(s->fd, offset, qiov, type,
242
+ s->aio_max_batch);
243
+ goto out;
244
#endif
245
}
246
247
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
248
};
249
250
assert(qiov->size == bytes);
251
- return raw_thread_pool_submit(handle_aiocb_rw, &acb);
252
+ ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
253
+ goto out; /* Avoid an unused-label compiler error */
254
+
255
+out:
256
+#if defined(CONFIG_BLKZONED)
257
+{
258
+ BlockZoneWps *wps = bs->wps;
259
+ if (ret == 0) {
260
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
261
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
262
+ if (!BDRV_ZT_IS_CONV(*wp)) {
263
+ /* Advance the wp if needed */
264
+ if (offset + bytes > *wp) {
265
+ *wp = offset + bytes;
266
+ }
267
+ }
268
+ }
269
+ } else {
270
+ if (type & QEMU_AIO_WRITE) {
271
+ update_zones_wp(bs, s->fd, 0, 1);
272
+ }
273
+ }
274
+
275
+ if (type & QEMU_AIO_WRITE && wps) {
276
+ qemu_co_mutex_unlock(&wps->colock);
277
+ }
278
+}
279
+#endif
280
+ return ret;
281
}
282
283
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
284
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
285
BDRVRawState *s = bs->opaque;
286
287
if (s->fd >= 0) {
288
+#if defined(CONFIG_BLKZONED)
289
+ g_free(bs->wps);
290
+#endif
291
qemu_close(s->fd);
292
s->fd = -1;
293
}
294
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
295
const char *op_name;
296
unsigned long zo;
297
int ret;
298
+ BlockZoneWps *wps = bs->wps;
299
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
300
301
zone_size = bs->bl.zone_size;
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
303
return -EINVAL;
304
}
305
306
+ uint32_t i = offset / bs->bl.zone_size;
307
+ uint32_t nrz = len / bs->bl.zone_size;
308
+ uint64_t *wp = &wps->wp[i];
309
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
310
+ error_report("zone mgmt operations are not allowed for conventional zones");
311
+ return -EIO;
312
+ }
313
+
314
switch (op) {
315
case BLK_ZO_OPEN:
316
op_name = "BLKOPENZONE";
317
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
318
len >> BDRV_SECTOR_BITS);
319
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
320
if (ret != 0) {
321
+ update_zones_wp(bs, s->fd, offset, nrz);
322
error_report("ioctl %s failed %d", op_name, ret);
323
+ return ret;
324
+ }
325
+
326
+ if (zo == BLKRESETZONE && len == capacity) {
327
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
328
+ if (ret < 0) {
329
+ error_report("reporting single wp failed");
330
+ return ret;
331
+ }
332
+ } else if (zo == BLKRESETZONE) {
333
+ for (unsigned int j = 0; j < nrz; ++j) {
334
+ wp[j] = offset + j * zone_size;
335
+ }
336
+ } else if (zo == BLKFINISHZONE) {
337
+ for (unsigned int j = 0; j < nrz; ++j) {
338
+ /* The last zone of the device may be smaller than the zone size. */
340
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
341
+ }
342
}
343
344
return ret;
345
--
346
2.40.0
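The per-zone decision made by get_zones_wp() in the patch above can be read as a small pure function. The C sketch below restates it under stated assumptions. The `enum zone_cond` values and `zone_wp_bytes()` are hypothetical stand-ins, not the kernel's BLK_ZONE_COND_* constants. Inputs are in 512-byte sectors, as in a zone report. For a conventional zone, the sketch stores only the flag bit rather than OR-ing it into the existing entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's zone conditions. */
enum zone_cond { ZC_EMPTY, ZC_OPEN, ZC_CLOSED, ZC_FULL, ZC_READONLY, ZC_OFFLINE };

#define SECTOR_BITS 9              /* 512-byte sectors */
#define CONV_BIT    (1ULL << 63)   /* MSB marks a conventional zone */

/*
 * Compute the tracked wp (in bytes) of one zone from a zone report entry.
 * start, len and dev_wp are in sectors.
 */
static uint64_t zone_wp_bytes(bool conventional, enum zone_cond cond,
                              uint64_t start, uint64_t len, uint64_t dev_wp,
                              bool reset_all)
{
    if (conventional) {
        return CONV_BIT;                        /* wp is not tracked */
    }
    switch (cond) {
    case ZC_FULL:
    case ZC_READONLY:
        return (start + len) << SECTOR_BITS;    /* zone not writable */
    case ZC_OFFLINE:
        return start << SECTOR_BITS;            /* neither readable nor writable */
    default:
        /* after a reset of all zones use the zone start, else the device wp */
        return (reset_all ? start : dev_wp) << SECTOR_BITS;
    }
}
```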
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion, the device responds with the position where the data was
written in the zone.

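The following C sketch makes the emulation concrete by building zone append on a regular write at the tracked wp. Everything in it is hypothetical: `toy_zone_append()`, the in-memory `disk[]` array, the 64 KiB zones and the 4 KiB write granularity. Like raw_co_zone_append() in this patch, it rejects two kinds of input using power-of-two masks: an offset that is not aligned to the zone size, and a length that is not a multiple of the write granularity. Unlike the patch, it has no locking (file-posix serializes writes with a CoMutex). It also refuses appends that would run past the end of the zone, which is a choice made only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy device: power-of-two sizes so masks can be used. */
#define ZONE_SIZE  (64 * 1024)   /* bytes per zone */
#define NR_ZONES   2
#define WRITE_GRAN 4096          /* e.g. the physical block size */

static uint8_t disk[NR_ZONES * ZONE_SIZE]; /* stands in for the device */
static int64_t zone_wp[NR_ZONES];          /* byte offset of each wp */

static void toy_init(void)
{
    memset(disk, 0, sizeof(disk));
    for (int z = 0; z < NR_ZONES; z++) {
        zone_wp[z] = (int64_t)z * ZONE_SIZE;   /* every zone starts empty */
    }
}

/*
 * Emulated zone append. On entry *offset must be the start of a zone; on
 * success it is replaced by the byte offset where the data was written.
 */
static int toy_zone_append(int64_t *offset, const void *buf, int64_t len)
{
    assert(offset != NULL && buf != NULL);

    if (*offset < 0 || (*offset & (ZONE_SIZE - 1)) ||
        *offset / ZONE_SIZE >= NR_ZONES) {
        return -EINVAL;                        /* not a zone start */
    }
    if (len <= 0 || (len & (WRITE_GRAN - 1))) {
        return -EINVAL;                        /* partial block */
    }

    int64_t z = *offset / ZONE_SIZE;
    int64_t pos = zone_wp[z];                  /* data goes to the wp */
    if (pos + len > (z + 1) * ZONE_SIZE) {
        return -ENOSPC;                        /* sketch-only check */
    }

    memcpy(&disk[pos], buf, (size_t)len);      /* the "regular write" */
    zone_wp[z] = pos + len;                    /* advance the wp */
    *offset = pos;                             /* report the position */
    return 0;
}
```

The caller always names the zone, never the exact position. Two back-to-back 12 KiB appends to offset 0 therefore report positions 0 and 0x3000.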
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230427172339.3709-3-faithilikerun@gmail.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
16
include/block/block-io.h | 4 ++
17
include/block/block_int-common.h | 3 ++
18
include/block/raw-aio.h | 4 +-
19
include/sysemu/block-backend-io.h | 9 +++++
20
block/block-backend.c | 61 +++++++++++++++++++++++++++++++
21
block/file-posix.c | 58 +++++++++++++++++++++++++----
22
block/io.c | 27 ++++++++++++++
23
block/io_uring.c | 4 ++
24
block/linux-aio.c | 3 ++
25
block/raw-format.c | 8 ++++
26
10 files changed, 173 insertions(+), 8 deletions(-)
27
28
diff --git a/include/block/block-io.h b/include/block/block-io.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block-io.h
31
+++ b/include/block/block-io.h
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
34
BlockZoneOp op,
35
int64_t offset, int64_t len);
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)(uintptr_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ GRAPH_RDLOCK_GUARD();
172
+ if (!blk_is_available(blk)) {
173
+ blk_dec_in_flight(blk);
174
+ return -ENOMEDIUM;
175
+ }
176
+
177
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
178
+ blk_dec_in_flight(blk);
179
+ return ret;
180
+}
181
+
182
void blk_drain(BlockBackend *blk)
183
{
184
BlockDriverState *bs = blk_bs(blk);
185
diff --git a/block/file-posix.c b/block/file-posix.c
186
index XXXXXXX..XXXXXXX 100644
187
--- a/block/file-posix.c
188
+++ b/block/file-posix.c
189
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
190
bool has_write_zeroes:1;
191
bool use_linux_aio:1;
192
bool use_linux_io_uring:1;
193
+ int64_t *offset; /* offset of zone append operation */
194
int page_cache_inconsistent; /* errno from fdatasync failure */
195
bool has_fallocate;
196
bool needs_alignment;
197
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
198
ssize_t len;
199
200
len = RETRY_ON_EINTR(
201
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
202
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
203
qemu_pwritev(aiocb->aio_fildes,
204
aiocb->io.iov,
205
aiocb->io.niov,
206
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
207
ssize_t len;
208
209
while (offset < aiocb->aio_nbytes) {
210
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
211
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
212
len = pwrite(aiocb->aio_fildes,
213
(const char *)buf + offset,
214
aiocb->aio_nbytes - offset,
215
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
216
}
217
218
nbytes = handle_aiocb_rw_linear(aiocb, buf);
219
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
220
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
221
char *p = buf;
222
size_t count = aiocb->aio_nbytes, copy;
223
int i;
224
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
225
if (fd_open(bs) < 0)
226
return -EIO;
227
#if defined(CONFIG_BLKZONED)
228
- if (type & QEMU_AIO_WRITE && bs->wps) {
229
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
230
qemu_co_mutex_lock(&bs->wps->colock);
231
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
232
+ int index = offset / bs->bl.zone_size;
233
+ offset = bs->wps->wp[index];
234
+ }
235
}
236
#endif
237
238
@@ -XXX,XX +XXX,XX @@ out:
239
{
240
BlockZoneWps *wps = bs->wps;
241
if (ret == 0) {
242
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
243
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
244
+ && wps && bs->bl.zone_size) {
245
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
246
if (!BDRV_ZT_IS_CONV(*wp)) {
247
+ if (type & QEMU_AIO_ZONE_APPEND) {
248
+ *s->offset = *wp;
249
+ }
250
/* Advance the wp if needed */
251
if (offset + bytes > *wp) {
252
*wp = offset + bytes;
253
@@ -XXX,XX +XXX,XX @@ out:
254
}
255
}
256
} else {
257
- if (type & QEMU_AIO_WRITE) {
258
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
259
update_zones_wp(bs, s->fd, 0, 1);
260
}
261
}
262
263
- if (type & QEMU_AIO_WRITE && wps) {
264
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
265
qemu_co_mutex_unlock(&wps->colock);
266
}
267
}
268
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
269
}
270
#endif
271
272
+#if defined(CONFIG_BLKZONED)
273
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
274
+ int64_t *offset,
275
+ QEMUIOVector *qiov,
276
+ BdrvRequestFlags flags) {
277
+ assert(flags == 0);
278
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
279
+ int64_t iov_len = 0;
280
+ int64_t len = 0;
281
+ BDRVRawState *s = bs->opaque;
282
+ s->offset = offset;
283
+
284
+ if (*offset & zone_size_mask) {
285
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
286
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
287
+ return -EINVAL;
288
+ }
289
+
290
+ int64_t wg = bs->bl.write_granularity;
291
+ int64_t wg_mask = wg - 1;
292
+ for (int i = 0; i < qiov->niov; i++) {
293
+ iov_len = qiov->iov[i].iov_len;
294
+ if (iov_len & wg_mask) {
295
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
296
+ "block size %" PRId64 "", i, iov_len, wg);
297
+ return -EINVAL;
298
+ }
299
+ len += iov_len;
300
+ }
301
+
302
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
303
+}
304
+#endif
305
+
306
static coroutine_fn int
307
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
308
bool blkdev)
309
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
310
/* zone management operations */
311
.bdrv_co_zone_report = raw_co_zone_report,
312
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
313
+ .bdrv_co_zone_append = raw_co_zone_append,
314
#endif
315
};
316
317
diff --git a/block/io.c b/block/io.c
318
index XXXXXXX..XXXXXXX 100644
319
--- a/block/io.c
320
+++ b/block/io.c
321
@@ -XXX,XX +XXX,XX @@ out:
322
return co.ret;
323
}
324
325
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
326
+ QEMUIOVector *qiov,
327
+ BdrvRequestFlags flags)
328
+{
329
+ int ret;
330
+ BlockDriver *drv = bs->drv;
331
+ CoroutineIOCompletion co = {
332
+ .coroutine = qemu_coroutine_self(),
333
+ };
334
+ IO_CODE();
335
+
336
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
337
+ if (ret < 0) {
338
+ return ret;
339
+ }
340
+
341
+ bdrv_inc_in_flight(bs);
342
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
343
+ co.ret = -ENOTSUP;
344
+ goto out;
345
+ }
346
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
347
+out:
348
+ bdrv_dec_in_flight(bs);
349
+ return co.ret;
350
+}
351
+
352
void *qemu_blockalign(BlockDriverState *bs, size_t size)
353
{
354
IO_CODE();
355
diff --git a/block/io_uring.c b/block/io_uring.c
356
index XXXXXXX..XXXXXXX 100644
357
--- a/block/io_uring.c
358
+++ b/block/io_uring.c
359
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
360
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
361
luringcb->qiov->niov, offset);
362
break;
363
+ case QEMU_AIO_ZONE_APPEND:
364
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
365
+ luringcb->qiov->niov, offset);
366
+ break;
367
case QEMU_AIO_READ:
368
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
369
luringcb->qiov->niov, offset);
370
diff --git a/block/linux-aio.c b/block/linux-aio.c
371
index XXXXXXX..XXXXXXX 100644
372
--- a/block/linux-aio.c
373
+++ b/block/linux-aio.c
374
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
375
case QEMU_AIO_WRITE:
376
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
377
break;
378
+ case QEMU_AIO_ZONE_APPEND:
379
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
380
+ break;
381
case QEMU_AIO_READ:
382
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
383
break;
384
diff --git a/block/raw-format.c b/block/raw-format.c
385
index XXXXXXX..XXXXXXX 100644
386
--- a/block/raw-format.c
387
+++ b/block/raw-format.c
388
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
389
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
390
}
391
392
+static int coroutine_fn GRAPH_RDLOCK
393
+raw_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
394
+ BdrvRequestFlags flags)
395
+{
396
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
397
+}
398
+
399
static int64_t coroutine_fn GRAPH_RDLOCK
400
raw_co_getlength(BlockDriverState *bs)
401
{
402
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
403
.bdrv_co_pdiscard = &raw_co_pdiscard,
404
.bdrv_co_zone_report = &raw_co_zone_report,
405
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
406
+ .bdrv_co_zone_append = &raw_co_zone_append,
407
.bdrv_co_block_status = &raw_co_block_status,
408
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
409
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
410
--
411
2.40.0
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
This patch tests zone append writes by reporting the zone wp after each
call completes. The "zap -p" option prints the sector offset returned on
completion, which should be the start sector where the append write
began.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20230427172339.3709-4-faithilikerun@gmail.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
14
tests/qemu-iotests/tests/zoned | 16 +++++++
15
tests/qemu-iotests/tests/zoned.out | 16 +++++++
16
3 files changed, 107 insertions(+)
17
18
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
19
index XXXXXXX..XXXXXXX 100644
20
--- a/qemu-io-cmds.c
21
+++ b/qemu-io-cmds.c
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
23
.oneline = "reset a zone write pointer in zone block device",
24
};
25
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
27
+ int64_t *offset, int flags, int *total)
28
+{
29
+ int async_ret = NOT_DONE;
30
+
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
32
+ while (async_ret == NOT_DONE) {
33
+ main_loop_wait(false);
34
+ }
35
+
36
+ *total = qiov->size;
37
+ return async_ret < 0 ? async_ret : 1;
38
+}
39
+
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
41
+{
42
+ int ret;
43
+ bool pflag = false;
44
+ int flags = 0;
45
+ int total = 0;
46
+ int64_t offset;
47
+ char *buf;
48
+ int c, nr_iov;
49
+ int pattern = 0xcd;
50
+ QEMUIOVector qiov;
51
+
52
+ if (optind > argc - 3) {
53
+ return -EINVAL;
54
+ }
55
+
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
57
+ pflag = true;
58
+ }
59
+
60
+ offset = cvtnum(argv[optind]);
61
+ if (offset < 0) {
62
+ print_cvtnum_err(offset, argv[optind]);
63
+ return offset;
64
+ }
65
+ optind++;
66
+ nr_iov = argc - optind;
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
68
+ flags & BDRV_REQ_REGISTERED_BUF);
69
+ if (buf == NULL) {
70
+ return -EINVAL;
71
+ }
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
73
+ if (ret < 0) {
74
+ printf("zone append failed: %s\n", strerror(-ret));
75
+ goto out;
76
+ }
77
+
78
+ if (pflag) {
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
80
+ tosector(offset));
81
+ }
82
+
83
+out:
84
+ qemu_io_free(blk, buf, qiov.size,
85
+ flags & BDRV_REQ_REGISTERED_BUF);
86
+ qemu_iovec_destroy(&qiov);
87
+ return ret;
88
+}
89
+
90
+static const cmdinfo_t zone_append_cmd = {
91
+ .name = "zone_append",
92
+ .altname = "zap",
93
+ .cfunc = zone_append_f,
94
+ .argmin = 3,
95
+ .argmax = 4,
96
+ .args = "offset len [len..]",
97
+ .oneline = "append write a number of bytes at a specified offset",
98
+};
99
+
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
101
static const cmdinfo_t truncate_cmd = {
102
.name = "truncate",
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
104
qemuio_add_command(&zone_close_cmd);
105
qemuio_add_command(&zone_finish_cmd);
106
qemuio_add_command(&zone_reset_cmd);
107
+ qemuio_add_command(&zone_append_cmd);
108
qemuio_add_command(&truncate_cmd);
109
qemuio_add_command(&length_cmd);
110
qemuio_add_command(&info_cmd);
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
112
index XXXXXXX..XXXXXXX 100755
113
--- a/tests/qemu-iotests/tests/zoned
114
+++ b/tests/qemu-iotests/tests/zoned
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
146
+
147
+
148
+(6) append write
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
150
+After zap done, the append sector is 0x0
151
+After appending the first zone firstly:
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
153
+After zap done, the append sector is 0x18
154
+After appending the first zone secondly:
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
156
+After zap done, the append sector is 0x80000
157
+After appending the second zone firstly:
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
159
+After zap done, the append sector is 0x80018
160
+After appending the second zone secondly:
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
162
*** done
163
--
164
2.40.0
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20230427172339.3709-5-faithilikerun@gmail.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
block/file-posix.c | 3 +++
10
block/trace-events | 2 ++
11
2 files changed, 5 insertions(+)
12
13
diff --git a/block/file-posix.c b/block/file-posix.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
16
+++ b/block/file-posix.c
17
@@ -XXX,XX +XXX,XX @@ out:
18
if (!BDRV_ZT_IS_CONV(*wp)) {
19
if (type & QEMU_AIO_ZONE_APPEND) {
20
*s->offset = *wp;
21
+ trace_zbd_zone_append_complete(bs, *s->offset
22
+ >> BDRV_SECTOR_BITS);
23
}
24
/* Advance the wp if needed */
25
if (offset + bytes > *wp) {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
27
len += iov_len;
28
}
29
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
32
}
33
#endif
34
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
37
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
39
file_flush_fdatasync_failed(int err) "errno %d"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
44
45
# ssh.c
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
47
--
48
2.40.0
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Use scripts/update-linux-headers.sh to update headers to 6.3-rc1.
4
5
Signed-off-by: Sam Li <faithilikerun@gmail.com>
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
[Reran scripts/update-linux-headers.sh on Linux v6.3. The only change
10
was the use of __virtioXX types instead of uintXX_t.
11
--Stefan]
12
Message-Id: <20230407082528.18841-2-faithilikerun@gmail.com>
13
---
14
include/standard-headers/drm/drm_fourcc.h | 12 +++
15
include/standard-headers/linux/ethtool.h | 48 ++++++++-
16
include/standard-headers/linux/fuse.h | 45 +++++++-
17
include/standard-headers/linux/pci_regs.h | 1 +
18
include/standard-headers/linux/vhost_types.h | 2 +
19
include/standard-headers/linux/virtio_blk.h | 105 +++++++++++++++++++
20
linux-headers/asm-arm64/kvm.h | 1 +
21
linux-headers/asm-x86/kvm.h | 34 +++++-
22
linux-headers/linux/kvm.h | 9 ++
23
linux-headers/linux/vfio.h | 15 +--
24
linux-headers/linux/vhost.h | 8 ++
25
11 files changed, 270 insertions(+), 10 deletions(-)
26
27
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
28
index XXXXXXX..XXXXXXX 100644
29
--- a/include/standard-headers/drm/drm_fourcc.h
30
+++ b/include/standard-headers/drm/drm_fourcc.h
31
@@ -XXX,XX +XXX,XX @@ extern "C" {
32
*
33
* The authoritative list of format modifier codes is found in
34
* `include/uapi/drm/drm_fourcc.h`
35
+ *
36
+ * Open Source User Waiver
37
+ * -----------------------
38
+ *
39
+ * Because this is the authoritative source for pixel formats and modifiers
40
+ * referenced by GL, Vulkan extensions and other standards and hence used both
41
+ * by open source and closed source driver stacks, the usual requirement for an
42
+ * upstream in-kernel or open source userspace user does not apply.
43
+ *
44
+ * To ensure, as much as feasible, compatibility across stacks and avoid
45
+ * confusion with incompatible enumerations stakeholders for all relevant driver
46
+ * stacks should approve additions.
47
*/
48
49
#define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
50
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
51
index XXXXXXX..XXXXXXX 100644
52
--- a/include/standard-headers/linux/ethtool.h
53
+++ b/include/standard-headers/linux/ethtool.h
54
@@ -XXX,XX +XXX,XX @@ enum ethtool_stringset {
55
    ETH_SS_COUNT
56
};
57
58
+/**
59
+ * enum ethtool_mac_stats_src - source of ethtool MAC statistics
60
+ * @ETHTOOL_MAC_STATS_SRC_AGGREGATE:
61
+ *    if device supports a MAC merge layer, this retrieves the aggregate
62
+ *    statistics of the eMAC and pMAC. Otherwise, it retrieves just the
63
+ *    statistics of the single (express) MAC.
64
+ * @ETHTOOL_MAC_STATS_SRC_EMAC:
65
+ *    if device supports a MM layer, this retrieves the eMAC statistics.
66
+ *    Otherwise, it retrieves the statistics of the single (express) MAC.
67
+ * @ETHTOOL_MAC_STATS_SRC_PMAC:
68
+ *    if device supports a MM layer, this retrieves the pMAC statistics.
69
+ */
70
+enum ethtool_mac_stats_src {
71
+    ETHTOOL_MAC_STATS_SRC_AGGREGATE,
72
+    ETHTOOL_MAC_STATS_SRC_EMAC,
73
+    ETHTOOL_MAC_STATS_SRC_PMAC,
74
+};
75
+
76
/**
77
* enum ethtool_module_power_mode_policy - plug-in module power mode policy
78
* @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
79
@@ -XXX,XX +XXX,XX @@ enum ethtool_podl_pse_pw_d_status {
80
    ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR,
81
};
82
83
+/**
84
+ * enum ethtool_mm_verify_status - status of MAC Merge Verify function
85
+ * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN:
86
+ *    verification status is unknown
87
+ * @ETHTOOL_MM_VERIFY_STATUS_INITIAL:
88
+ *    the 802.3 Verify State diagram is in the state INIT_VERIFICATION
89
+ * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING:
90
+ *    the Verify State diagram is in the state VERIFICATION_IDLE,
91
+ *    SEND_VERIFY or WAIT_FOR_RESPONSE
92
+ * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED:
93
+ *    indicates that the Verify State diagram is in the state VERIFIED
94
+ * @ETHTOOL_MM_VERIFY_STATUS_FAILED:
95
+ *    the Verify State diagram is in the state VERIFY_FAIL
96
+ * @ETHTOOL_MM_VERIFY_STATUS_DISABLED:
97
+ *    verification of preemption operation is disabled
98
+ */
99
+enum ethtool_mm_verify_status {
100
+    ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
101
+    ETHTOOL_MM_VERIFY_STATUS_INITIAL,
102
+    ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
103
+    ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
104
+    ETHTOOL_MM_VERIFY_STATUS_FAILED,
105
+    ETHTOOL_MM_VERIFY_STATUS_DISABLED,
106
+};
107
+
108
/**
109
* struct ethtool_gstrings - string set for data tagging
110
* @cmd: Command number = %ETHTOOL_GSTRINGS
111
@@ -XXX,XX +XXX,XX @@ struct ethtool_rxnfc {
112
        uint32_t            rule_cnt;
113
        uint32_t            rss_context;
114
    };
115
-    uint32_t                rule_locs[0];
116
+    uint32_t                rule_locs[];
117
};
118
119
120
@@ -XXX,XX +XXX,XX @@ enum ethtool_link_mode_bit_indices {
121
    ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT     = 96,
122
    ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT     = 97,
123
    ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT     = 98,
124
+    ETHTOOL_LINK_MODE_10baseT1S_Full_BIT         = 99,
125
+    ETHTOOL_LINK_MODE_10baseT1S_Half_BIT         = 100,
126
+    ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT     = 101,
127
128
    /* must be last entry */
129
    __ETHTOOL_LINK_MODE_MASK_NBITS
130
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
131
index XXXXXXX..XXXXXXX 100644
132
--- a/include/standard-headers/linux/fuse.h
133
+++ b/include/standard-headers/linux/fuse.h
134
@@ -XXX,XX +XXX,XX @@
135
* 7.38
136
* - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry
137
* - add FOPEN_PARALLEL_DIRECT_WRITES
138
+ * - add total_extlen to fuse_in_header
139
+ * - add FUSE_MAX_NR_SECCTX
140
+ * - add extension header
141
+ * - add FUSE_EXT_GROUPS
142
+ * - add FUSE_CREATE_SUPP_GROUP
143
*/
144
145
#ifndef _LINUX_FUSE_H
146
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
147
* FUSE_SECURITY_CTX:    add security context to create, mkdir, symlink, and
148
*            mknod
149
* FUSE_HAS_INODE_DAX: use per inode DAX
150
+ * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
151
+ *            symlink and mknod (single group that matches parent)
152
*/
153
#define FUSE_ASYNC_READ        (1 << 0)
154
#define FUSE_POSIX_LOCKS    (1 << 1)
155
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
156
/* bits 32..63 get shifted down 32 bits into the flags2 field */
157
#define FUSE_SECURITY_CTX    (1ULL << 32)
158
#define FUSE_HAS_INODE_DAX    (1ULL << 33)
159
+#define FUSE_CREATE_SUPP_GROUP    (1ULL << 34)
160
161
/**
162
* CUSE INIT request/reply flags
163
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
164
*/
165
#define FUSE_EXPIRE_ONLY        (1 << 0)
166
167
+/**
168
+ * extension type
169
+ * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx
170
+ * FUSE_EXT_GROUPS: &fuse_supp_groups extension
171
+ */
172
+enum fuse_ext_type {
173
+    /* Types 0..31 are reserved for fuse_secctx_header */
174
+    FUSE_MAX_NR_SECCTX    = 31,
175
+    FUSE_EXT_GROUPS        = 32,
176
+};
177
+
178
enum fuse_opcode {
179
    FUSE_LOOKUP        = 1,
180
    FUSE_FORGET        = 2, /* no reply */
181
@@ -XXX,XX +XXX,XX @@ struct fuse_in_header {
182
    uint32_t    uid;
183
    uint32_t    gid;
184
    uint32_t    pid;
185
-    uint32_t    padding;
186
+    uint16_t    total_extlen; /* length of extensions in 8byte units */
187
+    uint16_t    padding;
188
};
189
190
struct fuse_out_header {
191
@@ -XXX,XX +XXX,XX @@ struct fuse_secctx_header {
192
    uint32_t    nr_secctx;
193
};
194
195
+/**
196
+ * struct fuse_ext_header - extension header
197
+ * @size: total size of this extension including this header
198
+ * @type: type of extension
199
+ *
200
+ * This is made compatible with fuse_secctx_header by using type values >
201
+ * FUSE_MAX_NR_SECCTX
202
+ */
203
+struct fuse_ext_header {
204
+    uint32_t    size;
205
+    uint32_t    type;
206
+};
207
+
208
+/**
209
+ * struct fuse_supp_groups - Supplementary group extension
210
+ * @nr_groups: number of supplementary groups
211
+ * @groups: flexible array of group IDs
212
+ */
213
+struct fuse_supp_groups {
214
+    uint32_t    nr_groups;
215
+    uint32_t    groups[];
216
+};
217
+
218
#endif /* _LINUX_FUSE_H */
219
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
220
index XXXXXXX..XXXXXXX 100644
221
--- a/include/standard-headers/linux/pci_regs.h
222
+++ b/include/standard-headers/linux/pci_regs.h
223
@@ -XXX,XX +XXX,XX @@
224
#define PCI_EXP_LNKCTL2_TX_MARGIN    0x0380 /* Transmit Margin */
225
#define PCI_EXP_LNKCTL2_HASD        0x0020 /* HW Autonomous Speed Disable */
226
#define PCI_EXP_LNKSTA2        0x32    /* Link Status 2 */
227
+#define PCI_EXP_LNKSTA2_FLIT        0x0400 /* Flit Mode Status */
228
#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2    0x32    /* end of v2 EPs w/ link */
229
#define PCI_EXP_SLTCAP2        0x34    /* Slot Capabilities 2 */
230
#define PCI_EXP_SLTCAP2_IBPD    0x00000001 /* In-band PD Disable Supported */
231
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
232
index XXXXXXX..XXXXXXX 100644
233
--- a/include/standard-headers/linux/vhost_types.h
234
+++ b/include/standard-headers/linux/vhost_types.h
235
@@ -XXX,XX +XXX,XX @@ struct vhost_vdpa_iova_range {
236
#define VHOST_BACKEND_F_IOTLB_ASID 0x3
237
/* Device can be suspended */
238
#define VHOST_BACKEND_F_SUSPEND 0x4
239
+/* Device can be resumed */
240
+#define VHOST_BACKEND_F_RESUME 0x5
241
242
#endif
243
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
244
index XXXXXXX..XXXXXXX 100644
245
--- a/include/standard-headers/linux/virtio_blk.h
246
+++ b/include/standard-headers/linux/virtio_blk.h
247
@@ -XXX,XX +XXX,XX @@
248
#define VIRTIO_BLK_F_DISCARD    13    /* DISCARD is supported */
249
#define VIRTIO_BLK_F_WRITE_ZEROES    14    /* WRITE ZEROES is supported */
250
#define VIRTIO_BLK_F_SECURE_ERASE    16 /* Secure Erase is supported */
251
+#define VIRTIO_BLK_F_ZONED        17    /* Zoned block device */
252
253
/* Legacy feature bits */
254
#ifndef VIRTIO_BLK_NO_LEGACY
255
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
256
    /* Secure erase commands must be aligned to this number of sectors. */
257
    __virtio32 secure_erase_sector_alignment;
258
259
+    /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
260
+    struct virtio_blk_zoned_characteristics {
261
+        __virtio32 zone_sectors;
262
+        __virtio32 max_open_zones;
263
+        __virtio32 max_active_zones;
264
+        __virtio32 max_append_sectors;
265
+        __virtio32 write_granularity;
266
+        uint8_t model;
267
+        uint8_t unused2[3];
268
+    } zoned;
269
} QEMU_PACKED;
270
271
/*
272
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
273
/* Secure erase command */
274
#define VIRTIO_BLK_T_SECURE_ERASE    14
275
276
+/* Zone append command */
277
+#define VIRTIO_BLK_T_ZONE_APPEND 15
278
+
279
+/* Report zones command */
280
+#define VIRTIO_BLK_T_ZONE_REPORT 16
281
+
282
+/* Open zone command */
283
+#define VIRTIO_BLK_T_ZONE_OPEN 18
284
+
285
+/* Close zone command */
286
+#define VIRTIO_BLK_T_ZONE_CLOSE 20
287
+
288
+/* Finish zone command */
289
+#define VIRTIO_BLK_T_ZONE_FINISH 22
290
+
291
+/* Reset zone command */
292
+#define VIRTIO_BLK_T_ZONE_RESET 24
293
+
294
+/* Reset All zones command */
295
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
296
+
297
#ifndef VIRTIO_BLK_NO_LEGACY
298
/* Barrier before this op. */
299
#define VIRTIO_BLK_T_BARRIER    0x80000000
300
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_outhdr {
301
    __virtio64 sector;
302
};
303
304
+/*
305
+ * Supported zoned device models.
306
+ */
307
+
308
+/* Regular block device */
309
+#define VIRTIO_BLK_Z_NONE 0
310
+/* Host-managed zoned device */
311
+#define VIRTIO_BLK_Z_HM 1
312
+/* Host-aware zoned device */
313
+#define VIRTIO_BLK_Z_HA 2
314
+
315
+/*
316
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
317
+ */
318
+struct virtio_blk_zone_descriptor {
319
+    /* Zone capacity */
320
+    __virtio64 z_cap;
321
+    /* The starting sector of the zone */
322
+    __virtio64 z_start;
323
+    /* Zone write pointer position in sectors */
324
+    __virtio64 z_wp;
325
+    /* Zone type */
326
+    uint8_t z_type;
327
+    /* Zone state */
328
+    uint8_t z_state;
329
+    uint8_t reserved[38];
330
+};
331
+
332
+struct virtio_blk_zone_report {
333
+    __virtio64 nr_zones;
334
+    uint8_t reserved[56];
335
+    struct virtio_blk_zone_descriptor zones[];
336
+};
337
+
338
+/*
339
+ * Supported zone types.
340
+ */
341
+
342
+/* Conventional zone */
343
+#define VIRTIO_BLK_ZT_CONV 1
344
+/* Sequential Write Required zone */
345
+#define VIRTIO_BLK_ZT_SWR 2
346
+/* Sequential Write Preferred zone */
347
+#define VIRTIO_BLK_ZT_SWP 3
348
+
349
+/*
350
+ * Zone states that are available for zones of all types.
351
+ */
352
+
353
+/* Not a write pointer (conventional zones only) */
354
+#define VIRTIO_BLK_ZS_NOT_WP 0
355
+/* Empty */
356
+#define VIRTIO_BLK_ZS_EMPTY 1
357
+/* Implicitly Open */
358
+#define VIRTIO_BLK_ZS_IOPEN 2
359
+/* Explicitly Open */
360
+#define VIRTIO_BLK_ZS_EOPEN 3
361
+/* Closed */
362
+#define VIRTIO_BLK_ZS_CLOSED 4
363
+/* Read-Only */
364
+#define VIRTIO_BLK_ZS_RDONLY 13
365
+/* Full */
366
+#define VIRTIO_BLK_ZS_FULL 14
367
+/* Offline */
368
+#define VIRTIO_BLK_ZS_OFFLINE 15
369
+
370
/* Unmap this range (only valid for write zeroes command) */
371
#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP    0x00000001
372
373
@@ -XXX,XX +XXX,XX @@ struct virtio_scsi_inhdr {
374
#define VIRTIO_BLK_S_OK        0
375
#define VIRTIO_BLK_S_IOERR    1
376
#define VIRTIO_BLK_S_UNSUPP    2
377
+
378
+/* Error codes that are specific to zoned block devices */
379
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
380
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4
381
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5
382
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
383
+
384
#endif /* _LINUX_VIRTIO_BLK_H */
385
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
386
index XXXXXXX..XXXXXXX 100644
387
--- a/linux-headers/asm-arm64/kvm.h
388
+++ b/linux-headers/asm-arm64/kvm.h
389
@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
390
#define KVM_ARM_VCPU_SVE        4 /* enable SVE for this CPU */
391
#define KVM_ARM_VCPU_PTRAUTH_ADDRESS    5 /* VCPU uses address authentication */
392
#define KVM_ARM_VCPU_PTRAUTH_GENERIC    6 /* VCPU uses generic authentication */
393
+#define KVM_ARM_VCPU_HAS_EL2        7 /* Support nested virtualization */
394
395
struct kvm_vcpu_init {
396
    __u32 target;
397
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
398
index XXXXXXX..XXXXXXX 100644
399
--- a/linux-headers/asm-x86/kvm.h
400
+++ b/linux-headers/asm-x86/kvm.h
401
@@ -XXX,XX +XXX,XX @@
402
403
#include <linux/types.h>
404
#include <linux/ioctl.h>
405
+#include <linux/stddef.h>
406
407
#define KVM_PIO_PAGE_OFFSET 1
408
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2
409
@@ -XXX,XX +XXX,XX @@ struct kvm_nested_state {
410
     * KVM_{GET,PUT}_NESTED_STATE ioctl values.
411
     */
412
    union {
413
-        struct kvm_vmx_nested_state_data vmx[0];
414
-        struct kvm_svm_nested_state_data svm[0];
415
+        __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx);
416
+        __DECLARE_FLEX_ARRAY(struct kvm_svm_nested_state_data, svm);
417
    } data;
418
};
419
420
@@ -XXX,XX +XXX,XX @@ struct kvm_pmu_event_filter {
421
#define KVM_PMU_EVENT_ALLOW 0
422
#define KVM_PMU_EVENT_DENY 1
423
424
+#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
425
+#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
426
+
427
+/*
428
+ * Masked event layout.
429
+ * Bits Description
430
+ * ---- -----------
431
+ * 7:0 event select (low bits)
432
+ * 15:8 umask match
433
+ * 31:16 unused
434
+ * 35:32 event select (high bits)
435
+ * 36:54 unused
436
+ * 55 exclude bit
437
+ * 63:56 umask mask
438
+ */
439
+
440
+#define KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
441
+    (((event_select) & 0xFFULL) | (((event_select) & 0XF00ULL) << 24) | \
442
+    (((mask) & 0xFFULL) << 56) | \
443
+    (((match) & 0xFFULL) << 8) | \
444
+    ((__u64)(!!(exclude)) << 55))
445
+
446
+#define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \
447
+    (GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32))
448
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK        (GENMASK_ULL(63, 56))
449
+#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH    (GENMASK_ULL(15, 8))
450
+#define KVM_PMU_MASKED_ENTRY_EXCLUDE        (BIT_ULL(55))
451
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT    (56)
452
+
453
/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
454
#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
455
#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
456
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
457
index XXXXXXX..XXXXXXX 100644
458
--- a/linux-headers/linux/kvm.h
459
+++ b/linux-headers/linux/kvm.h
460
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
461
        struct {
462
            __u8 ar;    /* the access register number */
463
            __u8 key;    /* access key, ignored if flag unset */
464
+            __u8 pad1[6];    /* ignored */
465
+            __u64 old_addr;    /* ignored if cmpxchg flag unset */
466
        };
467
        __u32 sida_offset; /* offset into the sida */
468
        __u8 reserved[32]; /* ignored */
469
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
470
#define KVM_S390_MEMOP_SIDA_WRITE    3
471
#define KVM_S390_MEMOP_ABSOLUTE_READ    4
472
#define KVM_S390_MEMOP_ABSOLUTE_WRITE    5
473
+#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG    6
474
+
475
/* flags for kvm_s390_mem_op->flags */
476
#define KVM_S390_MEMOP_F_CHECK_ONLY        (1ULL << 0)
477
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION    (1ULL << 1)
478
#define KVM_S390_MEMOP_F_SKEY_PROTECTION    (1ULL << 2)
479
480
+/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
481
+#define KVM_S390_MEMOP_EXTENSION_CAP_BASE    (1 << 0)
482
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG    (1 << 1)
483
+
484
/* for KVM_INTERRUPT */
485
struct kvm_interrupt {
486
    /* in */
487
@@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt {
488
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
489
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
490
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
491
+#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
492
493
#ifdef KVM_CAP_IRQ_ROUTING
494
495
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
496
index XXXXXXX..XXXXXXX 100644
497
--- a/linux-headers/linux/vfio.h
498
+++ b/linux-headers/linux/vfio.h
499
@@ -XXX,XX +XXX,XX @@
500
/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
501
#define VFIO_UNMAP_ALL            9
502
503
-/* Supports the vaddr flag for DMA map and unmap */
504
+/*
505
+ * Supports the vaddr flag for DMA map and unmap. Not supported for mediated
506
+ * devices, so this capability is subject to change as groups are added or
507
+ * removed.
508
+ */
509
#define VFIO_UPDATE_VADDR        10
510
511
/*
512
@@ -XXX,XX +XXX,XX @@ struct vfio_iommu_type1_info_dma_avail {
513
* Map process virtual addresses to IO virtual addresses using the
514
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
515
*
516
- * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and
517
- * unblock translation of host virtual addresses in the iova range. The vaddr
518
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr
519
* must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR. To
520
* maintain memory consistency within the user application, the updated vaddr
521
* must address the same memory object as originally mapped. Failure to do so
522
@@ -XXX,XX +XXX,XX @@ struct vfio_bitmap {
523
* must be 0. This cannot be combined with the get-dirty-bitmap flag.
524
*
525
* If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
526
- * virtual addresses in the iova range. Tasks that attempt to translate an
527
- * iova's vaddr will block. DMA to already-mapped pages continues. This
528
- * cannot be combined with the get-dirty-bitmap flag.
529
+ * virtual addresses in the iova range. DMA to already-mapped pages continues.
530
+ * Groups may not be added to the container while any addresses are invalid.
531
+ * This cannot be combined with the get-dirty-bitmap flag.
532
*/
533
struct vfio_iommu_type1_dma_unmap {
534
    __u32    argsz;
535
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
536
index XXXXXXX..XXXXXXX 100644
537
--- a/linux-headers/linux/vhost.h
538
+++ b/linux-headers/linux/vhost.h
539
@@ -XXX,XX +XXX,XX @@
540
*/
541
#define VHOST_VDPA_SUSPEND        _IO(VHOST_VIRTIO, 0x7D)
542
543
+/* Resume a device so it can resume processing virtqueue requests
544
+ *
545
+ * After the return of this ioctl the device will have restored all the
546
+ * necessary states and it is fully operational to continue processing the
547
+ * virtqueue descriptors.
548
+ */
549
+#define VHOST_VDPA_RESUME        _IO(VHOST_VIRTIO, 0x7E)
550
+
551
#endif
552
--
553
2.40.0
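A quick way to sanity-check the new virtio_blk.h zone report layouts is to mirror them and let the compiler verify the sizes. The sketch below assumes __virtio64 is a plain 64-bit integer (uint64_t here) and uses hypothetical struct names so it builds without the QEMU tree. It shows that a zone descriptor and the report header each occupy 64 bytes, so a report carrying n descriptors needs 64 + 64 * n bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of virtio_blk_zone_descriptor / virtio_blk_zone_report (hypothetical names). */
struct zone_descriptor {
    uint64_t z_cap;      /* zone capacity */
    uint64_t z_start;    /* first sector of the zone */
    uint64_t z_wp;       /* write pointer in sectors */
    uint8_t z_type;      /* VIRTIO_BLK_ZT_* value */
    uint8_t z_state;     /* VIRTIO_BLK_ZS_* value */
    uint8_t reserved[38];
};

struct zone_report {
    uint64_t nr_zones;
    uint8_t reserved[56];
    struct zone_descriptor zones[]; /* flexible array of descriptors */
};

_Static_assert(sizeof(struct zone_descriptor) == 64, "descriptor must be 64 bytes");
_Static_assert(sizeof(struct zone_report) == 64, "report header must be 64 bytes");
_Static_assert(offsetof(struct zone_report, zones) == 64, "descriptors follow the header");

/* Bytes needed to hold a report header plus nr_zones descriptors. */
static size_t zone_report_size(size_t nr_zones)
{
    return sizeof(struct zone_report) + nr_zones * sizeof(struct zone_descriptor);
}
```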
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone operations (open,
close, finish, reset), and Append Zone.

The VIRTIO_BLK_F_ZONED feature bit is only set if the host supports
zoned block devices. It is not set for regular block devices
(conventional zones).

The guest OS can use blktests and fio to test these commands on zoned
devices. zonefs can also be used to test zone append writes.
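The buffer sizing and range checks in the handlers below come down to a little integer arithmetic. This is a hypothetical, standalone sketch: the constants assume a 1-byte virtio_blk_inhdr (the status byte) and the 64-byte report header and descriptor from virtio_blk.h, and report_capacity() and range_ok() are illustrative names that mirror the nr_zones computation in virtio_blk_handle_zone_report() and the bounds test in check_zoned_request().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INHDR_SIZE      1  /* struct virtio_blk_inhdr: one status byte */
#define REPORT_HDR_SIZE 64 /* struct virtio_blk_zone_report header */
#define DESC_SIZE       64 /* struct virtio_blk_zone_descriptor */

/* Number of descriptors that fit in a guest input buffer of in_len bytes. */
static unsigned report_capacity(size_t in_len)
{
    if (in_len < INHDR_SIZE + REPORT_HDR_SIZE + DESC_SIZE) {
        return 0; /* the patch rejects such buffers with virtio_error() */
    }
    return (in_len - INHDR_SIZE - REPORT_HDR_SIZE) / DESC_SIZE;
}

/*
 * Overflow-safe check that [offset, offset + len) lies within capacity.
 * Comparing offset against capacity - len avoids computing offset + len,
 * which could overflow for a malicious offset.
 */
static bool range_ok(int64_t offset, int64_t len, int64_t capacity)
{
    return offset >= 0 && len >= 0 && len <= capacity &&
           offset <= capacity - len;
}
```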
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Message-Id: <20230407082528.18841-3-faithilikerun@gmail.com>
18
---
19
hw/block/virtio-blk-common.c | 2 +
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
21
hw/virtio/virtio-qmp.c | 2 +
22
3 files changed, 393 insertions(+)
23
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/hw/block/virtio-blk-common.c
27
+++ b/hw/block/virtio-blk-common.c
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
33
+ .end = endof(struct virtio_blk_config, zoned)},
34
{}
35
};
36
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/hw/block/virtio-blk.c
40
+++ b/hw/block/virtio-blk.c
41
@@ -XXX,XX +XXX,XX @@
42
#include "qemu/module.h"
43
#include "qemu/error-report.h"
44
#include "qemu/main-loop.h"
45
+#include "block/block_int.h"
46
#include "trace.h"
47
#include "hw/block/block.h"
48
#include "hw/qdev-properties.h"
49
@@ -XXX,XX +XXX,XX @@ err:
50
return err_status;
51
}
52
53
+typedef struct ZoneCmdData {
54
+ VirtIOBlockReq *req;
55
+ struct iovec *in_iov;
56
+ unsigned in_num;
57
+ union {
58
+ struct {
59
+ unsigned int nr_zones;
60
+ BlockZoneDescriptor *zones;
61
+ } zone_report_data;
62
+ struct {
63
+ int64_t offset;
64
+ } zone_append_data;
65
+ };
66
+} ZoneCmdData;
67
+
68
+/*
69
+ * check_zoned_request: error checking before issuing requests. If all checks
70
+ * passed, return true.
71
+ * append: true if only zone append requests issued.
72
+ */
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
74
+ bool append, uint8_t *status) {
75
+ BlockDriverState *bs = blk_bs(s->blk);
76
+ int index;
77
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
79
+ *status = VIRTIO_BLK_S_UNSUPP;
80
+ return false;
81
+ }
82
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
89
+ if (append) {
90
+ if (bs->bl.write_granularity) {
91
+ if ((offset % bs->bl.write_granularity) != 0) {
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size);
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer smaller than "
335
+ "size of append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
383
{
384
uint32_t type;
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Pass out_iov/out_num and in_iov/in_num explicitly. Accessing
417
+ * req->elem.out_sg directly is not safe because it may have been
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
462
}
463
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
465
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
466
VirtIOBlock *s = VIRTIO_BLK(dev);
467
VirtIOBlkConf *conf = &s->conf;
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
469
Error *err = NULL;
470
unsigned i;
471
472
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
473
return;
474
}
475
476
+ if (bs->bl.zoned != BLK_Z_NONE) {
477
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
478
+ if (bs->bl.zoned == BLK_Z_HM) {
479
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
480
+ }
481
+ }
482
+
483
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
484
(!conf->max_discard_sectors ||
485
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
486
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
487
index XXXXXXX..XXXXXXX 100644
488
--- a/hw/virtio/virtio-qmp.c
489
+++ b/hw/virtio/virtio-qmp.c
490
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
491
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
492
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
493
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
494
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
495
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
496
#ifndef VIRTIO_BLK_NO_LEGACY
497
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
498
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
499
--
500
2.40.0
diff view generated by jsdifflib
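The zone report and zone management handlers in the patch above size their work from guest-supplied values. The following is a minimal standalone sketch of that arithmetic, not QEMU code. It assumes the virtio 1.2 zoned layout: a 1-byte status trailer (`struct virtio_blk_inhdr`), a 64-byte zone report header and 64-byte zone descriptors. The `SKETCH_*` constants and the `sketch_*` helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes assumed from the virtio 1.2 zoned block device layout. */
#define SKETCH_INHDR_SIZE       1u  /* struct virtio_blk_inhdr: status byte */
#define SKETCH_ZONE_REPORT_HDR 64u  /* nr_zones + reserved[56] */
#define SKETCH_ZONE_DESC_SIZE  64u  /* z_cap, z_start, z_wp, type, state, reserved */

/*
 * Number of zone descriptors that fit in the guest's input buffer,
 * or 0 if the buffer cannot hold even one descriptor (the device
 * rejects such requests with virtio_error()). A trailing partial
 * descriptor is ignored by the integer division.
 */
static unsigned sketch_report_nr_zones(uint64_t in_len)
{
    uint64_t overhead = SKETCH_INHDR_SIZE + SKETCH_ZONE_REPORT_HDR;

    if (in_len < overhead + SKETCH_ZONE_DESC_SIZE) {
        return 0;
    }
    return (unsigned)((in_len - overhead) / SKETCH_ZONE_DESC_SIZE);
}

/*
 * Length in bytes of the zone starting at 'offset', allowing the last
 * zone to be smaller than zone_size (mirrors the non-RESET_ALL branch
 * of virtio_blk_handle_zone_mgmt()). Unlike the patch, this sketch
 * asserts the offset is in range instead of relying on a later check.
 */
static uint64_t sketch_zone_len(uint64_t capacity, uint64_t zone_size,
                                uint32_t nr_zones, uint64_t offset)
{
    assert(offset < capacity && nr_zones > 0);
    if (zone_size > capacity - offset) {
        return capacity - zone_size * (nr_zones - 1);
    }
    return zone_size;
}
```

For example, a 10 MiB device with 4 MiB zones has three zones, and `sketch_zone_len()` reports 2 MiB for the zone starting at 8 MiB.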
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
To account for the new zone append write operation on zoned devices,
4
introduce the BLOCK_ACCT_ZONE_APPEND enum alongside the other I/O request
5
types (read, write, flush).
6
7
Signed-off-by: Sam Li <faithilikerun@gmail.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Message-Id: <20230407082528.18841-4-faithilikerun@gmail.com>
10
---
11
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
12
qapi/block.json | 4 +++
13
include/block/accounting.h | 1 +
14
block/qapi-sysemu.c | 11 ++++++
15
block/qapi.c | 18 ++++++++++
16
hw/block/virtio-blk.c | 4 +++
17
6 files changed, 95 insertions(+), 11 deletions(-)
18
19
diff --git a/qapi/block-core.json b/qapi/block-core.json
20
index XXXXXXX..XXXXXXX 100644
21
--- a/qapi/block-core.json
22
+++ b/qapi/block-core.json
23
@@ -XXX,XX +XXX,XX @@
24
# @min_wr_latency_ns: Minimum latency of write operations in the
25
# defined interval, in nanoseconds.
26
#
27
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
28
+# in the defined interval, in nanoseconds
29
+# (since 8.1)
30
+#
31
# @min_flush_latency_ns: Minimum latency of flush operations in the
32
# defined interval, in nanoseconds.
33
#
34
@@ -XXX,XX +XXX,XX @@
35
# @max_wr_latency_ns: Maximum latency of write operations in the
36
# defined interval, in nanoseconds.
37
#
38
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
39
+# in the defined interval, in nanoseconds
40
+# (since 8.1)
41
+#
42
# @max_flush_latency_ns: Maximum latency of flush operations in the
43
# defined interval, in nanoseconds.
44
#
45
@@ -XXX,XX +XXX,XX @@
46
# @avg_wr_latency_ns: Average latency of write operations in the
47
# defined interval, in nanoseconds.
48
#
49
+# @avg_zone_append_latency_ns: Average latency of zone append operations
50
+# in the defined interval, in nanoseconds
51
+# (since 8.1)
52
+#
53
# @avg_flush_latency_ns: Average latency of flush operations in the
54
# defined interval, in nanoseconds.
55
#
56
@@ -XXX,XX +XXX,XX @@
57
# @avg_wr_queue_depth: Average number of pending write operations
58
# in the defined interval.
59
#
60
+# @avg_zone_append_queue_depth: Average number of pending zone append
61
+# operations in the defined interval
62
+# (since 8.1).
63
+#
64
# Since: 2.5
65
##
66
{ 'struct': 'BlockDeviceTimedStats',
67
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
68
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
69
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
70
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
71
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
72
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
73
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
74
+ 'max_zone_append_latency_ns': 'int',
75
+ 'avg_zone_append_latency_ns': 'int',
76
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
77
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
78
+ 'avg_wr_queue_depth': 'number',
79
+ 'avg_zone_append_queue_depth': 'number' } }
80
81
##
82
# @BlockDeviceStats:
83
@@ -XXX,XX +XXX,XX @@
84
#
85
# @wr_bytes: The number of bytes written by the device.
86
#
87
+# @zone_append_bytes: The number of bytes appended by the zoned device
88
+# (since 8.1)
89
+#
90
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
91
#
92
# @rd_operations: The number of read operations performed by the device.
93
#
94
# @wr_operations: The number of write operations performed by the device.
95
#
96
+# @zone_append_operations: The number of zone append operations performed
97
+# by the zoned device (since 8.1)
98
+#
99
# @flush_operations: The number of cache flush operations performed by the
100
# device (since 0.15)
101
#
102
@@ -XXX,XX +XXX,XX @@
103
#
104
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since 0.15).
105
#
106
+# @zone_append_total_time_ns: Total time spent on zone append writes
107
+# in nanoseconds (since 8.1)
108
+#
109
# @flush_total_time_ns: Total time spent on cache flushes in nanoseconds
110
# (since 0.15).
111
#
112
@@ -XXX,XX +XXX,XX @@
113
# @wr_merged: Number of write requests that have been merged into another
114
# request (Since 2.3).
115
#
116
+# @zone_append_merged: Number of zone append requests that have been merged
117
+# into another request (since 8.1)
118
+#
119
# @unmap_merged: Number of unmap requests that have been merged into another
120
# request (Since 4.2)
121
#
122
@@ -XXX,XX +XXX,XX @@
123
# @failed_wr_operations: The number of failed write operations
124
# performed by the device (Since 2.5)
125
#
126
+# @failed_zone_append_operations: The number of failed zone append write
127
+# operations performed by the zoned device
128
+# (since 8.1)
129
+#
130
# @failed_flush_operations: The number of failed flush operations
131
# performed by the device (Since 2.5)
132
#
133
@@ -XXX,XX +XXX,XX @@
134
# @invalid_wr_operations: The number of invalid write operations
135
# performed by the device (Since 2.5)
136
#
137
+# @invalid_zone_append_operations: The number of invalid zone append operations
138
+# performed by the zoned device (since 8.1)
139
+#
140
# @invalid_flush_operations: The number of invalid flush operations
141
# performed by the device (Since 2.5)
142
#
143
@@ -XXX,XX +XXX,XX @@
144
#
145
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
146
#
147
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
148
+#
149
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
150
#
151
# Since: 0.14
152
##
153
{ 'struct': 'BlockDeviceStats',
154
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
155
- 'rd_operations': 'int', 'wr_operations': 'int',
156
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
157
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
158
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
159
'flush_operations': 'int', 'unmap_operations': 'int',
160
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
161
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
162
- 'wr_highest_offset': 'int',
163
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
164
- '*idle_time_ns': 'int',
165
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
166
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
167
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
168
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
169
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
170
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
171
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
172
+ 'failed_zone_append_operations': 'int',
173
+ 'failed_flush_operations': 'int',
174
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
175
+ 'invalid_wr_operations': 'int',
176
+ 'invalid_zone_append_operations': 'int',
177
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
178
'account_invalid': 'bool', 'account_failed': 'bool',
179
'timed_stats': ['BlockDeviceTimedStats'],
180
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
181
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
182
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
183
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
184
185
##
186
diff --git a/qapi/block.json b/qapi/block.json
187
index XXXXXXX..XXXXXXX 100644
188
--- a/qapi/block.json
189
+++ b/qapi/block.json
190
@@ -XXX,XX +XXX,XX @@
191
# @boundaries-write: list of interval boundary values for write latency
192
# histogram.
193
#
194
+# @boundaries-zap: list of interval boundary values for zone append write
195
+# latency histogram.
196
+#
197
# @boundaries-flush: list of interval boundary values for flush latency
198
# histogram.
199
#
200
@@ -XXX,XX +XXX,XX @@
201
'*boundaries': ['uint64'],
202
'*boundaries-read': ['uint64'],
203
'*boundaries-write': ['uint64'],
204
+ '*boundaries-zap': ['uint64'],
205
'*boundaries-flush': ['uint64'] },
206
'allow-preconfig': true }
207
diff --git a/include/block/accounting.h b/include/block/accounting.h
208
index XXXXXXX..XXXXXXX 100644
209
--- a/include/block/accounting.h
210
+++ b/include/block/accounting.h
211
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
212
BLOCK_ACCT_READ,
213
BLOCK_ACCT_WRITE,
214
BLOCK_ACCT_FLUSH,
215
+ BLOCK_ACCT_ZONE_APPEND,
216
BLOCK_ACCT_UNMAP,
217
BLOCK_MAX_IOTYPE,
218
};
219
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
220
index XXXXXXX..XXXXXXX 100644
221
--- a/block/qapi-sysemu.c
222
+++ b/block/qapi-sysemu.c
223
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
224
bool has_boundaries, uint64List *boundaries,
225
bool has_boundaries_read, uint64List *boundaries_read,
226
bool has_boundaries_write, uint64List *boundaries_write,
227
+ bool has_boundaries_append, uint64List *boundaries_append,
228
bool has_boundaries_flush, uint64List *boundaries_flush,
229
Error **errp)
230
{
231
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
232
}
233
}
234
235
+ if (has_boundaries || has_boundaries_append) {
236
+ ret = block_latency_histogram_set(
237
+ stats, BLOCK_ACCT_ZONE_APPEND,
238
+ has_boundaries_append ? boundaries_append : boundaries);
239
+ if (ret) {
240
+ error_setg(errp, "Device '%s' set append write boundaries fail", id);
241
+ return;
242
+ }
243
+ }
244
+
245
if (has_boundaries || has_boundaries_flush) {
246
ret = block_latency_histogram_set(
247
stats, BLOCK_ACCT_FLUSH,
248
diff --git a/block/qapi.c b/block/qapi.c
249
index XXXXXXX..XXXXXXX 100644
250
--- a/block/qapi.c
251
+++ b/block/qapi.c
252
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
253
254
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
255
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
256
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
257
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
258
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
259
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
260
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
261
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
262
263
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
264
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
265
+ ds->failed_zone_append_operations =
266
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
267
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
268
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
269
270
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
271
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
272
+ ds->invalid_zone_append_operations =
273
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
274
ds->invalid_flush_operations =
275
stats->invalid_ops[BLOCK_ACCT_FLUSH];
276
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
277
278
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
279
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
280
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
281
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
282
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
283
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
284
+ ds->zone_append_total_time_ns =
285
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
286
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
287
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
288
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
289
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
290
291
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
292
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
293
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
294
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
295
296
dev_stats->interval_length = ts->interval_length;
297
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
298
dev_stats->max_wr_latency_ns = timed_average_max(wr);
299
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
300
301
+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
302
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
303
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
304
+
305
dev_stats->min_flush_latency_ns = timed_average_min(fl);
306
dev_stats->max_flush_latency_ns = timed_average_max(fl);
307
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
308
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
309
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
310
dev_stats->avg_wr_queue_depth =
311
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
312
+ dev_stats->avg_zone_append_queue_depth =
313
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);
314
315
QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
316
}
317
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
318
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
319
ds->wr_latency_histogram
320
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
321
+ ds->zone_append_latency_histogram
322
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
323
ds->flush_latency_histogram
324
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
325
}
326
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
327
index XXXXXXX..XXXXXXX 100644
328
--- a/hw/block/virtio-blk.c
329
+++ b/hw/block/virtio-blk.c
330
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
331
data->in_num = in_num;
332
data->zone_append_data.offset = offset;
333
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
334
+
335
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
336
+ BLOCK_ACCT_ZONE_APPEND);
337
+
338
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
339
virtio_blk_zone_append_complete, data);
340
return 0;
341
--
342
2.40.0
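The accounting patch above works because the per-type counters that block/qapi.c reads (`nr_bytes`, `nr_ops`, `failed_ops`, `total_time_ns`, ...) appear to be arrays indexed by `enum BlockAcctType`, so adding `BLOCK_ACCT_ZONE_APPEND` before `BLOCK_MAX_IOTYPE` gives every array a slot for the new type. The sketch below is a simplified, hypothetical model of that pattern; `SketchStats`, `sketch_acct_done()` and the enum names are stand-ins, not QEMU's `BlockAcctStats` API, and the failure handling is deliberately simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the order of enum BlockAcctType after this patch. */
enum SketchAcctType {
    SKETCH_ACCT_READ,
    SKETCH_ACCT_WRITE,
    SKETCH_ACCT_FLUSH,
    SKETCH_ACCT_ZONE_APPEND,
    SKETCH_ACCT_UNMAP,
    SKETCH_MAX_IOTYPE,  /* array size: must stay last */
};

/* Hypothetical, simplified statistics: one slot per I/O type. */
typedef struct SketchStats {
    uint64_t nr_bytes[SKETCH_MAX_IOTYPE];
    uint64_t nr_ops[SKETCH_MAX_IOTYPE];
    uint64_t failed_ops[SKETCH_MAX_IOTYPE];
    uint64_t total_time_ns[SKETCH_MAX_IOTYPE];
} SketchStats;

/* Record one completed request; failed requests only bump failed_ops. */
static void sketch_acct_done(SketchStats *s, enum SketchAcctType type,
                             uint64_t bytes, uint64_t latency_ns, bool failed)
{
    assert(type < SKETCH_MAX_IOTYPE);
    if (failed) {
        s->failed_ops[type]++;
        return;
    }
    s->nr_bytes[type] += bytes;
    s->nr_ops[type]++;
    s->total_time_ns[type] += latency_ns;
}
```

Inserting the new enumerator before `BLOCK_ACCT_UNMAP` shifts the numeric value of `BLOCK_ACCT_UNMAP`; this seems harmless as long as the values are only used as in-process array indices and are never persisted or exposed directly.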
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-Id: <20230407082528.18841-5-faithilikerun@gmail.com>
7
---
8
hw/block/virtio-blk.c | 12 ++++++++++++
9
hw/block/trace-events | 7 +++++++
10
2 files changed, 19 insertions(+)
11
12
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/hw/block/virtio-blk.c
15
+++ b/hw/block/virtio-blk.c
16
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
17
int64_t nz = data->zone_report_data.nr_zones;
18
int8_t err_status = VIRTIO_BLK_S_OK;
19
20
+ trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
21
if (ret) {
22
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
23
goto out;
24
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
25
nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
26
sizeof(struct virtio_blk_zone_report)) /
27
sizeof(struct virtio_blk_zone_descriptor);
28
+ trace_virtio_blk_handle_zone_report(vdev, req,
29
+ offset >> BDRV_SECTOR_BITS, nr_zones);
30
31
zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
32
data = g_malloc(sizeof(ZoneCmdData));
33
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
34
{
35
VirtIOBlockReq *req = opaque;
36
VirtIOBlock *s = req->dev;
37
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
38
int8_t err_status = VIRTIO_BLK_S_OK;
39
+ trace_virtio_blk_zone_mgmt_complete(vdev, req, ret);
40
41
if (ret) {
42
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
43
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
44
/* Entire drive capacity */
45
offset = 0;
46
len = capacity;
47
+ trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
48
+ bs->total_sectors);
49
} else {
50
if (bs->bl.zone_size > capacity - offset) {
51
/* The zoned device allows the last smaller zone. */
52
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
53
} else {
54
len = bs->bl.zone_size;
55
}
56
+ trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
57
+ offset >> BDRV_SECTOR_BITS,
58
+ len >> BDRV_SECTOR_BITS);
59
}
60
61
if (!check_zoned_request(s, offset, len, false, &err_status)) {
62
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
63
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
64
goto out;
65
}
66
+ trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
67
68
out:
69
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
70
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
71
int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
72
int64_t len = iov_size(out_iov, out_num);
73
74
+ trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
75
if (!check_zoned_request(s, offset, len, true, &err_status)) {
76
goto out;
77
}
78
diff --git a/hw/block/trace-events b/hw/block/trace-events
79
index XXXXXXX..XXXXXXX 100644
80
--- a/hw/block/trace-events
81
+++ b/hw/block/trace-events
82
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
83
# virtio-blk.c
84
virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
85
virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
86
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
87
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
88
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
89
virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
90
virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
91
virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
92
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
93
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
94
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
95
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""
96
97
# hd-geometry.c
98
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
99
--
100
2.40.0
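The new trace points log byte offsets converted to 512-byte sectors (`offset >> BDRV_SECTOR_BITS`) and print them in hex. The sketch below shows what the payload of `virtio_blk_handle_zone_mgmt` renders as, omitting the `vdev`/`req` pointers. `SKETCH_SECTOR_BITS` is assumed to equal QEMU's `BDRV_SECTOR_BITS` (9), and `sketch_format_zone_mgmt()` is a hypothetical helper, not part of the trace backend. `op` is printed as the raw `BlockZoneOp` value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SECTOR_BITS 9  /* assumed equal to BDRV_SECTOR_BITS (512-byte sectors) */

/*
 * Format the op/sector/len part of the virtio_blk_handle_zone_mgmt
 * trace event. Values are cast to uint64_t so they match PRIx64.
 * Returns the snprintf() result (the untruncated length).
 */
static int sketch_format_zone_mgmt(char *buf, size_t size, uint8_t op,
                                   int64_t offset_bytes, int64_t len_bytes)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64,
                    (unsigned)op,
                    (uint64_t)(offset_bytes >> SKETCH_SECTOR_BITS),
                    (uint64_t)(len_bytes >> SKETCH_SECTOR_BITS));
}
```

For instance, a 4 MiB zone starting at byte offset 8 MiB is logged as sector 0x4000 with len 0x2000.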
Deleted patch
1
From: Sam Li <faithilikerun@gmail.com>
2
1
3
Add documentation with an example of using the virtio-blk driver
4
to pass zoned block devices through to the guest.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
[Fix rST syntax
9
--Stefan]
10
Message-Id: <20230407082528.18841-6-faithilikerun@gmail.com>
11
---
12
docs/devel/zoned-storage.rst | 19 +++++++++++++++++++
13
1 file changed, 19 insertions(+)
14
15
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
16
index XXXXXXX..XXXXXXX 100644
17
--- a/docs/devel/zoned-storage.rst
18
+++ b/docs/devel/zoned-storage.rst
19
@@ -XXX,XX +XXX,XX @@ APIs for zoned storage emulation or testing.
20
For example, to test zone_report on a null_blk device using qemu-io::
21
22
$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
23
+
24
+To expose the host's zoned block device through virtio-blk, the command line
25
+can be (including the -device parameter)::
26
+
27
+ -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
28
+ -device virtio-blk-pci,drive=drive0
29
+
30
+Or only use the -drive parameter::
31
+
32
+ -drive driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
33
+
34
+Additionally, QEMU has several ways of supporting zoned storage, including:
35
+(1) Using virtio-scsi: -device scsi-block allows SCSI ZBC devices to be passed
36
+through, enabling ZBC or ZAC HDDs to be attached to QEMU.
37
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
38
+purposes, it cannot yet pass through a zoned device from the host. To pass an
39
+NVMe ZNS device to the guest, use VFIO PCI to pass the entire NVMe PCI adapter
40
+through to the guest. Likewise, passing through an HDD HBA gives the guest
41
+access to all HDDs attached to that HBA.
42
--
43
2.40.0
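The virtio-blk use case above depends on QEMU recognising /dev/nullb0 as zoned. On Linux, the zoned model is exposed in the sysfs attribute `/sys/block/<dev>/queue/zoned` as `none`, `host-aware` or `host-managed`. The sketch below parses that string and then makes the feature decision the way `virtio_blk_device_realize()` does in this series: advertise `VIRTIO_BLK_F_ZONED` for any zoned model and drop discard for host-managed devices. The `SketchZoneModel` enum and the `sketch_*` helpers are hypothetical, and reading the sysfs file itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's zoned model (BLK_Z_NONE/HM/HA). */
typedef enum {
    SKETCH_Z_NONE,
    SKETCH_Z_HM,       /* host-managed */
    SKETCH_Z_HA,       /* host-aware */
    SKETCH_Z_UNKNOWN,  /* unrecognised sysfs value */
} SketchZoneModel;

/* Parse the contents of /sys/block/<dev>/queue/zoned. */
static SketchZoneModel sketch_parse_zoned(const char *s)
{
    static const struct {
        const char *name;
        SketchZoneModel model;
    } table[] = {
        { "none", SKETCH_Z_NONE },
        { "host-managed", SKETCH_Z_HM },
        { "host-aware", SKETCH_Z_HA },
    };
    size_t len = strcspn(s, "\n");  /* ignore the trailing newline from sysfs */

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strlen(table[i].name) == len && strncmp(s, table[i].name, len) == 0) {
            return table[i].model;
        }
    }
    return SKETCH_Z_UNKNOWN;
}

/*
 * Feature decision modelled on virtio_blk_device_realize(). *discard
 * holds the user-configured value on entry and may be cleared.
 */
static void sketch_zoned_features(SketchZoneModel m, bool *zoned, bool *discard)
{
    assert(m != SKETCH_Z_UNKNOWN);
    if (m != SKETCH_Z_NONE) {
        *zoned = true;
        if (m == SKETCH_Z_HM) {
            *discard = false;
        }
    }
}
```

A host-aware device keeps discard enabled, which matches the patch, where only `BLK_Z_HM` clears `VIRTIO_BLK_F_DISCARD`.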