-The following changes since commit 05d50ba2d4668d43a835c5a502efdec9b92646e6:
+The following changes since commit c6a5fc2ac76c5ab709896ee1b0edd33685a67ed1:
 
-  Merge tag 'migration-20230427-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-04-28 08:35:06 +0100)
+  decodetree: Add --output-null for meson testing (2023-05-31 19:56:42 -0700)
 
 are available in the Git repository at:
 
   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
 
-for you to fetch changes up to d3c760be786571d83d5cea01953e543df4d76f51:
+for you to fetch changes up to 98b126f5e3228a346c774e569e26689943b401dd:
 
-  docs/zoned-storage:add zoned emulation use case (2023-04-28 08:34:07 -0400)
+  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa (2023-06-01 11:08:21 -0400)
 
 ----------------------------------------------------------------
 Pull request
 
-This pull request contains Sam Li's virtio-blk zoned storage work. These
-patches were dropped from my previous block pull request due to CI failures.
+- Stefano Garzarella's blkio block driver 'fd' parameter
+- My thread-local blk_io_plug() series
 
 ----------------------------------------------------------------
 
-Sam Li (17):
-  block/block-common: add zoned device structs
-  block/file-posix: introduce helper functions for sysfs attributes
-  block/block-backend: add block layer APIs resembling Linux
-    ZonedBlockDevice ioctls
-  block/raw-format: add zone operations to pass through requests
-  block: add zoned BlockDriver check to block layer
-  iotests: test new zone operations
-  block: add some trace events for new block layer APIs
-  docs/zoned-storage: add zoned device documentation
-  file-posix: add tracking of the zone write pointers
-  block: introduce zone append write for zoned devices
-  qemu-iotests: test zone append operation
-  block: add some trace events for zone append
-  include: update virtio_blk headers to v6.3-rc1
-  virtio-blk: add zoned storage emulation for zoned devices
-  block: add accounting for zone append operation
-  virtio-blk: add some trace events for zoned emulation
-  docs/zoned-storage:add zoned emulation use case
+Stefan Hajnoczi (6):
+  block: add blk_io_plug_call() API
+  block/nvme: convert to blk_io_plug_call() API
+  block/blkio: convert to blk_io_plug_call() API
+  block/io_uring: convert to blk_io_plug_call() API
+  block/linux-aio: convert to blk_io_plug_call() API
+  block: remove bdrv_co_io_plug() API
+
+Stefano Garzarella (2):
+  block/blkio: use qemu_open() to support fd passing for virtio-blk
+  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa
 
- docs/devel/index-api.rst                     |   1 +
- docs/devel/zoned-storage.rst                 |  62 ++
- qapi/block-core.json                         |  68 +-
- qapi/block.json                              |   4 +
- meson.build                                  |   4 +
- include/block/accounting.h                   |   1 +
- include/block/block-common.h                 |  57 ++
- include/block/block-io.h                     |  13 +
- include/block/block_int-common.h             |  37 +
- include/block/raw-aio.h                      |   8 +-
- include/standard-headers/drm/drm_fourcc.h    |  12 +
- include/standard-headers/linux/ethtool.h     |  48 +-
- include/standard-headers/linux/fuse.h        |  45 +-
- include/standard-headers/linux/pci_regs.h    |   1 +
- include/standard-headers/linux/vhost_types.h |   2 +
- include/standard-headers/linux/virtio_blk.h  | 105 +++
- include/sysemu/block-backend-io.h            |  27 +
- linux-headers/asm-arm64/kvm.h                |   1 +
- linux-headers/asm-x86/kvm.h                  |  34 +-
- linux-headers/linux/kvm.h                    |   9 +
- linux-headers/linux/vfio.h                   |  15 +-
- linux-headers/linux/vhost.h                  |   8 +
- block.c                                      |  19 +
- block/block-backend.c                        | 198 ++++
- block/file-posix.c                           | 696 +++++++++++++++++--
- block/io.c                                   |  68 ++
- block/io_uring.c                             |   4 +
- block/linux-aio.c                            |   3 +
- block/qapi-sysemu.c                          |  11 +
- block/qapi.c                                 |  18 +
- block/raw-format.c                           |  26 +
- hw/block/virtio-blk-common.c                 |   2 +
- hw/block/virtio-blk.c                        | 405 +++++++++++
- hw/virtio/virtio-qmp.c                       |   2 +
- qemu-io-cmds.c                               | 224 ++++++
- block/trace-events                           |   4 +
- docs/system/qemu-block-drivers.rst.inc       |   6 +
- hw/block/trace-events                        |   7 +
- tests/qemu-iotests/tests/zoned               | 105 +++
- tests/qemu-iotests/tests/zoned.out           |  69 ++
- 40 files changed, 2361 insertions(+), 68 deletions(-)
- create mode 100644 docs/devel/zoned-storage.rst
- create mode 100755 tests/qemu-iotests/tests/zoned
- create mode 100644 tests/qemu-iotests/tests/zoned.out
+ MAINTAINERS                       |   1 +
+ qapi/block-core.json              |   6 ++
+ meson.build                       |   4 +
+ include/block/block-io.h          |   3 -
+ include/block/block_int-common.h  |  11 ---
+ include/block/raw-aio.h           |  14 ---
+ include/sysemu/block-backend-io.h |  13 +--
+ block/blkio.c                     |  96 ++++++++++++------
+ block/block-backend.c             |  22 -----
+ block/file-posix.c                |  38 -------
+ block/io.c                        |  37 -------
+ block/io_uring.c                  |  44 ++++-----
+ block/linux-aio.c                 |  41 +++-----
+ block/nvme.c                      |  44 +++------
+ block/plug.c                      | 159 ++++++++++++++++++++++++++++++
+ hw/block/dataplane/xen-block.c    |   8 +-
+ hw/block/virtio-blk.c             |   4 +-
+ hw/scsi/virtio-scsi.c             |   6 +-
+ block/meson.build                 |   1 +
+ block/trace-events                |   6 +-
+ 20 files changed, 293 insertions(+), 265 deletions(-)
+ create mode 100644 block/plug.c
 
 -- 
-2.40.0
+2.40.1
diff view generated by jsdifflib
Deleted patch
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-2-faithilikerun@gmail.com
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum BlockZoneOp {
+    BLK_ZO_OPEN,
+    BLK_ZO_CLOSE,
+    BLK_ZO_FINISH,
+    BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+    BLK_Z_NONE = 0x0, /* Regular block device */
+    BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+    BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneState {
+    BLK_ZS_NOT_WP = 0x0,
+    BLK_ZS_EMPTY = 0x1,
+    BLK_ZS_IOPEN = 0x2,
+    BLK_ZS_EOPEN = 0x3,
+    BLK_ZS_CLOSED = 0x4,
+    BLK_ZS_RDONLY = 0xD,
+    BLK_ZS_FULL = 0xE,
+    BLK_ZS_OFFLINE = 0xF,
+} BlockZoneState;
+
+typedef enum BlockZoneType {
+    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+    BLK_ZT_SWR = 0x2, /* Sequential writes required */
+    BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+    uint64_t start;
+    uint64_t length;
+    uint64_t cap;
+    uint64_t wp;
+    BlockZoneType type;
+    BlockZoneState state;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
     /* in bytes, 0 if irrelevant */
     int cluster_size;
-- 
2.40.0
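The zone state values in the patch above mirror the Linux `blkzoned.h` ABI. As a rough, standalone illustration (not part of the series), a consumer of these definitions might classify zones like this; the enum mirrors the patch, while the helper `zone_accepts_writes()` is hypothetical:

```c
#include <stdbool.h>

/* Mirrors the BlockZoneState values added in include/block/block-common.h */
typedef enum BlockZoneState {
    BLK_ZS_NOT_WP  = 0x0,
    BLK_ZS_EMPTY   = 0x1,
    BLK_ZS_IOPEN   = 0x2,
    BLK_ZS_EOPEN   = 0x3,
    BLK_ZS_CLOSED  = 0x4,
    BLK_ZS_RDONLY  = 0xD,
    BLK_ZS_FULL    = 0xE,
    BLK_ZS_OFFLINE = 0xF,
} BlockZoneState;

/* Hypothetical helper: can a zone in this state still accept writes? */
static bool zone_accepts_writes(BlockZoneState state)
{
    switch (state) {
    case BLK_ZS_NOT_WP:   /* conventional zone, no write pointer */
    case BLK_ZS_EMPTY:
    case BLK_ZS_IOPEN:    /* implicitly open */
    case BLK_ZS_EOPEN:    /* explicitly open */
    case BLK_ZS_CLOSED:
        return true;
    default:              /* RDONLY, FULL, OFFLINE */
        return false;
    }
}
```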
Deleted patch
From: Sam Li <faithilikerun@gmail.com>

Use get_sysfs_str_val() to get the string value of device
zoned model. Then get_sysfs_zoned_model() can convert it to
BlockZoneModel type of QEMU.

Use get_sysfs_long_val() to get the long value of zoned device
information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-3-faithilikerun@gmail.com
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block_int-common.h |   3 +
 block/file-posix.c               | 139 ++++++++++++++++++++++---------
 2 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
      * an explicit monitor command to load the disk inside the guest).
      */
     bool has_variable_length;
+
+    /* device zone model */
+    BlockZoneModel zoned;
 } BlockLimits;
 
 typedef struct BdrvOpBlocker BdrvOpBlocker;
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
+/*
+ * Get a sysfs attribute value as character string.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+                             char **val) {
+#ifdef CONFIG_LINUX
+    g_autofree char *sysfspath = NULL;
+    int ret;
+    size_t len;
+
+    if (!S_ISBLK(st->st_mode)) {
+        return -ENOTSUP;
+    }
+
+    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+                                major(st->st_rdev), minor(st->st_rdev),
+                                attribute);
+    ret = g_file_get_contents(sysfspath, val, &len, NULL);
+    if (ret == -1) {
+        return -ENOENT;
+    }
+
+    /* The file is ended with '\n' */
+    char *p;
+    p = *val;
+    if (*(p + len - 1) == '\n') {
+        *(p + len - 1) = '\0';
+    }
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
+{
+    g_autofree char *val = NULL;
+    int ret;
+
+    ret = get_sysfs_str_val(st, "zoned", &val);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (strcmp(val, "host-managed") == 0) {
+        *zoned = BLK_Z_HM;
+    } else if (strcmp(val, "host-aware") == 0) {
+        *zoned = BLK_Z_HA;
+    } else if (strcmp(val, "none") == 0) {
+        *zoned = BLK_Z_NONE;
+    } else {
+        return -ENOTSUP;
+    }
+    return 0;
+}
+
+/*
+ * Get a sysfs attribute value as a long integer.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
 {
 #ifdef CONFIG_LINUX
-    char buf[32];
+    g_autofree char *str = NULL;
     const char *end;
-    char *sysfspath = NULL;
+    long val;
+    int ret;
+
+    ret = get_sysfs_str_val(st, attribute, &str);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* The file is ended with '\n', pass 'end' to accept that. */
+    ret = qemu_strtol(str, &end, 10, &val);
+    if (ret == 0 && end && *end == '\0') {
+        ret = val;
+    }
+    return ret;
+#else
+    return -ENOTSUP;
+#endif
+}
+
+static int hdev_get_max_segments(int fd, struct stat *st)
+{
+#ifdef CONFIG_LINUX
     int ret;
-    int sysfd = -1;
-    long max_segments;
 
     if (S_ISCHR(st->st_mode)) {
         if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
         }
         return -ENOTSUP;
     }
-
-    if (!S_ISBLK(st->st_mode)) {
-        return -ENOTSUP;
-    }
-
-    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st->st_rdev), minor(st->st_rdev));
-    sysfd = open(sysfspath, O_RDONLY);
-    if (sysfd == -1) {
-        ret = -errno;
-        goto out;
-    }
-    ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
-    if (ret < 0) {
-        ret = -errno;
-        goto out;
-    } else if (ret == 0) {
-        ret = -EIO;
-        goto out;
-    }
-    buf[ret] = 0;
-    /* The file is ended with '\n', pass 'end' to accept that. */
-    ret = qemu_strtol(buf, &end, 10, &max_segments);
-    if (ret == 0 && end && *end == '\n') {
-        ret = max_segments;
-    }
-
-out:
-    if (sysfd != -1) {
-        close(sysfd);
-    }
-    g_free(sysfspath);
-    return ret;
+    return get_sysfs_long_val(st, "max_segments");
 #else
     return -ENOTSUP;
 #endif
 }
 
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
+                                     Error **errp)
+{
+    BlockZoneModel zoned;
+    int ret;
+
+    bs->bl.zoned = BLK_Z_NONE;
+
+    ret = get_sysfs_zoned_model(st, &zoned);
+    if (ret < 0 || zoned == BLK_Z_NONE) {
+        return;
+    }
+    bs->bl.zoned = zoned;
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
             bs->bl.max_hw_iov = ret;
         }
     }
+
+    raw_refresh_zoned_limits(bs, &st, errp);
 }
 
 static int check_for_dasd(int fd)
-- 
2.40.0
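The core of the patch above is a string-to-enum mapping of the sysfs `queue/zoned` attribute. A standalone sketch of that mapping (not QEMU code): the enum mirrors the series, and the hypothetical `parse_zoned_model()` returns the model directly, with `-1` for unknown strings, whereas the real `get_sysfs_zoned_model()` uses an out-parameter and returns `-ENOTSUP`:

```c
#include <string.h>

/* Mirrors BlockZoneModel from the series */
typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0, /* Regular block device */
    BLK_Z_HM   = 0x1, /* Host-managed zoned block device */
    BLK_Z_HA   = 0x2, /* Host-aware zoned block device */
} BlockZoneModel;

/*
 * Translate the contents of /sys/dev/block/<maj>:<min>/queue/zoned
 * (trailing '\n' already stripped, as get_sysfs_str_val() does) into a
 * zone model. Returns the model, or -1 for an unrecognized string.
 */
static int parse_zoned_model(const char *val)
{
    if (strcmp(val, "host-managed") == 0) {
        return BLK_Z_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        return BLK_Z_HA;
    } else if (strcmp(val, "none") == 0) {
        return BLK_Z_NONE;
    }
    return -1;
}
```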
1
From: Sam Li <faithilikerun@gmail.com>
1
Introduce a new API for thread-local blk_io_plug() that does not
2
2
traverse the block graph. The goal is to make blk_io_plug() multi-queue
3
Add zoned device option to host_device BlockDriver. It will be presented only
3
friendly.
4
for zoned host block devices. By adding zone management operations to the
4
5
host_block_device BlockDriver, users can use the new block layer APIs
5
Instead of having block drivers track whether or not we're in a plugged
6
including Report Zone and four zone management operations
6
section, provide an API that allows them to defer a function call until
7
(open, close, finish, reset, reset_all).
7
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
8
8
called multiple times with the same fn/opaque pair, then fn() is only
9
Qemu-io uses the new APIs to perform zoned storage commands of the device:
9
called once at the end of the function - resulting in batching.
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
10
11
zone_finish(zf).
11
This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
12
12
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
13
For example, to test zone_report, use following command:
13
because the plug state is now thread-local.
14
$ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
14
15
-c "zrp offset nr_zones"
15
Later patches convert block drivers to blk_io_plug_call() and then we
16
16
can finally remove .bdrv_co_io_plug() once all block drivers have been
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
converted.
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
18
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
19
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
20
Reviewed-by: Eric Blake <eblake@redhat.com>
21
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
22
Acked-by: Kevin Wolf <kwolf@redhat.com>
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
23
Message-id: 20230427172019.3345-4-faithilikerun@gmail.com
24
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
25
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
26
<philmd@linaro.org> and remove spurious ret = -errno in
27
raw_co_zone_mgmt().
28
--Stefan]
29
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
24
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
30
---
25
---
31
meson.build | 4 +
26
MAINTAINERS | 1 +
32
include/block/block-io.h | 9 +
27
include/sysemu/block-backend-io.h | 13 +--
33
include/block/block_int-common.h | 21 ++
28
block/block-backend.c | 22 -----
34
include/block/raw-aio.h | 6 +-
29
block/plug.c | 159 ++++++++++++++++++++++++++++++
35
include/sysemu/block-backend-io.h | 18 ++
30
hw/block/dataplane/xen-block.c | 8 +-
36
block/block-backend.c | 137 +++++++++++++
31
hw/block/virtio-blk.c | 4 +-
37
block/file-posix.c | 313 +++++++++++++++++++++++++++++-
32
hw/scsi/virtio-scsi.c | 6 +-
38
block/io.c | 41 ++++
33
block/meson.build | 1 +
39
qemu-io-cmds.c | 149 ++++++++++++++
34
8 files changed, 173 insertions(+), 41 deletions(-)
40
9 files changed, 695 insertions(+), 3 deletions(-)
35
create mode 100644 block/plug.c
41
36
42
diff --git a/meson.build b/meson.build
37
diff --git a/MAINTAINERS b/MAINTAINERS
43
index XXXXXXX..XXXXXXX 100644
38
index XXXXXXX..XXXXXXX 100644
44
--- a/meson.build
39
--- a/MAINTAINERS
45
+++ b/meson.build
40
+++ b/MAINTAINERS
46
@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_REPLICATION', get_option('replication').allowed())
41
@@ -XXX,XX +XXX,XX @@ F: util/aio-*.c
47
# has_header
42
F: util/aio-*.h
48
config_host_data.set('CONFIG_EPOLL', cc.has_header('sys/epoll.h'))
43
F: util/fdmon-*.c
49
config_host_data.set('CONFIG_LINUX_MAGIC_H', cc.has_header('linux/magic.h'))
44
F: block/io.c
50
+config_host_data.set('CONFIG_BLKZONED', cc.has_header('linux/blkzoned.h'))
45
+F: block/plug.c
51
config_host_data.set('CONFIG_VALGRIND_H', cc.has_header('valgrind/valgrind.h'))
46
F: migration/block*
52
config_host_data.set('HAVE_BTRFS_H', cc.has_header('linux/btrfs.h'))
47
F: include/block/aio.h
53
config_host_data.set('HAVE_DRM_H', cc.has_header('libdrm/drm.h'))
48
F: include/block/aio-wait.h
54
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
55
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
56
cc.has_member('struct stat', 'st_atim',
57
prefix: '#include <sys/stat.h>'))
58
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
59
+ cc.has_member('struct blk_zone', 'capacity',
60
+ prefix: '#include <linux/blkzoned.h>'))
61
62
# has_type
63
config_host_data.set('CONFIG_IOVEC',
64
diff --git a/include/block/block-io.h b/include/block/block-io.h
65
index XXXXXXX..XXXXXXX 100644
66
--- a/include/block/block-io.h
67
+++ b/include/block/block-io.h
68
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
69
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
70
int64_t bytes);
71
72
+/* Report zone information of zone block device. */
73
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
74
+ int64_t offset,
75
+ unsigned int *nr_zones,
76
+ BlockZoneDescriptor *zones);
77
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
78
+ BlockZoneOp op,
79
+ int64_t offset, int64_t len);
80
+
81
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
82
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
83
int64_t bytes, int64_t *pnum, int64_t *map,
84
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
85
index XXXXXXX..XXXXXXX 100644
86
--- a/include/block/block_int-common.h
87
+++ b/include/block/block_int-common.h
88
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
89
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
90
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
91
92
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
93
+ int64_t offset, unsigned int *nr_zones,
94
+ BlockZoneDescriptor *zones);
95
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
96
+ int64_t offset, int64_t len);
97
+
98
/* removable device specific */
99
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
100
BlockDriverState *bs);
101
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
102
103
/* device zone model */
104
BlockZoneModel zoned;
105
+
106
+ /* zone size expressed in bytes */
107
+ uint32_t zone_size;
108
+
109
+ /* total number of zones */
110
+ uint32_t nr_zones;
111
+
112
+ /* maximum sectors of a zone append write operation */
113
+ int64_t max_append_sectors;
114
+
115
+ /* maximum number of open zones */
116
+ int64_t max_open_zones;
117
+
118
+ /* maximum number of active zones */
119
+ int64_t max_active_zones;
120
} BlockLimits;
121
122
typedef struct BdrvOpBlocker BdrvOpBlocker;
123
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
124
index XXXXXXX..XXXXXXX 100644
125
--- a/include/block/raw-aio.h
126
+++ b/include/block/raw-aio.h
127
@@ -XXX,XX +XXX,XX @@
128
#define QEMU_AIO_WRITE_ZEROES 0x0020
129
#define QEMU_AIO_COPY_RANGE 0x0040
130
#define QEMU_AIO_TRUNCATE 0x0080
131
+#define QEMU_AIO_ZONE_REPORT 0x0100
132
+#define QEMU_AIO_ZONE_MGMT 0x0200
133
#define QEMU_AIO_TYPE_MASK \
134
(QEMU_AIO_READ | \
135
QEMU_AIO_WRITE | \
136
@@ -XXX,XX +XXX,XX @@
137
QEMU_AIO_DISCARD | \
138
QEMU_AIO_WRITE_ZEROES | \
139
QEMU_AIO_COPY_RANGE | \
140
- QEMU_AIO_TRUNCATE)
141
+ QEMU_AIO_TRUNCATE | \
142
+ QEMU_AIO_ZONE_REPORT | \
143
+ QEMU_AIO_ZONE_MGMT)
144
145
/* AIO flags */
146
#define QEMU_AIO_MISALIGNED 0x1000
147
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
49
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
148
index XXXXXXX..XXXXXXX 100644
50
index XXXXXXX..XXXXXXX 100644
149
--- a/include/sysemu/block-backend-io.h
51
--- a/include/sysemu/block-backend-io.h
150
+++ b/include/sysemu/block-backend-io.h
52
+++ b/include/sysemu/block-backend-io.h
151
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
53
@@ -XXX,XX +XXX,XX @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
152
BlockCompletionFunc *cb, void *opaque);
54
int blk_get_max_iov(BlockBackend *blk);
153
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
55
int blk_get_max_hw_iov(BlockBackend *blk);
154
BlockCompletionFunc *cb, void *opaque);
56
155
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
57
-/*
156
+ unsigned int *nr_zones,
58
- * blk_io_plug/unplug are thread-local operations. This means that multiple
157
+ BlockZoneDescriptor *zones,
59
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
158
+ BlockCompletionFunc *cb, void *opaque);
60
- * that each unplug() is called in the same IOThread of the matching plug().
159
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
61
- */
160
+ int64_t offset, int64_t len,
62
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
161
+ BlockCompletionFunc *cb, void *opaque);
63
-void co_wrapper blk_io_plug(BlockBackend *blk);
162
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
64
-
163
BlockCompletionFunc *cb, void *opaque);
65
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
164
void blk_aio_cancel_async(BlockAIOCB *acb);
66
-void co_wrapper blk_io_unplug(BlockBackend *blk);
165
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
67
+void blk_io_plug(void);
166
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
68
+void blk_io_unplug(void);
167
int64_t bytes, BdrvRequestFlags flags);
69
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
168
70
169
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
71
AioContext *blk_get_aio_context(BlockBackend *blk);
170
+ unsigned int *nr_zones,
72
BlockAcctStats *blk_get_stats(BlockBackend *blk);
171
+ BlockZoneDescriptor *zones);
172
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
173
+ unsigned int *nr_zones,
174
+ BlockZoneDescriptor *zones);
175
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
176
+ int64_t offset, int64_t len);
177
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
178
+ int64_t offset, int64_t len);
179
+
180
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
181
int64_t bytes);
182
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
183
diff --git a/block/block-backend.c b/block/block-backend.c
73
diff --git a/block/block-backend.c b/block/block-backend.c
184
index XXXXXXX..XXXXXXX 100644
74
index XXXXXXX..XXXXXXX 100644
185
--- a/block/block-backend.c
75
--- a/block/block-backend.c
186
+++ b/block/block-backend.c
76
+++ b/block/block-backend.c
187
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
77
@@ -XXX,XX +XXX,XX @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier *notify)
188
return ret;
78
notifier_list_add(&blk->insert_bs_notifiers, notify);
189
}
79
}
190
80
191
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
81
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
82
-{
83
- BlockDriverState *bs = blk_bs(blk);
84
- IO_CODE();
85
- GRAPH_RDLOCK_GUARD();
86
-
87
- if (bs) {
88
- bdrv_co_io_plug(bs);
89
- }
90
-}
91
-
92
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
93
-{
94
- BlockDriverState *bs = blk_bs(blk);
95
- IO_CODE();
96
- GRAPH_RDLOCK_GUARD();
97
-
98
- if (bs) {
99
- bdrv_co_io_unplug(bs);
100
- }
101
-}
102
-
103
BlockAcctStats *blk_get_stats(BlockBackend *blk)
104
{
105
IO_CODE();
106
diff --git a/block/plug.c b/block/plug.c
107
new file mode 100644
108
index XXXXXXX..XXXXXXX
109
--- /dev/null
110
+++ b/block/plug.c
111
@@ -XXX,XX +XXX,XX @@
112
+/* SPDX-License-Identifier: GPL-2.0-or-later */
113
+/*
114
+ * Block I/O plugging
115
+ *
116
+ * Copyright Red Hat.
117
+ *
118
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
119
+ * section, allowing multiple calls to batch up. This is a performance
120
+ * optimization that is used in the block layer to submit several I/O requests
121
+ * at once instead of individually:
122
+ *
123
+ * blk_io_plug(); <-- start of plugged region
124
+ * ...
125
+ * blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
126
+ * blk_io_plug_call(my_func, my_obj); <-- another
127
+ * blk_io_plug_call(my_func, my_obj); <-- another
128
+ * ...
129
+ * blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
130
+ *
131
+ * This code is actually generic and not tied to the block layer. If another
132
+ * subsystem needs this functionality, it could be renamed.
133
+ */
134
+
135
+#include "qemu/osdep.h"
136
+#include "qemu/coroutine-tls.h"
137
+#include "qemu/notify.h"
138
+#include "qemu/thread.h"
139
+#include "sysemu/block-backend.h"
140
+
141
+/* A function call that has been deferred until unplug() */
142
+typedef struct {
143
+ void (*fn)(void *);
144
+ void *opaque;
145
+} UnplugFn;
146
+
147
+/* Per-thread state */
148
+typedef struct {
149
+ unsigned count; /* how many times has plug() been called? */
150
+ GArray *unplug_fns; /* functions to call at unplug time */
151
+} Plug;
152
+
153
+/* Use get_ptr_plug() to fetch this thread-local value */
154
+QEMU_DEFINE_STATIC_CO_TLS(Plug, plug);
155
+
156
+/* Called at thread cleanup time */
157
+static void blk_io_plug_atexit(Notifier *n, void *value)
192
+{
158
+{
193
+ BlkAioEmAIOCB *acb = opaque;
159
+ Plug *plug = get_ptr_plug();
194
+ BlkRwCo *rwco = &acb->rwco;
160
+ g_array_free(plug->unplug_fns, TRUE);
195
+
196
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
197
+ (unsigned int*)(uintptr_t)acb->bytes,
198
+ rwco->iobuf);
199
+ blk_aio_complete(acb);
200
+}
161
+}
201
+
162
+
202
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
163
+/* This won't involve coroutines, so use __thread */
203
+ unsigned int *nr_zones,
164
+static __thread Notifier blk_io_plug_atexit_notifier;
204
+ BlockZoneDescriptor *zones,
165
+
205
+ BlockCompletionFunc *cb, void *opaque)
166
+/**
167
+ * blk_io_plug_call:
168
+ * @fn: a function pointer to be invoked
169
+ * @opaque: a user-defined argument to @fn()
170
+ *
171
+ * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
172
+ * section.
173
+ *
174
+ * Otherwise defer the call until the end of the outermost
175
+ * blk_io_plug()/blk_io_unplug() section in this thread. If the same
176
+ * @fn/@opaque pair has already been deferred, it will only be called once upon
177
+ * blk_io_unplug() so that accumulated calls are batched into a single call.
178
+ *
179
+ * The caller must ensure that @opaque is not freed before @fn() is invoked.
180
+ */
181
+void blk_io_plug_call(void (*fn)(void *), void *opaque)
206
+{
182
+{
207
+ BlkAioEmAIOCB *acb;
183
+ Plug *plug = get_ptr_plug();
208
+ Coroutine *co;
184
+
209
+ IO_CODE();
185
+ /* Call immediately if we're not plugged */
210
+
186
+ if (plug->count == 0) {
211
+ blk_inc_in_flight(blk);
187
+ fn(opaque);
212
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
188
+ return;
213
+ acb->rwco = (BlkRwCo) {
189
+ }
214
+ .blk = blk,
190
+
215
+ .offset = offset,
191
+ GArray *array = plug->unplug_fns;
216
+ .iobuf = zones,
192
+ if (!array) {
217
+ .ret = NOT_DONE,
193
+ array = g_array_new(FALSE, FALSE, sizeof(UnplugFn));
194
+ plug->unplug_fns = array;
195
+ blk_io_plug_atexit_notifier.notify = blk_io_plug_atexit;
196
+ qemu_thread_atexit_add(&blk_io_plug_atexit_notifier);
197
+ }
198
+
199
+ UnplugFn *fns = (UnplugFn *)array->data;
200
+ UnplugFn new_fn = {
201
+ .fn = fn,
202
+ .opaque = opaque,
218
+ };
203
+ };
219
+ acb->bytes = (int64_t)(uintptr_t)nr_zones,
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+    BlkAioEmAIOCB *acb = opaque;
+    BlkRwCo *rwco = &acb->rwco;
+
+    rwco->ret = blk_co_zone_mgmt(rwco->blk,
+                                 (BlockZoneOp)(uintptr_t)rwco->iobuf,
+                                 rwco->offset, acb->bytes);
+    blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                              int64_t offset, int64_t len,
+                              BlockCompletionFunc *cb, void *opaque) {
+    BlkAioEmAIOCB *acb;
+    Coroutine *co;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    acb->rwco = (BlkRwCo) {
+        .blk    = blk,
+        .offset = offset,
+        .iobuf  = (void *)(uintptr_t)op,
+        .ret    = NOT_DONE,
+    };
+    acb->bytes = len;
+    acb->has_returned = false;
+
+    co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+    aio_co_enter(blk_get_aio_context(blk), co);
+
+    acb->has_returned = true;
+    if (acb->rwco.ret != NOT_DONE) {
+        replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                         blk_aio_complete_bh, acb);
+    }
+
+    return &acb->common;
+}
+
+    /*
+     * There won't be many, so do a linear search. If this becomes a bottleneck
+     * then a binary search (glib 2.62+) or different data structure could be
+     * used.
+     */
+    for (guint i = 0; i < array->len; i++) {
+        if (memcmp(&fns[i], &new_fn, sizeof(new_fn)) == 0) {
+            return; /* already exists */
+        }
+    }
+
+    g_array_append_val(array, new_fn);
+}
+
+/**
+ * blk_io_plug: Defer blk_io_plug_call() functions until blk_io_unplug()
+ *
+ * blk_io_plug/unplug are thread-local operations. This means that multiple
+ * threads can simultaneously call plug/unplug, but the caller must ensure that
+ * each unplug() is called in the same thread of the matching plug().
+ *
+ * Nesting is supported. blk_io_plug_call() functions are only called at the
+ * outermost blk_io_unplug().
+ */
+void blk_io_plug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count < UINT32_MAX);
+
+    plug->count++;
+}
+
+/**
+ * blk_io_unplug: Run any pending blk_io_plug_call() functions
+ *
+ * There must have been a matching blk_io_plug() call in the same thread prior
+ * to this blk_io_unplug() call.
+ */
+void blk_io_unplug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count > 0);
+
+    if (--plug->count > 0) {
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        return;
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+
+    for (guint i = 0; i < array->len; i++) {
+        fns[i].fn(fns[i].opaque);
+    }
+
+    /*
+     * This resets the array without freeing memory so that appending is cheap
+     * in the future.
+     */
+    g_array_set_size(array, 0);
+}
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
      * is below us.
      */
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_plug(dataplane->blk);
+        blk_io_plug();
     }
     while (rc != rp) {
         /* pull request from ring */
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
 
         if (inflight_atstart > IO_PLUG_THRESHOLD &&
             batched >= inflight_atstart) {
-            blk_io_unplug(dataplane->blk);
+            blk_io_unplug();
         }
         xen_block_do_aio(request);
         if (inflight_atstart > IO_PLUG_THRESHOLD) {
             if (batched >= inflight_atstart) {
-                blk_io_plug(dataplane->blk);
+                blk_io_plug();
                 batched = 0;
             } else {
                 batched++;
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
         }
     }
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_unplug(dataplane->blk);
+        blk_io_unplug();
     }
 
     return done_something;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     bool suppress_notifications = virtio_queue_get_notification(vq);
 
     aio_context_acquire(blk_get_aio_context(s->blk));
-    blk_io_plug(s->blk);
+    blk_io_plug();
 
     do {
         if (suppress_notifications) {
@@ -XXX,XX +XXX,XX @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         virtio_blk_submit_multireq(s, &mrb);
     }
 
-    blk_io_unplug(s->blk);
+    blk_io_unplug();
     aio_context_release(blk_get_aio_context(s->blk));
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -XXX,XX +XXX,XX @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
         return -ENOBUFS;
     }
     scsi_req_ref(req->sreq);
-    blk_io_plug(d->conf.blk);
+    blk_io_plug();
     object_unref(OBJECT(d));
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
     if (scsi_req_enqueue(sreq)) {
         scsi_req_continue(sreq);
     }
-    blk_io_unplug(sreq->dev->conf.blk);
+    blk_io_unplug();
     scsi_req_unref(sreq);
 }
 
@@ -XXX,XX +XXX,XX @@ static void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
     while (!QTAILQ_EMPTY(&reqs)) {
         req = QTAILQ_FIRST(&reqs);
         QTAILQ_REMOVE(&reqs, req, next);
-        blk_io_unplug(req->sreq->dev->conf.blk);
+        blk_io_unplug();
         scsi_req_unref(req->sreq);
         virtqueue_detach_element(req->vq, &req->elem, 0);
         virtio_scsi_free_req(req);
diff --git a/block/meson.build b/block/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -XXX,XX +XXX,XX @@ block_ss.add(files(
   'mirror.c',
   'nbd.c',
   'null.c',
+  'plug.c',
   'qapi.c',
   'qcow2-bitmap.c',
   'qcow2-cache.c',
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+                                    unsigned int *nr_zones,
+                                    BlockZoneDescriptor *zones)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk); /* increase before waiting */
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+    if (!blk_is_available(blk)) {
+        blk_dec_in_flight(blk);
+        return -ENOMEDIUM;
+    }
+    ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation;
+ * offset is the byte offset from the start of the zoned device;
+ * len is the maximum number of bytes the command should operate on. It
+ * should be aligned with the device zone size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+                                  int64_t offset, int64_t len)
+{
+    int ret;
+    IO_CODE();
+
+    blk_inc_in_flight(blk);
+    blk_wait_while_drained(blk);
+    GRAPH_RDLOCK_GUARD();
+
+    ret = blk_check_byte_request(blk, offset, len);
+    if (ret < 0) {
+        blk_dec_in_flight(blk);
+        return ret;
+    }
+
+    ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+    blk_dec_in_flight(blk);
+    return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
     BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@
 #include <sys/param.h>
 #include <sys/syscall.h>
 #include <sys/vfs.h>
+#if defined(CONFIG_BLKZONED)
+#include <linux/blkzoned.h>
+#endif
 #include <linux/cdrom.h>
 #include <linux/fd.h>
 #include <linux/fs.h>
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
         PreallocMode prealloc;
         Error **errp;
     } truncate;
+    struct {
+        unsigned int *nr_zones;
+        BlockZoneDescriptor *zones;
+    } zone_report;
+    struct {
+        unsigned long op;
+    } zone_mgmt;
     };
 } RawPosixAIOData;
 
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
 static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
 {
     g_autofree char *val = NULL;
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
     }
     return 0;
 }
+#endif /* defined(CONFIG_BLKZONED) */
 
 /*
  * Get a sysfs attribute value as a long integer.
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
                                      Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
         return;
     }
     bs->bl.zoned = zoned;
+
+    ret = get_sysfs_long_val(st, "max_open_zones");
+    if (ret >= 0) {
+        bs->bl.max_open_zones = ret;
+    }
+
+    ret = get_sysfs_long_val(st, "max_active_zones");
+    if (ret >= 0) {
+        bs->bl.max_active_zones = ret;
+    }
+
+    /*
+     * The zoned device must at least have zone size and nr_zones fields.
+     */
401
+ ret = get_sysfs_long_val(st, "chunk_sectors");
402
+ if (ret < 0) {
403
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
404
+ "sysfs attribute");
405
+ return;
406
+ } else if (!ret) {
407
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
408
+ return;
409
+ }
410
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
411
+
412
+ ret = get_sysfs_long_val(st, "nr_zones");
413
+ if (ret < 0) {
414
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
415
+ "sysfs attribute");
416
+ return;
417
+ } else if (!ret) {
418
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
419
+ return;
420
+ }
421
+ bs->bl.nr_zones = ret;
422
+
423
+ ret = get_sysfs_long_val(st, "zone_append_max_bytes");
424
+ if (ret > 0) {
425
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
426
+ }
427
}
428
+#else /* !defined(CONFIG_BLKZONED) */
429
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
430
+ Error **errp)
431
+{
432
+ bs->bl.zoned = BLK_Z_NONE;
433
+}
434
+#endif /* !defined(CONFIG_BLKZONED) */
435
436
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
437
{
438
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
439
BDRVRawState *s = bs->opaque;
440
int ret;
441
442
- /* If DASD, get blocksizes */
443
+ /* If DASD or zoned devices, get blocksizes */
444
if (check_for_dasd(s->fd) < 0) {
445
- return -ENOTSUP;
446
+ /* zoned devices are not DASD */
447
+ if (bs->bl.zoned == BLK_Z_NONE) {
448
+ return -ENOTSUP;
449
+ }
450
}
451
ret = probe_logical_blocksize(s->fd, &bsz->log);
452
if (ret < 0) {
453
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
454
}
455
#endif
456
457
+/*
458
+ * parse_zone - Fill a zone descriptor
459
+ */
460
+#if defined(CONFIG_BLKZONED)
461
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
462
+ const struct blk_zone *blkz) {
463
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
464
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
465
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
466
+
467
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
468
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
469
+#else
470
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
471
+#endif
472
+
473
+ switch (blkz->type) {
474
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
475
+ zone->type = BLK_ZT_SWR;
476
+ break;
477
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
478
+ zone->type = BLK_ZT_SWP;
479
+ break;
480
+ case BLK_ZONE_TYPE_CONVENTIONAL:
481
+ zone->type = BLK_ZT_CONV;
482
+ break;
483
+ default:
484
+ error_report("Unsupported zone type: 0x%x", blkz->type);
485
+ return -ENOTSUP;
486
+ }
487
+
488
+ switch (blkz->cond) {
489
+ case BLK_ZONE_COND_NOT_WP:
490
+ zone->state = BLK_ZS_NOT_WP;
491
+ break;
492
+ case BLK_ZONE_COND_EMPTY:
493
+ zone->state = BLK_ZS_EMPTY;
494
+ break;
495
+ case BLK_ZONE_COND_IMP_OPEN:
496
+ zone->state = BLK_ZS_IOPEN;
497
+ break;
498
+ case BLK_ZONE_COND_EXP_OPEN:
499
+ zone->state = BLK_ZS_EOPEN;
500
+ break;
501
+ case BLK_ZONE_COND_CLOSED:
502
+ zone->state = BLK_ZS_CLOSED;
503
+ break;
504
+ case BLK_ZONE_COND_READONLY:
505
+ zone->state = BLK_ZS_RDONLY;
506
+ break;
507
+ case BLK_ZONE_COND_FULL:
508
+ zone->state = BLK_ZS_FULL;
509
+ break;
510
+ case BLK_ZONE_COND_OFFLINE:
511
+ zone->state = BLK_ZS_OFFLINE;
512
+ break;
513
+ default:
514
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
515
+ return -ENOTSUP;
516
+ }
517
+ return 0;
518
+}
519
+#endif
520
+
521
+#if defined(CONFIG_BLKZONED)
522
+static int handle_aiocb_zone_report(void *opaque)
523
+{
524
+ RawPosixAIOData *aiocb = opaque;
525
+ int fd = aiocb->aio_fildes;
526
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
527
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
528
+ /* zoned block devices use 512-byte sectors */
529
+ uint64_t sector = aiocb->aio_offset / 512;
530
+
531
+ struct blk_zone *blkz;
532
+ size_t rep_size;
533
+ unsigned int nrz;
534
+ int ret;
535
+ unsigned int n = 0, i = 0;
536
+
537
+ nrz = *nr_zones;
538
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
539
+ g_autofree struct blk_zone_report *rep = NULL;
540
+ rep = g_malloc(rep_size);
541
+
542
+ blkz = (struct blk_zone *)(rep + 1);
543
+ while (n < nrz) {
544
+ memset(rep, 0, rep_size);
545
+ rep->sector = sector;
546
+ rep->nr_zones = nrz - n;
547
+
548
+ do {
549
+ ret = ioctl(fd, BLKREPORTZONE, rep);
550
+ } while (ret != 0 && errno == EINTR);
551
+ if (ret != 0) {
552
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
553
+ fd, sector, errno);
554
+ return -errno;
555
+ }
556
+
557
+ if (!rep->nr_zones) {
558
+ break;
559
+ }
560
+
561
+ for (i = 0; i < rep->nr_zones; i++, n++) {
562
+ ret = parse_zone(&zones[n], &blkz[i]);
563
+ if (ret != 0) {
564
+ return ret;
565
+ }
566
+
567
+ /* The next report should start after the last zone reported */
568
+ sector = blkz[i].start + blkz[i].len;
569
+ }
570
+ }
571
+
572
+ *nr_zones = n;
573
+ return 0;
574
+}
575
+#endif
576
+
577
+#if defined(CONFIG_BLKZONED)
578
+static int handle_aiocb_zone_mgmt(void *opaque)
579
+{
580
+ RawPosixAIOData *aiocb = opaque;
581
+ int fd = aiocb->aio_fildes;
582
+ uint64_t sector = aiocb->aio_offset / 512;
583
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
584
+ struct blk_zone_range range;
585
+ int ret;
586
+
587
+ /* Execute the operation */
588
+ range.sector = sector;
589
+ range.nr_sectors = nr_sectors;
590
+ do {
591
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
592
+ } while (ret != 0 && errno == EINTR);
593
+
594
+ return ret;
595
+}
596
+#endif
597
+
598
static int handle_aiocb_copy_range(void *opaque)
599
{
600
RawPosixAIOData *aiocb = opaque;
601
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
602
}
603
}
604
605
+/*
+ * zone report - Get a zoned block device's information in the form
+ * of an array of zone descriptors.
+ * zones is an array of zone descriptors to hold zone information on reply;
+ * offset can be any byte within the entire size of the device;
+ * nr_zones is the maximum number of zones the command should operate on.
+ */
612
+#if defined(CONFIG_BLKZONED)
613
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
614
+ unsigned int *nr_zones,
615
+ BlockZoneDescriptor *zones) {
616
+ BDRVRawState *s = bs->opaque;
617
+ RawPosixAIOData acb = (RawPosixAIOData) {
618
+ .bs = bs,
619
+ .aio_fildes = s->fd,
620
+ .aio_type = QEMU_AIO_ZONE_REPORT,
621
+ .aio_offset = offset,
622
+ .zone_report = {
623
+ .nr_zones = nr_zones,
624
+ .zones = zones,
625
+ },
626
+ };
627
+
628
+ return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
629
+}
630
+#endif
631
+
632
+/*
633
+ * zone management operations - Execute an operation on a zone
634
+ */
635
+#if defined(CONFIG_BLKZONED)
636
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
637
+ int64_t offset, int64_t len) {
638
+ BDRVRawState *s = bs->opaque;
639
+ RawPosixAIOData acb;
640
+ int64_t zone_size, zone_size_mask;
641
+ const char *op_name;
642
+ unsigned long zo;
643
+ int ret;
644
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
645
+
646
+ zone_size = bs->bl.zone_size;
647
+ zone_size_mask = zone_size - 1;
648
+ if (offset & zone_size_mask) {
649
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
650
+ "%" PRId64 "", offset / 512, zone_size / 512);
651
+ return -EINVAL;
652
+ }
653
+
654
+ if (((offset + len) < capacity && len & zone_size_mask) ||
655
+ offset + len > capacity) {
656
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
657
+ " %" PRId64 "", len / 512, zone_size / 512);
658
+ return -EINVAL;
659
+ }
660
+
661
+ switch (op) {
662
+ case BLK_ZO_OPEN:
663
+ op_name = "BLKOPENZONE";
664
+ zo = BLKOPENZONE;
665
+ break;
666
+ case BLK_ZO_CLOSE:
667
+ op_name = "BLKCLOSEZONE";
668
+ zo = BLKCLOSEZONE;
669
+ break;
670
+ case BLK_ZO_FINISH:
671
+ op_name = "BLKFINISHZONE";
672
+ zo = BLKFINISHZONE;
673
+ break;
674
+ case BLK_ZO_RESET:
675
+ op_name = "BLKRESETZONE";
676
+ zo = BLKRESETZONE;
677
+ break;
678
+ default:
679
+ error_report("Unsupported zone op: 0x%x", op);
680
+ return -ENOTSUP;
681
+ }
682
+
683
+ acb = (RawPosixAIOData) {
684
+ .bs = bs,
685
+ .aio_fildes = s->fd,
686
+ .aio_type = QEMU_AIO_ZONE_MGMT,
687
+ .aio_offset = offset,
688
+ .aio_nbytes = len,
689
+ .zone_mgmt = {
690
+ .op = zo,
691
+ },
692
+ };
693
+
694
+ ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
695
+ if (ret != 0) {
696
+ error_report("ioctl %s failed %d", op_name, ret);
697
+ }
698
+
699
+ return ret;
700
+}
701
+#endif
702
+
703
static coroutine_fn int
704
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
705
bool blkdev)
706
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
707
#ifdef __linux__
708
.bdrv_co_ioctl = hdev_co_ioctl,
709
#endif
710
+
711
+ /* zoned device */
712
+#if defined(CONFIG_BLKZONED)
713
+ /* zone management operations */
714
+ .bdrv_co_zone_report = raw_co_zone_report,
715
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
716
+#endif
717
};
718
719
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
720
diff --git a/block/io.c b/block/io.c
721
index XXXXXXX..XXXXXXX 100644
722
--- a/block/io.c
723
+++ b/block/io.c
724
@@ -XXX,XX +XXX,XX @@ out:
725
return co.ret;
726
}
727
728
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
729
+ unsigned int *nr_zones,
730
+ BlockZoneDescriptor *zones)
731
+{
732
+ BlockDriver *drv = bs->drv;
733
+ CoroutineIOCompletion co = {
734
+ .coroutine = qemu_coroutine_self(),
735
+ };
736
+ IO_CODE();
737
+
738
+ bdrv_inc_in_flight(bs);
739
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
740
+ co.ret = -ENOTSUP;
741
+ goto out;
742
+ }
743
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
744
+out:
745
+ bdrv_dec_in_flight(bs);
746
+ return co.ret;
747
+}
748
+
749
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
750
+ int64_t offset, int64_t len)
751
+{
752
+ BlockDriver *drv = bs->drv;
753
+ CoroutineIOCompletion co = {
754
+ .coroutine = qemu_coroutine_self(),
755
+ };
756
+ IO_CODE();
757
+
758
+ bdrv_inc_in_flight(bs);
759
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
760
+ co.ret = -ENOTSUP;
761
+ goto out;
762
+ }
763
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
764
+out:
765
+ bdrv_dec_in_flight(bs);
766
+ return co.ret;
767
+}
768
+
769
void *qemu_blockalign(BlockDriverState *bs, size_t size)
770
{
771
IO_CODE();
772
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
773
index XXXXXXX..XXXXXXX 100644
774
--- a/qemu-io-cmds.c
775
+++ b/qemu-io-cmds.c
776
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
777
.oneline = "flush all in-core file state to disk",
778
};
779
780
+static inline int64_t tosector(int64_t bytes)
781
+{
782
+ return bytes >> BDRV_SECTOR_BITS;
783
+}
784
+
785
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
786
+{
787
+ int ret;
788
+ int64_t offset;
789
+ unsigned int nr_zones;
790
+
791
+ ++optind;
792
+ offset = cvtnum(argv[optind]);
793
+ ++optind;
794
+ nr_zones = cvtnum(argv[optind]);
795
+
796
+ g_autofree BlockZoneDescriptor *zones = NULL;
797
+ zones = g_new(BlockZoneDescriptor, nr_zones);
798
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
799
+ if (ret < 0) {
800
+ printf("zone report failed: %s\n", strerror(-ret));
801
+ } else {
802
+ for (int i = 0; i < nr_zones; ++i) {
803
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
804
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
805
+ "zcond:%u, [type: %u]\n",
806
+ tosector(zones[i].start), tosector(zones[i].length),
807
+ tosector(zones[i].cap), tosector(zones[i].wp),
808
+ zones[i].state, zones[i].type);
809
+ }
810
+ }
811
+ return ret;
812
+}
813
+
814
+static const cmdinfo_t zone_report_cmd = {
815
+ .name = "zone_report",
816
+ .altname = "zrp",
817
+ .cfunc = zone_report_f,
818
+ .argmin = 2,
819
+ .argmax = 2,
820
+ .args = "offset number",
821
+ .oneline = "report zone information",
822
+};
823
+
824
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
825
+{
826
+ int ret;
827
+ int64_t offset, len;
828
+ ++optind;
829
+ offset = cvtnum(argv[optind]);
830
+ ++optind;
831
+ len = cvtnum(argv[optind]);
832
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
833
+ if (ret < 0) {
834
+ printf("zone open failed: %s\n", strerror(-ret));
835
+ }
836
+ return ret;
837
+}
838
+
839
+static const cmdinfo_t zone_open_cmd = {
840
+ .name = "zone_open",
841
+ .altname = "zo",
842
+ .cfunc = zone_open_f,
843
+ .argmin = 2,
844
+ .argmax = 2,
845
+ .args = "offset len",
846
+ .oneline = "explicitly open a range of zones in a zoned block device",
847
+};
848
+
849
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
850
+{
851
+ int ret;
852
+ int64_t offset, len;
853
+ ++optind;
854
+ offset = cvtnum(argv[optind]);
855
+ ++optind;
856
+ len = cvtnum(argv[optind]);
857
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
858
+ if (ret < 0) {
859
+ printf("zone close failed: %s\n", strerror(-ret));
860
+ }
861
+ return ret;
862
+}
863
+
864
+static const cmdinfo_t zone_close_cmd = {
865
+ .name = "zone_close",
866
+ .altname = "zc",
867
+ .cfunc = zone_close_f,
868
+ .argmin = 2,
869
+ .argmax = 2,
870
+ .args = "offset len",
871
+ .oneline = "close a range of zones in a zoned block device",
872
+};
873
+
874
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
875
+{
876
+ int ret;
877
+ int64_t offset, len;
878
+ ++optind;
879
+ offset = cvtnum(argv[optind]);
880
+ ++optind;
881
+ len = cvtnum(argv[optind]);
882
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
883
+ if (ret < 0) {
884
+ printf("zone finish failed: %s\n", strerror(-ret));
885
+ }
886
+ return ret;
887
+}
888
+
889
+static const cmdinfo_t zone_finish_cmd = {
890
+ .name = "zone_finish",
891
+ .altname = "zf",
892
+ .cfunc = zone_finish_f,
893
+ .argmin = 2,
894
+ .argmax = 2,
895
+ .args = "offset len",
896
+ .oneline = "finish a range of zones in a zoned block device",
897
+};
898
+
899
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
900
+{
901
+ int ret;
902
+ int64_t offset, len;
903
+ ++optind;
904
+ offset = cvtnum(argv[optind]);
905
+ ++optind;
906
+ len = cvtnum(argv[optind]);
907
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
908
+ if (ret < 0) {
909
+ printf("zone reset failed: %s\n", strerror(-ret));
910
+ }
911
+ return ret;
912
+}
913
+
914
+static const cmdinfo_t zone_reset_cmd = {
915
+ .name = "zone_reset",
916
+ .altname = "zrs",
917
+ .cfunc = zone_reset_f,
918
+ .argmin = 2,
919
+ .argmax = 2,
920
+ .args = "offset len",
921
+ .oneline = "reset a zone write pointer in a zoned block device",
922
+};
923
+
924
static int truncate_f(BlockBackend *blk, int argc, char **argv);
925
static const cmdinfo_t truncate_cmd = {
926
.name = "truncate",
927
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
928
qemuio_add_command(&aio_write_cmd);
929
qemuio_add_command(&aio_flush_cmd);
930
qemuio_add_command(&flush_cmd);
931
+ qemuio_add_command(&zone_report_cmd);
932
+ qemuio_add_command(&zone_open_cmd);
933
+ qemuio_add_command(&zone_close_cmd);
934
+ qemuio_add_command(&zone_finish_cmd);
935
+ qemuio_add_command(&zone_reset_cmd);
936
qemuio_add_command(&truncate_cmd);
937
qemuio_add_command(&length_cmd);
938
qemuio_add_command(&info_cmd);
939
--
2.40.0
--
2.40.1
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nvme.c       | 44 ++++++++++++--------------------------------
 block/trace-events |  1 -
 2 files changed, 12 insertions(+), 33 deletions(-)

From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172339.3709-5-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)
13
diff --git a/block/file-posix.c b/block/file-posix.c
16
diff --git a/block/nvme.c b/block/nvme.c
14
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
18
--- a/block/nvme.c
16
+++ b/block/file-posix.c
19
+++ b/block/nvme.c
17
@@ -XXX,XX +XXX,XX @@ out:
20
@@ -XXX,XX +XXX,XX @@
18
if (!BDRV_ZT_IS_CONV(*wp)) {
21
#include "qemu/vfio-helpers.h"
19
if (type & QEMU_AIO_ZONE_APPEND) {
22
#include "block/block-io.h"
20
*s->offset = *wp;
23
#include "block/block_int.h"
21
+ trace_zbd_zone_append_complete(bs, *s->offset
24
+#include "sysemu/block-backend.h"
22
+ >> BDRV_SECTOR_BITS);
25
#include "sysemu/replay.h"
23
}
26
#include "trace.h"
24
/* Advance the wp if needed */
27
25
if (offset + bytes > *wp) {
28
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
29
int blkshift;
27
len += iov_len;
30
31
uint64_t max_transfer;
32
- bool plugged;
33
34
bool supports_write_zeroes;
35
bool supports_discard;
36
@@ -XXX,XX +XXX,XX @@ static void nvme_kick(NVMeQueuePair *q)
37
{
38
BDRVNVMeState *s = q->s;
39
40
- if (s->plugged || !q->need_kick) {
41
+ if (!q->need_kick) {
42
return;
28
}
43
}
29
44
trace_nvme_kick(s, q->index);
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
45
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
46
NvmeCqe *c;
47
48
trace_nvme_process_completion(s, q->index, q->inflight);
49
- if (s->plugged) {
50
- trace_nvme_process_completion_queue_plugged(s, q->index);
51
- return false;
52
- }
53
54
/*
55
* Support re-entrancy when a request cb() function invokes aio_poll().
56
@@ -XXX,XX +XXX,XX @@ static void nvme_trace_command(const NvmeCmd *cmd)
57
}
32
}
58
}
33
#endif
59
60
+static void nvme_unplug_fn(void *opaque)
61
+{
62
+ NVMeQueuePair *q = opaque;
63
+
64
+ QEMU_LOCK_GUARD(&q->lock);
65
+ nvme_kick(q);
66
+ nvme_process_completion(q);
67
+}
68
+
69
static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
70
NvmeCmd *cmd, BlockCompletionFunc cb,
71
void *opaque)
72
@@ -XXX,XX +XXX,XX @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
73
q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
74
q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
75
q->need_kick++;
76
- nvme_kick(q);
77
- nvme_process_completion(q);
78
+ blk_io_plug_call(nvme_unplug_fn, q);
79
qemu_mutex_unlock(&q->lock);
80
}
81
82
@@ -XXX,XX +XXX,XX @@ static void nvme_attach_aio_context(BlockDriverState *bs,
83
}
84
}
85
86
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
87
-{
88
- BDRVNVMeState *s = bs->opaque;
89
- assert(!s->plugged);
90
- s->plugged = true;
91
-}
92
-
93
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
94
-{
95
- BDRVNVMeState *s = bs->opaque;
96
- assert(s->plugged);
97
- s->plugged = false;
98
- for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
99
- NVMeQueuePair *q = s->queues[i];
100
- qemu_mutex_lock(&q->lock);
101
- nvme_kick(q);
102
- nvme_process_completion(q);
103
- qemu_mutex_unlock(&q->lock);
104
- }
105
-}
106
-
107
static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
108
Error **errp)
109
{
110
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
111
.bdrv_detach_aio_context = nvme_detach_aio_context,
112
.bdrv_attach_aio_context = nvme_attach_aio_context,
113
114
- .bdrv_co_io_plug = nvme_co_io_plug,
115
- .bdrv_co_io_unplug = nvme_co_io_unplug,
116
-
117
.bdrv_register_buf = nvme_register_buf,
118
.bdrv_unregister_buf = nvme_unregister_buf,
119
};
34
diff --git a/block/trace-events b/block/trace-events
120
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
121
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
122
--- a/block/trace-events
37
+++ b/block/trace-events
123
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
124
@@ -XXX,XX +XXX,XX @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
39
file_flush_fdatasync_failed(int err) "errno %d"
125
nvme_dma_flush_queue_wait(void *s) "s %p"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
126
nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
127
nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u inflight %d"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
128
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
129
nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
44
130
nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
45
# ssh.c
131
nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
--
2.40.0
--
2.40.1
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkio.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

From: Sam Li <faithilikerun@gmail.com>

raw-format driver usually sits on top of file-posix driver. It needs to
pass through requests of zone commands.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-5-faithilikerun@gmail.com
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/raw-format.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
22
14
23
diff --git a/block/raw-format.c b/block/raw-format.c
15
diff --git a/block/blkio.c b/block/blkio.c
24
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
25
--- a/block/raw-format.c
17
--- a/block/blkio.c
26
+++ b/block/raw-format.c
18
+++ b/block/blkio.c
27
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
19
@@ -XXX,XX +XXX,XX @@
28
return bdrv_co_pdiscard(bs->file, offset, bytes);
20
#include "qemu/error-report.h"
21
#include "qapi/qmp/qdict.h"
22
#include "qemu/module.h"
23
+#include "sysemu/block-backend.h"
24
#include "exec/memory.h" /* for ram_block_discard_disable() */
25
26
#include "block/block-io.h"
27
@@ -XXX,XX +XXX,XX @@ static void blkio_detach_aio_context(BlockDriverState *bs)
28
NULL, NULL, NULL);
29
}
29
}
30
30
31
+static int coroutine_fn GRAPH_RDLOCK
31
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
32
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
32
-static void blkio_submit_io(BlockDriverState *bs)
33
+ unsigned int *nr_zones,
33
+/*
34
+ BlockZoneDescriptor *zones)
34
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
35
+ * blkio_lock.
36
+ */
37
+static void blkio_unplug_fn(void *opaque)
38
{
39
- if (qatomic_read(&bs->io_plugged) == 0) {
40
- BDRVBlkioState *s = bs->opaque;
41
+ BDRVBlkioState *s = opaque;
42
43
+ WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
44
blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
45
}
46
}
47
48
+/*
49
+ * Schedule I/O submission after enqueuing a new request. Called without
50
+ * blkio_lock.
51
+ */
52
+static void blkio_submit_io(BlockDriverState *bs)
35
+{
53
+{
36
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
54
+ BDRVBlkioState *s = bs->opaque;
55
+
56
+ blk_io_plug_call(blkio_unplug_fn, s);
37
+}
57
+}
38
+
58
+
39
+static int coroutine_fn GRAPH_RDLOCK
59
static int coroutine_fn
40
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
60
blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
41
+ int64_t offset, int64_t len)
42
+{
43
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
44
+}
45
+
46
static int64_t coroutine_fn GRAPH_RDLOCK
47
raw_co_getlength(BlockDriverState *bs)
48
{
61
{
49
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
62
@@ -XXX,XX +XXX,XX @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
50
.bdrv_co_pwritev = &raw_co_pwritev,
63
51
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
64
WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
52
.bdrv_co_pdiscard = &raw_co_pdiscard,
65
blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
53
+ .bdrv_co_zone_report = &raw_co_zone_report,
66
- blkio_submit_io(bs);
54
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
67
}
55
.bdrv_co_block_status = &raw_co_block_status,
68
56
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
69
+ blkio_submit_io(bs);
57
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
70
qemu_coroutine_yield();
71
return cod.ret;
72
}
73
@@ -XXX,XX +XXX,XX @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
74
75
WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
76
blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
77
- blkio_submit_io(bs);
78
}
79
80
+ blkio_submit_io(bs);
81
qemu_coroutine_yield();
82
83
if (use_bounce_buffer) {
84
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
85
86
WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
87
blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
88
- blkio_submit_io(bs);
89
}
90
91
+ blkio_submit_io(bs);
92
qemu_coroutine_yield();
93
94
if (use_bounce_buffer) {
95
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
96
97
WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
98
blkioq_flush(s->blkioq, &cod, 0);
99
- blkio_submit_io(bs);
100
}
101
102
+ blkio_submit_io(bs);
103
qemu_coroutine_yield();
104
return cod.ret;
105
}
106
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_pwrite_zeroes(BlockDriverState *bs,
107
108
WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
109
blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
110
- blkio_submit_io(bs);
111
}
112
113
+ blkio_submit_io(bs);
114
qemu_coroutine_yield();
115
return cod.ret;
116
}
117
118
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
119
-{
120
- BDRVBlkioState *s = bs->opaque;
121
-
122
- WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
123
- blkio_submit_io(bs);
124
- }
125
-}
126
-
127
typedef enum {
128
BMRR_OK,
129
BMRR_SKIP,
130
@@ -XXX,XX +XXX,XX @@ static void blkio_refresh_limits(BlockDriverState *bs, Error **errp)
131
.bdrv_co_pwritev = blkio_co_pwritev, \
132
.bdrv_co_flush_to_disk = blkio_co_flush, \
133
.bdrv_co_pwrite_zeroes = blkio_co_pwrite_zeroes, \
134
- .bdrv_co_io_unplug = blkio_co_io_unplug, \
135
.bdrv_refresh_limits = blkio_refresh_limits, \
136
.bdrv_register_buf = blkio_register_buf, \
137
.bdrv_unregister_buf = blkio_unregister_buf, \
58
--
138
--
59
2.40.0
139
2.40.1
60
61
diff view generated by jsdifflib
Deleted patch

From: Sam Li <faithilikerun@gmail.com>

Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-6-faithilikerun@gmail.com
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org> and clarify that the check is about zoned
BlockDrivers.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block_int-common.h |  5 +++++
 block.c                          | 19 +++++++++++++++++++
 block/file-posix.c               | 12 ++++++++++++
 block/raw-format.c               |  1 +
 4 files changed, 37 insertions(+)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
      */
     bool is_format;

+    /*
+     * Set to true if the BlockDriver supports zoned children.
+     */
+    bool supports_zoned_children;
+
     /*
      * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
      * this field set to true, except ones that are defined only by their
diff --git a/block.c b/block.c
index XXXXXXX..XXXXXXX 100644
--- a/block.c
+++ b/block.c
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
         return;
     }

+    /*
+     * Non-zoned block drivers do not follow zoned storage constraints
+     * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
+     * drivers in a graph.
+     */
+    if (!parent_bs->drv->supports_zoned_children &&
+        child_bs->bl.zoned == BLK_Z_HM) {
+        /*
+         * The host-aware model allows zoned storage constraints and random
+         * write. Allow mixing host-aware and non-zoned drivers. Using
+         * host-aware device as a regular device.
+         */
+        error_setg(errp, "Cannot add a %s child to a %s parent",
+                   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
+                   parent_bs->drv->supports_zoned_children ?
+                   "support zoned children" : "not support zoned children");
+        return;
+    }
+
     if (!QLIST_EMPTY(&child_bs->parents)) {
         error_setg(errp, "The node %s already has a parent",
                    child_bs->node_name);
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
             goto fail;
         }
     }
+#ifdef CONFIG_BLKZONED
+    /*
+     * The kernel page cache does not reliably work for writes to SWR zones
+     * of zoned block device because it can not guarantee the order of writes.
+     */
+    if ((bs->bl.zoned != BLK_Z_NONE) &&
+        (!(s->open_flags & O_DIRECT))) {
+        error_setg(errp, "The driver supports zoned devices, and it requires "
+                         "cache.direct=on, which was not specified.");
+        return -EINVAL; /* No host kernel page cache */
+    }
+#endif

     if (S_ISBLK(st.st_mode)) {
 #ifdef __linux__
diff --git a/block/raw-format.c b/block/raw-format.c
index XXXXXXX..XXXXXXX 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
 BlockDriver bdrv_raw = {
     .format_name = "raw",
     .instance_size = sizeof(BDRVRawState),
+    .supports_zoned_children = true,
     .bdrv_probe = &raw_probe,
     .bdrv_reopen_prepare = &raw_reopen_prepare,
     .bdrv_reopen_commit = &raw_reopen_commit,
--
2.40.0
Deleted patch

From: Sam Li <faithilikerun@gmail.com>

The new block layer APIs for zoned block devices can be tested by:
$ tests/qemu-iotests/check zoned
Run each zone operation on a newly created null_blk device
and see whether it outputs the same zone information.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-7-faithilikerun@gmail.com
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/tests/zoned     | 89 ++++++++++++++++++++++++++++++
 tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
 2 files changed, 142 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned
 create mode 100644 tests/qemu-iotests/tests/zoned.out

diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
new file mode 100755
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned
@@ -XXX,XX +XXX,XX @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+  _cleanup_test_img
+  sudo -n rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+sudo -n true || \
+    _notrun 'Password-less sudo required'
+
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device:"
+echo "case 1: if the operations work"
+sudo -n modprobe null_blk nr_devices=1 zoned=1
+sudo -n chmod 0666 /dev/nullb0
+
+echo "(1) report the first zone:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
+echo
+echo
+echo "(2) opening the first zone"
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo "opening the last zone"
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(3) closing the first zone"
+$QEMU_IO $IMG -c "zc 0 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+$QEMU_IO $IMG -c "zf 268435456 268435456"
+echo "After finishing a zone:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(5) resetting the second zone"
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
+echo "After resetting a zone:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -XXX,XX +XXX,XX @@
+QA output created by zoned
+Testing a null_blk device:
+case 1: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(2) opening the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+report after:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
+
+opening the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
+
+
+(3) closing the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+
+closing the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(4) finishing the second zone
+After finishing a zone:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
+
+
+(5) resetting the second zone
+After resetting a zone:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+*** done
--
2.40.0
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-8-faithilikerun@gmail.com
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
         },
     };

+    trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
     return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
 }
 #endif
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
         },
     };

+    trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
+                        len >> BDRV_SECTOR_BITS);
     ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
     if (ret != 0) {
         error_report("ioctl %s failed %d", op_name, ret);
diff --git a/block/trace-events b/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
 file_setup_cdrom(const char *partition) "Using %s as optical disc"
 file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"

 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
--
2.40.0

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 10 ----------
 block/io_uring.c        | 44 ++++++++++++++++-------------------------
 block/trace-events      |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -XXX,XX +XXX,XX @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
                                   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif

 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
         laio_io_plug();
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_plug();
-    }
-#endif
 }

 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
         laio_io_unplug(s->aio_max_batch);
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_unplug();
-    }
-#endif
 }

 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -XXX,XX +XXX,XX @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"

 /* Only used for assertions. */
@@ -XXX,XX +XXX,XX @@ typedef struct LuringAIOCB {
 } LuringAIOCB;

 typedef struct LuringQueue {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -XXX,XX +XXX,XX @@ static void luring_process_completions_and_submit(LuringState *s)
 {
     luring_process_completions(s);

-    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+    if (s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->submit_queue);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
 }

-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    trace_luring_io_plug(s);
-    s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    assert(s->io_q.plugged);
-    trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (--s->io_q.plugged == 0 &&
-        !s->io_q.blocked && s->io_q.in_queue > 0) {
+    LuringState *s = opaque;
+    trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked && s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,

     QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
     s->io_q.in_queue++;
-    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-        ret = ioq_submit(s);
-        trace_luring_do_submit_done(s, ret);
-        return ret;
+    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+            ret = ioq_submit(s);
+            trace_luring_do_submit_done(s, ret);
+            return ret;
+        }
+
+        blk_io_plug_call(luring_unplug_fn, s);
     }
     return 0;
 }
diff --git a/block/trace-events b/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
 # io_uring.c
 luring_init_state(void *s, size_t size) "s %p size %zu"
 luring_cleanup_state(void *s) "%p freed"
-luring_io_plug(void *s) "LuringState %p plug"
-luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
-luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_unplug_fn(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
+luring_do_submit(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
 luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
 luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " nbytes %zd type %d"
 luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p luringcb %p ret %d"
--
2.40.1
Deleted patch

From: Sam Li <faithilikerun@gmail.com>

Add documentation about zoned device support to the virtio-blk
emulation.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172019.3345-9-faithilikerun@gmail.com
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
any toctree" error and fix pre-formatted command-line indentation.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/devel/index-api.rst               |  1 +
 docs/devel/zoned-storage.rst           | 43 ++++++++++++++++++++++++++
 docs/system/qemu-block-drivers.rst.inc |  6 ++++
 3 files changed, 50 insertions(+)
 create mode 100644 docs/devel/zoned-storage.rst

diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/index-api.rst
+++ b/docs/devel/index-api.rst
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
    memory
    modules
    ui
+   zoned-storage
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -XXX,XX +XXX,XX @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
+that are larger than the LBA size. Zones can only be written sequentially, which
+can reduce write amplification in SSDs, and potentially lead to higher
+throughput and increased capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+The QEMU block layer supports three zoned storage models:
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
+manage the zones of a device.
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
+zones, making it backward compatible with regular block devices.
+- BLK_Z_NONE: The non-zoned model has no zone support. It includes both
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
+supported.
+
+The block device information resides inside BlockDriverState. QEMU uses the
+BlockLimits struct (BlockDriverState::bl) that is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph (for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host
+devices.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, to test zone_report on a null_blk device using qemu-io::
+
+    $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -XXX,XX +XXX,XX @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command
   line option or modify the device permissions accordingly).

+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated storage
+  controller supports zoned storage. Use ``--blockdev host_device,
+  node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
+  ``/dev/nullb0`` as ``drive0``.
+
 Windows
 ^^^^^^^
--
2.40.0
1
From: Sam Li <faithilikerun@gmail.com>
1
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
2
2
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
3
Since Linux doesn't have a user API to issue zone append operations to
3
submission instead.
4
zoned devices from user space, the file-posix driver is modified to add
4
5
zone append emulation using regular writes. To do this, the file-posix
5
Note that a dev_max_batch check is dropped in laio_io_unplug() because
6
driver tracks the wp location of all zones of the device. It uses an
6
the semantics of unplug_fn() are different from .bdrv_co_unplug():
7
array of uint64_t. The most significant bit of each wp location indicates
7
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
8
if the zone type is conventional zones.
8
not every time blk_io_unplug() is called.
9
9
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
10
The zones wp can be changed due to the following operations issued:
10
way to get per-BlockDriverState fields like dev_max_batch.
11
- zone reset: change the wp to the start offset of that zone
11
12
- zone finish: change to the end location of that zone
12
Therefore this condition cannot be moved to laio_unplug_fn(). It is not
13
- write to a zone
13
obvious that this condition affects performance in practice, so I am
14
- zone append
14
removing it instead of trying to come up with a more complex mechanism
15
15
to preserve the condition.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530180959.1108766-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 28 ----------------------------
 block/linux-aio.c       | 41 +++++++++++------------------------------
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -XXX,XX +XXX,XX @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,

 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }

-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_plug();
-    }
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_unplug(s->aio_max_batch);
-    }
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_file = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,

     .bdrv_co_truncate = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,

     .bdrv_co_truncate = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
     .bdrv_refresh_limits    = cdrom_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,

     .bdrv_co_truncate = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
     .bdrv_refresh_limits    = cdrom_refresh_limits,
-    .bdrv_co_io_plug = raw_co_io_plug,
-    .bdrv_co_io_unplug = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,

     .bdrv_co_truncate = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"

 /* Only used for assertions. */
 #include "qemu/coroutine_int.h"
@@ -XXX,XX +XXX,XX @@ struct qemu_laiocb {
 };

 typedef struct {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
 {
     qemu_laio_process_completions(s);

-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+    if (!QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_poll_ready(EventNotifier *opaque)
 static void ioq_init(LaioQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->pending);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
@@ -XXX,XX +XXX,XX @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
     return max_batch;
 }

-void laio_io_plug(void)
+static void laio_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
+    LinuxAioState *s = opaque;

-    s->io_q.plugged++;
-}
-
-void laio_io_unplug(uint64_t dev_max_batch)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
-
-    assert(s->io_q.plugged);
-    s->io_q.plugged--;
-
-    /*
-     * Why max batch checking is performed here:
-     * Another BDS may have queued requests with a higher dev_max_batch and
-     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
-     * batch so we can honor our device's dev_max_batch.
-     */
-    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
-        (!s->io_q.plugged &&
-         !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
+    if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,

     QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
     s->io_q.in_queue++;
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_queue >= laio_max_batch(s, dev_max_batch))) {
-        ioq_submit(s);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
+            ioq_submit(s);
+        } else {
+            blk_io_plug_call(laio_unplug_fn, s);
+        }
     }

     return 0;
--
2.40.1
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block-io.h         |  3 ---
 include/block/block_int-common.h | 11 ----------
 block/io.c                       | 37 --------------------------------
 3 files changed, 51 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -XXX,XX +XXX,XX @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);

 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);

-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
                                    uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
     void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
         BlockDriverState *bs, BlkdebugEvent event);

-    /* io queue for linux-aio */
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState *bs);
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-        BlockDriverState *bs);
-
     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);

     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
     unsigned int in_flight;
     unsigned int serialising_in_flight;

-    /*
-     * counter for nested bdrv_io_plug.
-     * Accessed with atomic ops.
-     */
-    unsigned io_plugged;
-
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size)
     return mem;
 }

-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_plug(child->bs);
-    }
-
-    if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_plug) {
-            drv->bdrv_co_io_plug(bs);
-        }
-    }
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    assert(bs->io_plugged);
-    if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_unplug) {
-            drv->bdrv_co_io_unplug(bs);
-        }
-    }
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_unplug(child->bs);
-    }
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
--
2.40.1
Deleted patch
From: Sam Li <faithilikerun@gmail.com>

The patch tests zone append writes by reporting the zone wp after
the completion of the call. The "zap -p" option prints the sector
offset value after completion, which should be the start sector
where the append write begins.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230427172339.3709-4-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
tests/qemu-iotests/tests/zoned | 16 +++++++
tests/qemu-iotests/tests/zoned.out | 16 +++++++
3 files changed, 107 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
.oneline = "reset a zone write pointer in zone block device",
};

+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+ int64_t *offset, int flags, int *total)
+{
+ int async_ret = NOT_DONE;
+
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+ while (async_ret == NOT_DONE) {
+ main_loop_wait(false);
+ }
+
+ *total = qiov->size;
+ return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+ int ret;
+ bool pflag = false;
+ int flags = 0;
+ int total = 0;
+ int64_t offset;
+ char *buf;
+ int c, nr_iov;
+ int pattern = 0xcd;
+ QEMUIOVector qiov;
+
+ if (optind > argc - 3) {
+ return -EINVAL;
+ }
+
+ if ((c = getopt(argc, argv, "p")) != -1) {
+ pflag = true;
+ }
+
+ offset = cvtnum(argv[optind]);
+ if (offset < 0) {
+ print_cvtnum_err(offset, argv[optind]);
+ return offset;
+ }
+ optind++;
+ nr_iov = argc - optind;
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
+ flags & BDRV_REQ_REGISTERED_BUF);
+ if (buf == NULL) {
+ return -EINVAL;
+ }
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+ if (ret < 0) {
+ printf("zone append failed: %s\n", strerror(-ret));
+ goto out;
+ }
+
+ if (pflag) {
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
+ tosector(offset));
+ }
+
+out:
+ qemu_io_free(blk, buf, qiov.size,
+ flags & BDRV_REQ_REGISTERED_BUF);
+ qemu_iovec_destroy(&qiov);
+ return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+ .name = "zone_append",
+ .altname = "zap",
+ .cfunc = zone_append_f,
+ .argmin = 3,
+ .argmax = 4,
+ .args = "offset len [len..]",
+ .oneline = "append write a number of bytes at a specified offset",
+};
+
static int truncate_f(BlockBackend *blk, int argc, char **argv);
static const cmdinfo_t truncate_cmd = {
.name = "truncate",
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
qemuio_add_command(&zone_close_cmd);
qemuio_add_command(&zone_finish_cmd);
qemuio_add_command(&zone_reset_cmd);
+ qemuio_add_command(&zone_append_cmd);
qemuio_add_command(&truncate_cmd);
qemuio_add_command(&length_cmd);
qemuio_add_command(&info_cmd);
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/tests/zoned
+++ b/tests/qemu-iotests/tests/zoned
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
$QEMU_IO $IMG -c "zrs 268435456 268435456"
echo "After resetting a zone:"
$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone firstly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone secondly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone firstly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone secondly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"

# success, all done
echo "*** done"
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
(5) resetting the second zone
After resetting a zone:
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+
+
+(6) append write
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+After zap done, the append sector is 0x0
+After appending the first zone firstly:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
+After zap done, the append sector is 0x18
+After appending the first zone secondly:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
+After zap done, the append sector is 0x80000
+After appending the second zone firstly:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
+After zap done, the append sector is 0x80018
+After appending the second zone secondly:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
*** done
--
2.40.0
Deleted patch
From: Sam Li <faithilikerun@gmail.com>

Use scripts/update-linux-headers.sh to update headers to 6.3-rc1.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
[Reran scripts/update-linux-headers.sh on Linux v6.3. The only change
was the use of __virtioXX types instead of uintXX_t.
--Stefan]
Message-Id: <20230407082528.18841-2-faithilikerun@gmail.com>
---
include/standard-headers/drm/drm_fourcc.h | 12 +++
include/standard-headers/linux/ethtool.h | 48 ++++++++-
include/standard-headers/linux/fuse.h | 45 +++++++-
include/standard-headers/linux/pci_regs.h | 1 +
include/standard-headers/linux/vhost_types.h | 2 +
include/standard-headers/linux/virtio_blk.h | 105 +++++++++++++++++++
linux-headers/asm-arm64/kvm.h | 1 +
linux-headers/asm-x86/kvm.h | 34 +++++-
linux-headers/linux/kvm.h | 9 ++
linux-headers/linux/vfio.h | 15 +--
linux-headers/linux/vhost.h | 8 ++
11 files changed, 270 insertions(+), 10 deletions(-)
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -XXX,XX +XXX,XX @@ extern "C" {
*
* The authoritative list of format modifier codes is found in
* `include/uapi/drm/drm_fourcc.h`
+ *
+ * Open Source User Waiver
+ * -----------------------
+ *
+ * Because this is the authoritative source for pixel formats and modifiers
+ * referenced by GL, Vulkan extensions and other standards and hence used both
+ * by open source and closed source driver stacks, the usual requirement for an
+ * upstream in-kernel or open source userspace user does not apply.
+ *
+ * To ensure, as much as feasible, compatibility across stacks and avoid
+ * confusion with incompatible enumerations stakeholders for all relevant driver
+ * stacks should approve additions.
*/

#define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -XXX,XX +XXX,XX @@ enum ethtool_stringset {
    ETH_SS_COUNT
};

+/**
+ * enum ethtool_mac_stats_src - source of ethtool MAC statistics
+ * @ETHTOOL_MAC_STATS_SRC_AGGREGATE:
+ *    if device supports a MAC merge layer, this retrieves the aggregate
+ *    statistics of the eMAC and pMAC. Otherwise, it retrieves just the
+ *    statistics of the single (express) MAC.
+ * @ETHTOOL_MAC_STATS_SRC_EMAC:
+ *    if device supports a MM layer, this retrieves the eMAC statistics.
+ *    Otherwise, it retrieves the statistics of the single (express) MAC.
+ * @ETHTOOL_MAC_STATS_SRC_PMAC:
+ *    if device supports a MM layer, this retrieves the pMAC statistics.
+ */
+enum ethtool_mac_stats_src {
+    ETHTOOL_MAC_STATS_SRC_AGGREGATE,
+    ETHTOOL_MAC_STATS_SRC_EMAC,
+    ETHTOOL_MAC_STATS_SRC_PMAC,
+};
+
/**
* enum ethtool_module_power_mode_policy - plug-in module power mode policy
* @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
@@ -XXX,XX +XXX,XX @@ enum ethtool_podl_pse_pw_d_status {
    ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR,
};

+/**
+ * enum ethtool_mm_verify_status - status of MAC Merge Verify function
+ * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN:
+ *    verification status is unknown
+ * @ETHTOOL_MM_VERIFY_STATUS_INITIAL:
+ *    the 802.3 Verify State diagram is in the state INIT_VERIFICATION
+ * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING:
+ *    the Verify State diagram is in the state VERIFICATION_IDLE,
+ *    SEND_VERIFY or WAIT_FOR_RESPONSE
+ * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED:
+ *    indicates that the Verify State diagram is in the state VERIFIED
+ * @ETHTOOL_MM_VERIFY_STATUS_FAILED:
+ *    the Verify State diagram is in the state VERIFY_FAIL
+ * @ETHTOOL_MM_VERIFY_STATUS_DISABLED:
+ *    verification of preemption operation is disabled
+ */
+enum ethtool_mm_verify_status {
+    ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
+    ETHTOOL_MM_VERIFY_STATUS_INITIAL,
+    ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
+    ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
+    ETHTOOL_MM_VERIFY_STATUS_FAILED,
+    ETHTOOL_MM_VERIFY_STATUS_DISABLED,
+};
+
/**
* struct ethtool_gstrings - string set for data tagging
* @cmd: Command number = %ETHTOOL_GSTRINGS
@@ -XXX,XX +XXX,XX @@ struct ethtool_rxnfc {
        uint32_t            rule_cnt;
        uint32_t            rss_context;
    };
-    uint32_t                rule_locs[0];
+    uint32_t                rule_locs[];
};


@@ -XXX,XX +XXX,XX @@ enum ethtool_link_mode_bit_indices {
    ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT     = 96,
    ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT     = 97,
    ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT     = 98,
+    ETHTOOL_LINK_MODE_10baseT1S_Full_BIT         = 99,
+    ETHTOOL_LINK_MODE_10baseT1S_Half_BIT         = 100,
+    ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT     = 101,

    /* must be last entry */
    __ETHTOOL_LINK_MODE_MASK_NBITS
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -XXX,XX +XXX,XX @@
* 7.38
* - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry
* - add FOPEN_PARALLEL_DIRECT_WRITES
+ * - add total_extlen to fuse_in_header
+ * - add FUSE_MAX_NR_SECCTX
+ * - add extension header
+ * - add FUSE_EXT_GROUPS
+ * - add FUSE_CREATE_SUPP_GROUP
*/

#ifndef _LINUX_FUSE_H
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
* FUSE_SECURITY_CTX:    add security context to create, mkdir, symlink, and
*            mknod
* FUSE_HAS_INODE_DAX: use per inode DAX
+ * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
+ *            symlink and mknod (single group that matches parent)
*/
#define FUSE_ASYNC_READ        (1 << 0)
#define FUSE_POSIX_LOCKS    (1 << 1)
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
/* bits 32..63 get shifted down 32 bits into the flags2 field */
#define FUSE_SECURITY_CTX    (1ULL << 32)
#define FUSE_HAS_INODE_DAX    (1ULL << 33)
+#define FUSE_CREATE_SUPP_GROUP    (1ULL << 34)

/**
* CUSE INIT request/reply flags
@@ -XXX,XX +XXX,XX @@ struct fuse_file_lock {
*/
#define FUSE_EXPIRE_ONLY        (1 << 0)

+/**
+ * extension type
+ * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx
+ * FUSE_EXT_GROUPS: &fuse_supp_groups extension
+ */
+enum fuse_ext_type {
+    /* Types 0..31 are reserved for fuse_secctx_header */
+    FUSE_MAX_NR_SECCTX    = 31,
+    FUSE_EXT_GROUPS        = 32,
+};
+
enum fuse_opcode {
    FUSE_LOOKUP        = 1,
    FUSE_FORGET        = 2, /* no reply */
@@ -XXX,XX +XXX,XX @@ struct fuse_in_header {
    uint32_t    uid;
    uint32_t    gid;
    uint32_t    pid;
-    uint32_t    padding;
+    uint16_t    total_extlen; /* length of extensions in 8byte units */
+    uint16_t    padding;
};

struct fuse_out_header {
@@ -XXX,XX +XXX,XX @@ struct fuse_secctx_header {
    uint32_t    nr_secctx;
};

+/**
+ * struct fuse_ext_header - extension header
+ * @size: total size of this extension including this header
+ * @type: type of extension
+ *
+ * This is made compatible with fuse_secctx_header by using type values >
+ * FUSE_MAX_NR_SECCTX
+ */
+struct fuse_ext_header {
+    uint32_t    size;
+    uint32_t    type;
+};
+
+/**
+ * struct fuse_supp_groups - Supplementary group extension
+ * @nr_groups: number of supplementary groups
+ * @groups: flexible array of group IDs
+ */
+struct fuse_supp_groups {
+    uint32_t    nr_groups;
+    uint32_t    groups[];
+};
+
#endif /* _LINUX_FUSE_H */
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -XXX,XX +XXX,XX @@
#define PCI_EXP_LNKCTL2_TX_MARGIN    0x0380 /* Transmit Margin */
#define PCI_EXP_LNKCTL2_HASD        0x0020 /* HW Autonomous Speed Disable */
#define PCI_EXP_LNKSTA2        0x32    /* Link Status 2 */
+#define PCI_EXP_LNKSTA2_FLIT        0x0400 /* Flit Mode Status */
#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2    0x32    /* end of v2 EPs w/ link */
#define PCI_EXP_SLTCAP2        0x34    /* Slot Capabilities 2 */
#define PCI_EXP_SLTCAP2_IBPD    0x00000001 /* In-band PD Disable Supported */
diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -XXX,XX +XXX,XX @@ struct vhost_vdpa_iova_range {
#define VHOST_BACKEND_F_IOTLB_ASID 0x3
/* Device can be suspended */
#define VHOST_BACKEND_F_SUSPEND 0x4
+/* Device can be resumed */
+#define VHOST_BACKEND_F_RESUME 0x5

#endif
diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-headers/linux/virtio_blk.h
index XXXXXXX..XXXXXXX 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -XXX,XX +XXX,XX @@
#define VIRTIO_BLK_F_DISCARD    13    /* DISCARD is supported */
#define VIRTIO_BLK_F_WRITE_ZEROES    14    /* WRITE ZEROES is supported */
#define VIRTIO_BLK_F_SECURE_ERASE    16 /* Secure Erase is supported */
+#define VIRTIO_BLK_F_ZONED        17    /* Zoned block device */

/* Legacy feature bits */
#ifndef VIRTIO_BLK_NO_LEGACY
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
    /* Secure erase commands must be aligned to this number of sectors. */
    __virtio32 secure_erase_sector_alignment;

+    /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
+    struct virtio_blk_zoned_characteristics {
+        __virtio32 zone_sectors;
+        __virtio32 max_open_zones;
+        __virtio32 max_active_zones;
+        __virtio32 max_append_sectors;
+        __virtio32 write_granularity;
+        uint8_t model;
+        uint8_t unused2[3];
+    } zoned;
} QEMU_PACKED;

/*
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_config {
/* Secure erase command */
#define VIRTIO_BLK_T_SECURE_ERASE    14

+/* Zone append command */
+#define VIRTIO_BLK_T_ZONE_APPEND 15
+
+/* Report zones command */
+#define VIRTIO_BLK_T_ZONE_REPORT 16
+
+/* Open zone command */
+#define VIRTIO_BLK_T_ZONE_OPEN 18
+
+/* Close zone command */
+#define VIRTIO_BLK_T_ZONE_CLOSE 20
+
+/* Finish zone command */
+#define VIRTIO_BLK_T_ZONE_FINISH 22
+
+/* Reset zone command */
+#define VIRTIO_BLK_T_ZONE_RESET 24
+
+/* Reset All zones command */
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
+
#ifndef VIRTIO_BLK_NO_LEGACY
/* Barrier before this op. */
#define VIRTIO_BLK_T_BARRIER    0x80000000
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_outhdr {
    __virtio64 sector;
};

+/*
+ * Supported zoned device models.
+ */
+
+/* Regular block device */
+#define VIRTIO_BLK_Z_NONE 0
+/* Host-managed zoned device */
+#define VIRTIO_BLK_Z_HM 1
+/* Host-aware zoned device */
+#define VIRTIO_BLK_Z_HA 2
+
+/*
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
+ */
+struct virtio_blk_zone_descriptor {
+    /* Zone capacity */
+    __virtio64 z_cap;
+    /* The starting sector of the zone */
+    __virtio64 z_start;
+    /* Zone write pointer position in sectors */
+    __virtio64 z_wp;
+    /* Zone type */
+    uint8_t z_type;
+    /* Zone state */
+    uint8_t z_state;
+    uint8_t reserved[38];
+};
+
+struct virtio_blk_zone_report {
+    __virtio64 nr_zones;
+    uint8_t reserved[56];
+    struct virtio_blk_zone_descriptor zones[];
+};
+
+/*
+ * Supported zone types.
+ */
+
+/* Conventional zone */
+#define VIRTIO_BLK_ZT_CONV 1
+/* Sequential Write Required zone */
+#define VIRTIO_BLK_ZT_SWR 2
+/* Sequential Write Preferred zone */
+#define VIRTIO_BLK_ZT_SWP 3
+
+/*
+ * Zone states that are available for zones of all types.
+ */
+
+/* Not a write pointer (conventional zones only) */
+#define VIRTIO_BLK_ZS_NOT_WP 0
+/* Empty */
+#define VIRTIO_BLK_ZS_EMPTY 1
+/* Implicitly Open */
+#define VIRTIO_BLK_ZS_IOPEN 2
+/* Explicitly Open */
+#define VIRTIO_BLK_ZS_EOPEN 3
+/* Closed */
+#define VIRTIO_BLK_ZS_CLOSED 4
+/* Read-Only */
+#define VIRTIO_BLK_ZS_RDONLY 13
+/* Full */
+#define VIRTIO_BLK_ZS_FULL 14
+/* Offline */
+#define VIRTIO_BLK_ZS_OFFLINE 15
+
/* Unmap this range (only valid for write zeroes command) */
#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP    0x00000001

@@ -XXX,XX +XXX,XX @@ struct virtio_scsi_inhdr {
#define VIRTIO_BLK_S_OK        0
#define VIRTIO_BLK_S_IOERR    1
#define VIRTIO_BLK_S_UNSUPP    2
+
+/* Error codes that are specific to zoned block devices */
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
+
#endif /* _LINUX_VIRTIO_BLK_H */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -XXX,XX +XXX,XX @@ struct kvm_regs {
#define KVM_ARM_VCPU_SVE        4 /* enable SVE for this CPU */
#define KVM_ARM_VCPU_PTRAUTH_ADDRESS    5 /* VCPU uses address authentication */
#define KVM_ARM_VCPU_PTRAUTH_GENERIC    6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_HAS_EL2        7 /* Support nested virtualization */

struct kvm_vcpu_init {
    __u32 target;
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -XXX,XX +XXX,XX @@

#include <linux/types.h>
#include <linux/ioctl.h>
+#include <linux/stddef.h>

#define KVM_PIO_PAGE_OFFSET 1
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2
@@ -XXX,XX +XXX,XX @@ struct kvm_nested_state {
     * KVM_{GET,PUT}_NESTED_STATE ioctl values.
     */
    union {
-        struct kvm_vmx_nested_state_data vmx[0];
-        struct kvm_svm_nested_state_data svm[0];
+        __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx);
+        __DECLARE_FLEX_ARRAY(struct kvm_svm_nested_state_data, svm);
    } data;
};

@@ -XXX,XX +XXX,XX @@ struct kvm_pmu_event_filter {
#define KVM_PMU_EVENT_ALLOW 0
#define KVM_PMU_EVENT_DENY 1

+#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
+#define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
+
+/*
+ * Masked event layout.
+ * Bits Description
+ * ---- -----------
+ * 7:0 event select (low bits)
+ * 15:8 umask match
+ * 31:16 unused
+ * 35:32 event select (high bits)
+ * 36:54 unused
+ * 55 exclude bit
+ * 63:56 umask mask
+ */
+
+#define KVM_PMU_ENCODE_MASKED_ENTRY(event_select, mask, match, exclude) \
+    (((event_select) & 0xFFULL) | (((event_select) & 0XF00ULL) << 24) | \
+    (((mask) & 0xFFULL) << 56) | \
+    (((match) & 0xFFULL) << 8) | \
+    ((__u64)(!!(exclude)) << 55))
+
+#define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \
+    (GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32))
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK        (GENMASK_ULL(63, 56))
+#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH    (GENMASK_ULL(15, 8))
+#define KVM_PMU_MASKED_ENTRY_EXCLUDE        (BIT_ULL(55))
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT    (56)
+
/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
#define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
        struct {
            __u8 ar;    /* the access register number */
            __u8 key;    /* access key, ignored if flag unset */
+            __u8 pad1[6];    /* ignored */
+            __u64 old_addr;    /* ignored if cmpxchg flag unset */
        };
        __u32 sida_offset; /* offset into the sida */
        __u8 reserved[32]; /* ignored */
@@ -XXX,XX +XXX,XX @@ struct kvm_s390_mem_op {
#define KVM_S390_MEMOP_SIDA_WRITE    3
#define KVM_S390_MEMOP_ABSOLUTE_READ    4
#define KVM_S390_MEMOP_ABSOLUTE_WRITE    5
+#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG    6
+
/* flags for kvm_s390_mem_op->flags */
#define KVM_S390_MEMOP_F_CHECK_ONLY        (1ULL << 0)
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION    (1ULL << 1)
#define KVM_S390_MEMOP_F_SKEY_PROTECTION    (1ULL << 2)

+/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
+#define KVM_S390_MEMOP_EXTENSION_CAP_BASE    (1 << 0)
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG    (1 << 1)
+
/* for KVM_INTERRUPT */
struct kvm_interrupt {
    /* in */
@@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
+#define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226

#ifdef KVM_CAP_IRQ_ROUTING

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -XXX,XX +XXX,XX @@
/* Supports VFIO_DMA_UNMAP_FLAG_ALL */
#define VFIO_UNMAP_ALL            9

-/* Supports the vaddr flag for DMA map and unmap */
+/*
+ * Supports the vaddr flag for DMA map and unmap. Not supported for mediated
+ * devices, so this capability is subject to change as groups are added or
+ * removed.
+ */
#define VFIO_UPDATE_VADDR        10

/*
@@ -XXX,XX +XXX,XX @@ struct vfio_iommu_type1_info_dma_avail {
* Map process virtual addresses to IO virtual addresses using the
* provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
*
- * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and
- * unblock translation of host virtual addresses in the iova range. The vaddr
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr
* must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR. To
* maintain memory consistency within the user application, the updated vaddr
* must address the same memory object as originally mapped. Failure to do so
@@ -XXX,XX +XXX,XX @@ struct vfio_bitmap {
* must be 0. This cannot be combined with the get-dirty-bitmap flag.
*
* If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
- * virtual addresses in the iova range. Tasks that attempt to translate an
- * iova's vaddr will block. DMA to already-mapped pages continues. This
- * cannot be combined with the get-dirty-bitmap flag.
+ * virtual addresses in the iova range. DMA to already-mapped pages continues.
+ * Groups may not be added to the container while any addresses are invalid.
+ * This cannot be combined with the get-dirty-bitmap flag.
*/
struct vfio_iommu_type1_dma_unmap {
    __u32    argsz;
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -XXX,XX +XXX,XX @@
*/
#define VHOST_VDPA_SUSPEND        _IO(VHOST_VIRTIO, 0x7D)

+/* Resume a device so it can resume processing virtqueue requests
+ *
+ * After the return of this ioctl the device will have restored all the
+ * necessary states and it is fully operational to continue processing the
+ * virtqueue descriptors.
+ */
+#define VHOST_VDPA_RESUME        _IO(VHOST_VIRTIO, 0x7E)
+
#endif
--
2.40.0
From: Sam Li <faithilikerun@gmail.com>
From: Stefano Garzarella <sgarzare@redhat.com>

This patch extends virtio-blk emulation to handle zoned device commands
Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) support fd
by calling the new block layer APIs to perform zoned device I/O on
passing. Let's expose this to the user, so the management layer
behalf of the guest. It supports Report Zone, four zone operations (open,
can pass the file descriptor of an already opened path.
close, finish, reset), and Append Zone.

The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
If the libblkio virtio-blk driver supports fd passing, let's always
support zoned block devices. Regular block devices (conventional zones)
use qemu_open() to open the `path`, so we can handle fd passing
will not be set.
from the management layer through the "/dev/fdset/N" special path.
11
10
12
The guest OS can use blktests or fio to test those commands on zoned devices.
11
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Furthermore, using zonefs to test zone append write is also supported.
12
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
13
Message-id: 20230530071941.8954-2-sgarzare@redhat.com
14
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230407082528.18841-3-faithilikerun@gmail.com>
---
hw/block/virtio-blk-common.c | 2 +
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
hw/virtio/virtio-qmp.c | 2 +
3 files changed, 393 insertions(+)

diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk-common.c
+++ b/hw/block/virtio-blk-common.c
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
+ .end = endof(struct virtio_blk_config, zoned)},
{}
};

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@
#include "qemu/module.h"
#include "qemu/error-report.h"
#include "qemu/main-loop.h"
+#include "block/block_int.h"
#include "trace.h"
#include "hw/block/block.h"
#include "hw/qdev-properties.h"
@@ -XXX,XX +XXX,XX @@ err:
return err_status;
}

+typedef struct ZoneCmdData {
+ VirtIOBlockReq *req;
+ struct iovec *in_iov;
+ unsigned in_num;
+ union {
+ struct {
+ unsigned int nr_zones;
+ BlockZoneDescriptor *zones;
+ } zone_report_data;
+ struct {
+ int64_t offset;
+ } zone_append_data;
+ };
+} ZoneCmdData;
+
+/*
+ * check zoned_request: error checking before issuing requests. If all checks
+ * passed, return true.
+ * append: true if only zone append requests issued.
+ */
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
+ bool append, uint8_t *status) {
+ BlockDriverState *bs = blk_bs(s->blk);
+ int index;
+
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
+ *status = VIRTIO_BLK_S_UNSUPP;
+ return false;
+ }
+
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ return false;
+ }
+
+ if (append) {
+ if (bs->bl.write_granularity) {
+ if ((offset % bs->bl.write_granularity) != 0) {
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
+ return false;
+ }
+ }
+
+ index = offset / bs->bl.zone_size;
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ return false;
+ }
+
+ if (len / 512 > bs->bl.max_append_sectors) {
+ if (bs->bl.max_append_sectors == 0) {
+ *status = VIRTIO_BLK_S_UNSUPP;
+ } else {
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ }
+ return false;
+ }
+ }
+ return true;
+}
+
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
+{
+ ZoneCmdData *data = opaque;
+ VirtIOBlockReq *req = data->req;
+ VirtIOBlock *s = req->dev;
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+ struct iovec *in_iov = data->in_iov;
+ unsigned in_num = data->in_num;
+ int64_t zrp_size, n, j = 0;
+ int64_t nz = data->zone_report_data.nr_zones;
+ int8_t err_status = VIRTIO_BLK_S_OK;
+
+ if (ret) {
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ goto out;
+ }
+
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
+ .nr_zones = cpu_to_le64(nz),
+ };
+ zrp_size = sizeof(struct virtio_blk_zone_report)
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
+ if (n != sizeof(zrp_hdr)) {
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ goto out;
+ }
+
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
+ struct virtio_blk_zone_descriptor desc =
+ (struct virtio_blk_zone_descriptor) {
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
+ >> BDRV_SECTOR_BITS),
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
+ >> BDRV_SECTOR_BITS),
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
+ >> BDRV_SECTOR_BITS),
+ };
+
+ switch (data->zone_report_data.zones[j].type) {
+ case BLK_ZT_CONV:
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
+ break;
+ case BLK_ZT_SWR:
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
+ break;
+ case BLK_ZT_SWP:
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ switch (data->zone_report_data.zones[j].state) {
+ case BLK_ZS_RDONLY:
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
+ break;
+ case BLK_ZS_OFFLINE:
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
+ break;
+ case BLK_ZS_EMPTY:
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
+ break;
+ case BLK_ZS_CLOSED:
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
+ break;
+ case BLK_ZS_FULL:
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
+ break;
+ case BLK_ZS_EOPEN:
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
+ break;
+ case BLK_ZS_IOPEN:
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
+ break;
+ case BLK_ZS_NOT_WP:
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
+ if (n != sizeof(desc)) {
+ virtio_error(vdev, "Driver provided input buffer "
+ "for descriptors that is too small!");
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ }
+ }
+
+out:
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
+ virtio_blk_req_complete(req, err_status);
+ virtio_blk_free_request(req);
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
+ g_free(data->zone_report_data.zones);
+ g_free(data);
+}
+
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
+ struct iovec *in_iov,
+ unsigned in_num)
+{
+ VirtIOBlock *s = req->dev;
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
+ unsigned int nr_zones;
+ ZoneCmdData *data;
+ int64_t zone_size, offset;
+ uint8_t err_status;
+
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
+ sizeof(struct virtio_blk_zone_report) +
+ sizeof(struct virtio_blk_zone_descriptor)) {
+ virtio_error(vdev, "in buffer too small for zone report");
+ return;
+ }
+
+ /* start byte offset of the zone report */
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
+ goto out;
+ }
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
+ sizeof(struct virtio_blk_zone_report)) /
+ sizeof(struct virtio_blk_zone_descriptor);
+
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
+ data = g_malloc(sizeof(ZoneCmdData));
+ data->req = req;
+ data->in_iov = in_iov;
+ data->in_num = in_num;
+ data->zone_report_data.nr_zones = nr_zones;
+ data->zone_report_data.zones = g_malloc(zone_size),
+
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
+ data->zone_report_data.zones,
+ virtio_blk_zone_report_complete, data);
+ return;
+out:
+ virtio_blk_req_complete(req, err_status);
+ virtio_blk_free_request(req);
+}
+
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
+{
+ VirtIOBlockReq *req = opaque;
+ VirtIOBlock *s = req->dev;
+ int8_t err_status = VIRTIO_BLK_S_OK;
+
+ if (ret) {
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ }
+
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
+ virtio_blk_req_complete(req, err_status);
+ virtio_blk_free_request(req);
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
+}
+
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
+{
+ VirtIOBlock *s = req->dev;
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
+ BlockDriverState *bs = blk_bs(s->blk);
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
+ uint64_t len;
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
+ uint8_t err_status = VIRTIO_BLK_S_OK;
+
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
+ /* Entire drive capacity */
+ offset = 0;
+ len = capacity;
+ } else {
+ if (bs->bl.zone_size > capacity - offset) {
+ /* The zoned device allows the last smaller zone. */
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
+ } else {
+ len = bs->bl.zone_size;
+ }
+ }
+
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
+ goto out;
+ }
+
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
+ virtio_blk_zone_mgmt_complete, req);
+
+ return 0;
+out:
+ virtio_blk_req_complete(req, err_status);
+ virtio_blk_free_request(req);
+ return err_status;
+}
+
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
+{
+ ZoneCmdData *data = opaque;
+ VirtIOBlockReq *req = data->req;
+ VirtIOBlock *s = req->dev;
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+ int64_t append_sector, n;
+ uint8_t err_status = VIRTIO_BLK_S_OK;
+
+ if (ret) {
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ goto out;
+ }
+
+ virtio_stq_p(vdev, &append_sector,
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
+ sizeof(append_sector));
+ if (n != sizeof(append_sector)) {
+ virtio_error(vdev, "Driver provided input buffer less than size of "
+ "append_sector");
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+ goto out;
+ }
+
+out:
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
+ virtio_blk_req_complete(req, err_status);
+ virtio_blk_free_request(req);
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
+ g_free(data);
+}
+
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
+ struct iovec *out_iov,
+ struct iovec *in_iov,
+ uint64_t out_num,
+ unsigned in_num) {
+ VirtIOBlock *s = req->dev;
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
+ uint8_t err_status = VIRTIO_BLK_S_OK;
+
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
+ int64_t len = iov_size(out_iov, out_num);
+
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
+ goto out;
+ }
+
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
+ data->req = req;
+ data->in_iov = in_iov;
+ data->in_num = in_num;
+ data->zone_append_data.offset = offset;
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
+ virtio_blk_zone_append_complete, data);
+ return 0;
+
+out:
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
+ virtio_blk_req_complete(req, err_status);
+ virtio_blk_free_request(req);
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
+ return err_status;
+}
+
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
{
uint32_t type;
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
case VIRTIO_BLK_T_FLUSH:
virtio_blk_handle_flush(req, mrb);
break;
+ case VIRTIO_BLK_T_ZONE_REPORT:
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
+ break;
+ case VIRTIO_BLK_T_ZONE_OPEN:
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
+ break;
+ case VIRTIO_BLK_T_ZONE_CLOSE:
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
+ break;
+ case VIRTIO_BLK_T_ZONE_FINISH:
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
+ break;
+ case VIRTIO_BLK_T_ZONE_RESET:
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
+ break;
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
+ break;
case VIRTIO_BLK_T_SCSI_CMD:
virtio_blk_handle_scsi(req);
break;
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
virtio_blk_free_request(req);
break;
}
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
+ /*
+ * Passing out_iov/out_num and in_iov/in_num is not safe
+ * to access req->elem.out_sg directly because it may be
+ * modified by virtio_blk_handle_request().
+ */
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
+ break;
/*
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
{
VirtIOBlock *s = VIRTIO_BLK(vdev);
BlockConf *conf = &s->conf.conf;
+ BlockDriverState *bs = blk_bs(s->blk);
struct virtio_blk_config blkcfg;
uint64_t capacity;
int64_t length;
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
blkcfg.write_zeroes_may_unmap = 1;
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
}
+ if (bs->bl.zoned != BLK_Z_NONE) {
+ switch (bs->bl.zoned) {
+ case BLK_Z_HM:
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
+ break;
+ case BLK_Z_HA:
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
+ bs->bl.zone_size / 512);
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
+ bs->bl.max_active_zones);
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
+ bs->bl.max_open_zones);
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
+ bs->bl.max_append_sectors);
+ } else {
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
+ }
memcpy(config, &blkcfg, s->config_size);
}

@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
VirtIOBlock *s = VIRTIO_BLK(dev);
VirtIOBlkConf *conf = &s->conf;
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
Error *err = NULL;
unsigned i;

@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
return;
}

+ if (bs->bl.zoned != BLK_Z_NONE) {
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
+ if (bs->bl.zoned == BLK_Z_HM) {
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
+ }
+ }
+
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
(!conf->max_discard_sectors ||
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
#ifndef VIRTIO_BLK_NO_LEGACY
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
--
2.40.0

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/blkio.c | 53 ++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -XXX,XX +XXX,XX @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs,
{
const char *path = qdict_get_try_str(options, "path");
BDRVBlkioState *s = bs->opaque;
- int ret;
+ bool fd_supported = false;
+ int fd, ret;

if (!path) {
error_setg(errp, "missing 'path' option");
return -EINVAL;
}

- ret = blkio_set_str(s->blkio, "path", path);
- qdict_del(options, "path");
- if (ret < 0) {
- error_setg_errno(errp, -ret, "failed to set path: %s",
- blkio_get_error_msg());
- return ret;
- }
-
if (!(flags & BDRV_O_NOCACHE)) {
error_setg(errp, "cache.direct=off is not supported");
return -EINVAL;
}
+
+ if (blkio_get_int(s->blkio, "fd", &fd) == 0) {
+ fd_supported = true;
+ }
+
+ /*
+ * If the libblkio driver supports fd passing, let's always use qemu_open()
+ * to open the `path`, so we can handle fd passing from the management
+ * layer through the "/dev/fdset/N" special path.
+ */
+ if (fd_supported) {
+ int open_flags;
+
+ if (flags & BDRV_O_RDWR) {
+ open_flags = O_RDWR;
+ } else {
+ open_flags = O_RDONLY;
+ }
+
+ fd = qemu_open(path, open_flags, errp);
+ if (fd < 0) {
+ return -EINVAL;
+ }
+
+ ret = blkio_set_int(s->blkio, "fd", fd);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "failed to set fd: %s",
+ blkio_get_error_msg());
+ qemu_close(fd);
+ return ret;
+ }
+ } else {
+ ret = blkio_set_str(s->blkio, "path", path);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "failed to set path: %s",
+ blkio_get_error_msg());
+ return ret;
+ }
+ }
+
+ qdict_del(options, "path");
+
return 0;
}

--
2.40.1
From: Sam Li <faithilikerun@gmail.com>

Taking account of the new zone append write operation for zoned devices,
the BLOCK_ACCT_ZONE_APPEND enum is introduced as another I/O request type
(alongside read, write, flush).

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230407082528.18841-4-faithilikerun@gmail.com>
---
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
qapi/block.json | 4 +++
include/block/accounting.h | 1 +
block/qapi-sysemu.c | 11 ++++++
block/qapi.c | 18 ++++++++++
hw/block/virtio-blk.c | 4 +++
6 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -XXX,XX +XXX,XX @@
# @min_wr_latency_ns: Minimum latency of write operations in the
# defined interval, in nanoseconds.
#
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
+# in the defined interval, in nanoseconds
+# (since 8.1)
+#
# @min_flush_latency_ns: Minimum latency of flush operations in the
# defined interval, in nanoseconds.
#
@@ -XXX,XX +XXX,XX @@
# @max_wr_latency_ns: Maximum latency of write operations in the
# defined interval, in nanoseconds.
#
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
+# in the defined interval, in nanoseconds
+# (since 8.1)
+#
# @max_flush_latency_ns: Maximum latency of flush operations in the
# defined interval, in nanoseconds.
#
@@ -XXX,XX +XXX,XX @@
# @avg_wr_latency_ns: Average latency of write operations in the
# defined interval, in nanoseconds.
#
+# @avg_zone_append_latency_ns: Average latency of zone append operations
+# in the defined interval, in nanoseconds
+# (since 8.1)
+#
# @avg_flush_latency_ns: Average latency of flush operations in the
# defined interval, in nanoseconds.
#
@@ -XXX,XX +XXX,XX @@
# @avg_wr_queue_depth: Average number of pending write operations
# in the defined interval.
#
+# @avg_zone_append_queue_depth: Average number of pending zone append
+# operations in the defined interval
+# (since 8.1).
+#
# Since: 2.5
##
{ 'struct': 'BlockDeviceTimedStats',
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
+ 'max_zone_append_latency_ns': 'int',
+ 'avg_zone_append_latency_ns': 'int',
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
+ 'avg_wr_queue_depth': 'number',
+ 'avg_zone_append_queue_depth': 'number' } }

##
# @BlockDeviceStats:
@@ -XXX,XX +XXX,XX @@
#
# @wr_bytes: The number of bytes written by the device.
#
+# @zone_append_bytes: The number of bytes appended by the zoned devices
+# (since 8.1)
+#
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
#
# @rd_operations: The number of read operations performed by the device.
#
# @wr_operations: The number of write operations performed by the device.
#
+# @zone_append_operations: The number of zone append operations performed
+# by the zoned devices (since 8.1)
+#
# @flush_operations: The number of cache flush operations performed by the
# device (since 0.15)
#
@@ -XXX,XX +XXX,XX @@
#
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since 0.15).
#
+# @zone_append_total_time_ns: Total time spent on zone append writes
+# in nanoseconds (since 8.1)
+#
# @flush_total_time_ns: Total time spent on cache flushes in nanoseconds
# (since 0.15).
#
@@ -XXX,XX +XXX,XX @@
# @wr_merged: Number of write requests that have been merged into another
# request (Since 2.3).
#
+# @zone_append_merged: Number of zone append requests that have been merged
+# into another request (since 8.1)
+#
# @unmap_merged: Number of unmap requests that have been merged into another
# request (Since 4.2)
#
@@ -XXX,XX +XXX,XX @@
# @failed_wr_operations: The number of failed write operations
# performed by the device (Since 2.5)
#
+# @failed_zone_append_operations: The number of failed zone append write
+# operations performed by the zoned devices
+# (since 8.1)
+#
# @failed_flush_operations: The number of failed flush operations
# performed by the device (Since 2.5)
#
@@ -XXX,XX +XXX,XX @@
# @invalid_wr_operations: The number of invalid write operations
# performed by the device (Since 2.5)
#
+# @invalid_zone_append_operations: The number of invalid zone append operations
+# performed by the zoned device (since 8.1)
+#
# @invalid_flush_operations: The number of invalid flush operations
# performed by the device (Since 2.5)
#
@@ -XXX,XX +XXX,XX @@
#
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
#
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
+#
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
#
# Since: 0.14
##
{ 'struct': 'BlockDeviceStats',
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
- 'rd_operations': 'int', 'wr_operations': 'int',
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
'flush_operations': 'int', 'unmap_operations': 'int',
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
- 'wr_highest_offset': 'int',
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
- '*idle_time_ns': 'int',
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
+ 'failed_zone_append_operations': 'int',
+ 'failed_flush_operations': 'int',
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
+ 'invalid_wr_operations': 'int',
+ 'invalid_zone_append_operations': 'int',
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
'account_invalid': 'bool', 'account_failed': 'bool',
'timed_stats': ['BlockDeviceTimedStats'],
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }

##
diff --git a/qapi/block.json b/qapi/block.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -XXX,XX +XXX,XX @@
# @boundaries-write: list of interval boundary values for write latency
# histogram.
#
+# @boundaries-zap: list of interval boundary values for zone append write
+# latency histogram.
+#
# @boundaries-flush: list of interval boundary values for flush latency
# histogram.
#
@@ -XXX,XX +XXX,XX @@
'*boundaries': ['uint64'],
'*boundaries-read': ['uint64'],
'*boundaries-write': ['uint64'],
+ '*boundaries-zap': ['uint64'],
'*boundaries-flush': ['uint64'] },
'allow-preconfig': true }
diff --git a/include/block/accounting.h b/include/block/accounting.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
BLOCK_ACCT_READ,
BLOCK_ACCT_WRITE,
BLOCK_ACCT_FLUSH,
+ BLOCK_ACCT_ZONE_APPEND,
BLOCK_ACCT_UNMAP,
BLOCK_MAX_IOTYPE,
};
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
bool has_boundaries, uint64List *boundaries,
bool has_boundaries_read, uint64List *boundaries_read,
bool has_boundaries_write, uint64List *boundaries_write,
+ bool has_boundaries_append, uint64List *boundaries_append,
bool has_boundaries_flush, uint64List *boundaries_flush,
Error **errp)
{
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
}
}

+ if (has_boundaries || has_boundaries_append) {
+ ret = block_latency_histogram_set(
+ stats, BLOCK_ACCT_ZONE_APPEND,
+ has_boundaries_append ? boundaries_append : boundaries);
+ if (ret) {
+ error_setg(errp, "Device '%s' set append write boundaries fail", id);
+ return;
+ }
+ }
+
if (has_boundaries || has_boundaries_flush) {
ret = block_latency_histogram_set(
stats, BLOCK_ACCT_FLUSH,
diff --git a/block/qapi.c b/block/qapi.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)

ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];

ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
+ ds->failed_zone_append_operations =
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];

ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
+ ds->invalid_zone_append_operations =
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
ds->invalid_flush_operations =
stats->invalid_ops[BLOCK_ACCT_FLUSH];
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];

ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
+ ds->zone_append_total_time_ns =
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)

TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];

dev_stats->interval_length = ts->interval_length;
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
dev_stats->max_wr_latency_ns = timed_average_max(wr);
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);

+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
+
dev_stats->min_flush_latency_ns = timed_average_min(fl);
dev_stats->max_flush_latency_ns = timed_average_max(fl);
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
dev_stats->avg_wr_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
+ dev_stats->avg_zone_append_queue_depth =
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);

QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
}
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
ds->wr_latency_histogram
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
+ ds->zone_append_latency_histogram
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
ds->flush_latency_histogram
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
}
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
data->in_num = in_num;
data->zone_append_data.offset = offset;
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
+
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
+ BLOCK_ACCT_ZONE_APPEND);
+
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
virtio_blk_zone_append_complete, data);
return 0;
--
2.40.0

From: Stefano Garzarella <sgarzare@redhat.com>

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports fd
passing through the new 'fd' property.

Since we now use qemu_open() on '@path' when the virtio-blk driver
supports fd passing, let's announce it.
In this way, the management layer can pass the file descriptor of an
already opened vhost-vdpa character device. This is useful especially
when the device can only be accessed with certain privileges.

Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
in libblkio supports it.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530071941.8954-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qapi/block-core.json | 6 ++++++
meson.build | 4 ++++
2 files changed, 10 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -XXX,XX +XXX,XX @@
#
# @path: path to the vhost-vdpa character device.
#
+# Features:
+# @fdset: Member @path supports the special "/dev/fdset/N" path
+# (since 8.1)
+#
# Since: 7.2
##
{ 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
'data': { 'path': 'str' },
+ 'features': [ { 'name' :'fdset',
+ 'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
'if': 'CONFIG_BLKIO' }

##
diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_LZO', lzo.found())
config_host_data.set('CONFIG_MPATH', mpathpersist.found())
config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
config_host_data.set('CONFIG_BLKIO', blkio.found())
+if blkio.found()
+ config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
+ blkio.version().version_compare('>=1.3.0'))
+endif
config_host_data.set('CONFIG_CURL', curl.found())
config_host_data.set('CONFIG_CURSES', curses.found())
config_host_data.set('CONFIG_GBM', gbm.found())
--
2.40.1
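[Editor's aside, not part of the series: the zone-append fields added above become visible to management tools through the QMP ``query-blockstats`` reply. A minimal sketch of reading them back; the JSON fragment and its numbers are invented for illustration, only the field names come from the QAPI additions in this patch.]

```python
import json

# Illustrative fragment of a query-blockstats reply; values are made up,
# field names match the additions in bdrv_query_blk_stats() above.
reply = json.loads("""
{
  "stats": {
    "zone_append_total_time_ns": 1500000,
    "timed_stats": [
      {
        "interval_length": 60,
        "min_zone_append_latency_ns": 10000,
        "max_zone_append_latency_ns": 90000,
        "avg_zone_append_latency_ns": 30000,
        "avg_zone_append_queue_depth": 1.5
      }
    ]
  }
}
""")

# Report per-interval zone-append latency and queue depth.
for interval in reply["stats"]["timed_stats"]:
    print(interval["interval_length"],
          interval["avg_zone_append_latency_ns"],
          interval["avg_zone_append_queue_depth"])
```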
Deleted patch
From: Sam Li <faithilikerun@gmail.com>

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230407082528.18841-5-faithilikerun@gmail.com>
---
hw/block/virtio-blk.c | 12 ++++++++++++
hw/block/trace-events |  7 +++++++
2 files changed, 19 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
    int64_t nz = data->zone_report_data.nr_zones;
    int8_t err_status = VIRTIO_BLK_S_OK;

+   trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
    if (ret) {
        err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
        goto out;
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
    nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
                sizeof(struct virtio_blk_zone_report)) /
               sizeof(struct virtio_blk_zone_descriptor);
+   trace_virtio_blk_handle_zone_report(vdev, req,
+                                       offset >> BDRV_SECTOR_BITS, nr_zones);

    zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
    data = g_malloc(sizeof(ZoneCmdData));
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
{
    VirtIOBlockReq *req = opaque;
    VirtIOBlock *s = req->dev;
+   VirtIODevice *vdev = VIRTIO_DEVICE(s);
    int8_t err_status = VIRTIO_BLK_S_OK;
+   trace_virtio_blk_zone_mgmt_complete(vdev, req, ret);

    if (ret) {
        err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
        /* Entire drive capacity */
        offset = 0;
        len = capacity;
+       trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
+                                              bs->total_sectors);
    } else {
        if (bs->bl.zone_size > capacity - offset) {
            /* The zoned device allows the last smaller zone. */
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
        } else {
            len = bs->bl.zone_size;
        }
+       trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
+                                         offset >> BDRV_SECTOR_BITS,
+                                         len >> BDRV_SECTOR_BITS);
    }

    if (!check_zoned_request(s, offset, len, false, &err_status)) {
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
        err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
        goto out;
    }
+   trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);

out:
    aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
    int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
    int64_t len = iov_size(out_iov, out_num);

+   trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
    if (!check_zoned_request(s, offset, len, true, &err_status)) {
        goto out;
    }
diff --git a/hw/block/trace-events b/hw/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
# virtio-blk.c
virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""

# hd-geometry.c
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
--
2.40.0
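[Editor's aside, not part of the series: the trace points added in the patch above can be enabled at runtime with QEMU's ``--trace events=FILE`` option, which takes a file with one trace-event pattern per line. A minimal sketch; the file path is arbitrary and the QEMU invocation is shown only in a comment.]

```python
# Write the zoned trace point names from the patch above to a file
# usable with QEMU's '--trace events=FILE' option.
from pathlib import Path

ZONED_TRACE_EVENTS = [
    "virtio_blk_handle_zone_report",
    "virtio_blk_handle_zone_mgmt",
    "virtio_blk_handle_zone_reset_all",
    "virtio_blk_handle_zone_append",
    "virtio_blk_zone_report_complete",
    "virtio_blk_zone_mgmt_complete",
    "virtio_blk_zone_append_complete",
]

path = Path("/tmp/zoned-trace-events")
path.write_text("\n".join(ZONED_TRACE_EVENTS) + "\n")

# Intended use (requires a QEMU built with these trace points):
#   qemu-system-x86_64 --trace events=/tmp/zoned-trace-events ...
print(path.read_text().count("\n"))  # 7 event patterns
```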
Deleted patch
From: Sam Li <faithilikerun@gmail.com>

Add documentation with an example of using the virtio-blk driver to pass
zoned block devices through to the guest.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
[Fix rST syntax
--Stefan]
Message-Id: <20230407082528.18841-6-faithilikerun@gmail.com>
---
docs/devel/zoned-storage.rst | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/zoned-storage.rst
+++ b/docs/devel/zoned-storage.rst
@@ -XXX,XX +XXX,XX @@ APIs for zoned storage emulation or testing.
 For example, to test zone_report on a null_blk device using qemu-io is::
 
   $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
+
+To expose the host's zoned block device through virtio-blk, the command line
+can be (includes the -device parameter)::
+
+  -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
+  -device virtio-blk-pci,drive=drive0
+
+Or only use the -drive parameter::
+
+  -drive driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
+
+Additionally, QEMU has several ways of supporting zoned storage, including:
+(1) Using virtio-scsi: --device scsi-block allows for the passing through of
+SCSI ZBC devices, enabling the attachment of ZBC or ZAC HDDs to QEMU.
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
+purposes, it cannot yet pass through a zoned device from the host. To pass an
+NVMe ZNS device through to the guest, use VFIO PCI to pass the entire NVMe PCI
+adapter through. Likewise, an HDD HBA can be passed through to QEMU, exposing
+all HDDs attached to the HBA.
--
2.40.0
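[Editor's aside, not part of the series: the "last smaller zone" case noted in the zone-management patch earlier means a device whose capacity is not a multiple of its zone size ends with a short final zone. A quick sketch of that layout arithmetic, with illustrative numbers:]

```python
def zone_starts(capacity, zone_size):
    """Return the start offset of each zone on a device.

    The last zone may be smaller than zone_size, mirroring the
    'last smaller zone' case handled in virtio_blk_handle_zone_mgmt().
    """
    starts = []
    offset = 0
    while offset < capacity:
        starts.append(offset)
        offset += zone_size
    return starts

# A 10 MiB device with 4 MiB zones has zones of 4, 4 and 2 MiB.
MiB = 1024 * 1024
print(zone_starts(10 * MiB, 4 * MiB))  # [0, 4194304, 8388608]
```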