1
The following changes since commit 711c0418c8c1ce3a24346f058b001c4c5a2f0f81:
1
The following changes since commit 8844bb8d896595ee1d25d21c770e6e6f29803097:
2
2
3
Merge remote-tracking branch 'remotes/philmd/tags/mips-20210702' into staging (2021-07-04 14:04:12 +0100)
3
Merge tag 'or1k-pull-request-20230513' of https://github.com/stffrdhrn/qemu into staging (2023-05-13 11:23:14 +0100)
4
4
5
are available in the Git repository at:
6
6
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
8
8
9
for you to fetch changes up to 9f460c64e13897117f35ffb61f6f5e0102cabc70:
9
for you to fetch changes up to 01562fee5f3ad4506d57dbcf4b1903b565eceec7:
10
10
11
block/io: Merge discard request alignments (2021-07-06 14:28:55 +0100)
11
docs/zoned-storage:add zoned emulation use case (2023-05-15 08:19:04 -0400)
12
12
13
----------------------------------------------------------------
13
----------------------------------------------------------------
14
Pull request
15
15
16
This pull request contains Sam Li's zoned storage support in the QEMU block
17
layer and virtio-blk emulation.
18
19
v2:
20
- Sam fixed the CI failures. CI passes for me now. [Richard]
21
16
----------------------------------------------------------------
22
----------------------------------------------------------------
17
23
18
Akihiko Odaki (3):
24
Sam Li (16):
19
block/file-posix: Optimize for macOS
25
block/block-common: add zoned device structs
20
block: Add backend_defaults property
26
block/file-posix: introduce helper functions for sysfs attributes
21
block/io: Merge discard request alignments
27
block/block-backend: add block layer APIs resembling Linux
28
ZonedBlockDevice ioctls
29
block/raw-format: add zone operations to pass through requests
30
block: add zoned BlockDriver check to block layer
31
iotests: test new zone operations
32
block: add some trace events for new block layer APIs
33
docs/zoned-storage: add zoned device documentation
34
file-posix: add tracking of the zone write pointers
35
block: introduce zone append write for zoned devices
36
qemu-iotests: test zone append operation
37
block: add some trace events for zone append
38
virtio-blk: add zoned storage emulation for zoned devices
39
block: add accounting for zone append operation
40
virtio-blk: add some trace events for zoned emulation
41
docs/zoned-storage:add zoned emulation use case
22
42
23
Stefan Hajnoczi (2):
43
docs/devel/index-api.rst | 1 +
24
util/async: add a human-readable name to BHs for debugging
44
docs/devel/zoned-storage.rst | 62 +++
25
util/async: print leaked BH name when AioContext finalizes
45
qapi/block-core.json | 68 ++-
26
46
qapi/block.json | 4 +
27
include/block/aio.h | 31 ++++++++++++++++++++++---
47
meson.build | 5 +
28
include/hw/block/block.h | 3 +++
48
include/block/accounting.h | 1 +
29
include/qemu/main-loop.h | 4 +++-
49
include/block/block-common.h | 57 ++
30
block/file-posix.c | 27 ++++++++++++++++++++--
50
include/block/block-io.h | 13 +
31
block/io.c | 2 ++
51
include/block/block_int-common.h | 37 ++
32
hw/block/block.c | 42 ++++++++++++++++++++++++++++++----
52
include/block/raw-aio.h | 8 +-
33
tests/unit/ptimer-test-stubs.c | 2 +-
53
include/sysemu/block-backend-io.h | 27 +
34
util/async.c | 25 ++++++++++++++++----
54
block.c | 19 +
35
util/main-loop.c | 4 ++--
55
block/block-backend.c | 198 +++++++
36
tests/qemu-iotests/172.out | 38 ++++++++++++++++++++++++++++++
56
block/file-posix.c | 692 +++++++++++++++++++++++--
37
10 files changed, 161 insertions(+), 17 deletions(-)
57
block/io.c | 68 +++
58
block/io_uring.c | 4 +
59
block/linux-aio.c | 3 +
60
block/qapi-sysemu.c | 11 +
61
block/qapi.c | 18 +
62
block/raw-format.c | 26 +
63
hw/block/virtio-blk-common.c | 2 +
64
hw/block/virtio-blk.c | 405 +++++++++++++++
65
hw/virtio/virtio-qmp.c | 2 +
66
qemu-io-cmds.c | 224 ++++++++
67
block/trace-events | 4 +
68
docs/system/qemu-block-drivers.rst.inc | 6 +
69
hw/block/trace-events | 7 +
70
tests/qemu-iotests/227.out | 18 +
71
tests/qemu-iotests/tests/zoned | 105 ++++
72
tests/qemu-iotests/tests/zoned.out | 69 +++
73
30 files changed, 2106 insertions(+), 58 deletions(-)
74
create mode 100644 docs/devel/zoned-storage.rst
75
create mode 100755 tests/qemu-iotests/tests/zoned
76
create mode 100644 tests/qemu-iotests/tests/zoned.out
38
77
39
--
78
--
40
2.31.1
79
2.40.1
41
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
6
Reviewed-by: Hannes Reinecke <hare@suse.de>
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
8
Acked-by: Kevin Wolf <kwolf@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20230508045533.175575-2-faithilikerun@gmail.com
11
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
13
<philmd@linaro.org>.
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
17
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
18
1 file changed, 43 insertions(+)
19
20
diff --git a/include/block/block-common.h b/include/block/block-common.h
21
index XXXXXXX..XXXXXXX 100644
22
--- a/include/block/block-common.h
23
+++ b/include/block/block-common.h
24
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
25
typedef struct BdrvChild BdrvChild;
26
typedef struct BdrvChildClass BdrvChildClass;
27
28
+typedef enum BlockZoneOp {
29
+ BLK_ZO_OPEN,
30
+ BLK_ZO_CLOSE,
31
+ BLK_ZO_FINISH,
32
+ BLK_ZO_RESET,
33
+} BlockZoneOp;
34
+
35
+typedef enum BlockZoneModel {
36
+ BLK_Z_NONE = 0x0, /* Regular block device */
37
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
38
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
39
+} BlockZoneModel;
40
+
41
+typedef enum BlockZoneState {
42
+ BLK_ZS_NOT_WP = 0x0,
43
+ BLK_ZS_EMPTY = 0x1,
44
+ BLK_ZS_IOPEN = 0x2,
45
+ BLK_ZS_EOPEN = 0x3,
46
+ BLK_ZS_CLOSED = 0x4,
47
+ BLK_ZS_RDONLY = 0xD,
48
+ BLK_ZS_FULL = 0xE,
49
+ BLK_ZS_OFFLINE = 0xF,
50
+} BlockZoneState;
51
+
52
+typedef enum BlockZoneType {
53
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
54
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
55
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
56
+} BlockZoneType;
57
+
58
+/*
59
+ * Zone descriptor data structure.
60
+ * Provides information on a zone with all position and size values in bytes.
61
+ */
62
+typedef struct BlockZoneDescriptor {
63
+ uint64_t start;
64
+ uint64_t length;
65
+ uint64_t cap;
66
+ uint64_t wp;
67
+ BlockZoneType type;
68
+ BlockZoneState state;
69
+} BlockZoneDescriptor;
70
+
71
typedef struct BlockDriverInfo {
72
/* in bytes, 0 if irrelevant */
73
int cluster_size;
74
--
75
2.40.1
76
77
New patch
1
1
From: Sam Li <faithilikerun@gmail.com>
2
3
Use get_sysfs_str_val() to read the device's zoned model from sysfs as a
string; get_sysfs_zoned_model() then converts it to QEMU's
BlockZoneModel type.
6
7
Use get_sysfs_long_val() to read integer-valued sysfs attributes of a
zoned device as a long.
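
For illustration only, the string handling described above can be sketched
as a standalone fragment. This is not the code from this patch: ZoneModel,
strip_trailing_newline() and parse_zoned_model() are hypothetical names that
stand in for BlockZoneModel and the sysfs helpers, and the sketch assumes the
attribute text has already been read into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's BlockZoneModel (same values as BLK_Z_*). */
typedef enum ZoneModel {
    ZONE_MODEL_NONE = 0x0, /* regular block device */
    ZONE_MODEL_HM = 0x1,   /* host-managed */
    ZONE_MODEL_HA = 0x2,   /* host-aware */
} ZoneModel;

/* Strip one trailing '\n', since sysfs attribute files end with a newline. */
static void strip_trailing_newline(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/*
 * Convert the contents of the "zoned" queue attribute to a ZoneModel.
 * Returns 0 on success and -1 for an unrecognized string.
 */
static int parse_zoned_model(const char *val, ZoneModel *model)
{
    assert(val != NULL && model != NULL);
    if (strcmp(val, "host-managed") == 0) {
        *model = ZONE_MODEL_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        *model = ZONE_MODEL_HA;
    } else if (strcmp(val, "none") == 0) {
        *model = ZONE_MODEL_NONE;
    } else {
        return -1;
    }
    return 0;
}
```

A caller would strip the newline first and then parse, so that the string
comparison sees "host-managed" rather than "host-managed\n".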
9
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
Reviewed-by: Hannes Reinecke <hare@suse.de>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
14
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
15
Acked-by: Kevin Wolf <kwolf@redhat.com>
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Message-id: 20230508045533.175575-3-faithilikerun@gmail.com
18
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
19
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
20
<philmd@linaro.org>.
21
--Stefan]
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
---
24
include/block/block_int-common.h | 3 +
25
block/file-posix.c | 135 ++++++++++++++++++++++---------
26
2 files changed, 100 insertions(+), 38 deletions(-)
27
28
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block_int-common.h
31
+++ b/include/block/block_int-common.h
32
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
33
* an explicit monitor command to load the disk inside the guest).
34
*/
35
bool has_variable_length;
36
+
37
+ /* device zone model */
38
+ BlockZoneModel zoned;
39
} BlockLimits;
40
41
typedef struct BdrvOpBlocker BdrvOpBlocker;
42
diff --git a/block/file-posix.c b/block/file-posix.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/block/file-posix.c
45
+++ b/block/file-posix.c
46
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
47
#endif
48
}
49
50
-static int hdev_get_max_segments(int fd, struct stat *st)
51
+/*
52
+ * Get a sysfs attribute value as character string.
53
+ */
54
+#ifdef CONFIG_LINUX
55
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
56
+ char **val) {
57
+ g_autofree char *sysfspath = NULL;
58
+ int ret;
59
+ size_t len;
60
+
61
+ if (!S_ISBLK(st->st_mode)) {
62
+ return -ENOTSUP;
63
+ }
64
+
65
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
66
+ major(st->st_rdev), minor(st->st_rdev),
67
+ attribute);
68
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
69
+ if (!ret) {
70
+ return -ENOENT;
71
+ }
72
+
73
+ /* The file is ended with '\n' */
74
+ char *p;
75
+ p = *val;
76
+ if (*(p + len - 1) == '\n') {
77
+ *(p + len - 1) = '\0';
78
+ }
79
+ return ret;
80
+}
81
+#endif
82
+
83
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
84
{
85
+ g_autofree char *val = NULL;
86
+ int ret;
87
+
88
+ ret = get_sysfs_str_val(st, "zoned", &val);
89
+ if (ret < 0) {
90
+ return ret;
91
+ }
92
+
93
+ if (strcmp(val, "host-managed") == 0) {
94
+ *zoned = BLK_Z_HM;
95
+ } else if (strcmp(val, "host-aware") == 0) {
96
+ *zoned = BLK_Z_HA;
97
+ } else if (strcmp(val, "none") == 0) {
98
+ *zoned = BLK_Z_NONE;
99
+ } else {
100
+ return -ENOTSUP;
101
+ }
102
+ return 0;
103
+}
104
+
105
+/*
106
+ * Get a sysfs attribute value as a long integer.
107
+ */
108
#ifdef CONFIG_LINUX
109
- char buf[32];
110
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
111
+{
112
+ g_autofree char *str = NULL;
113
const char *end;
114
- char *sysfspath = NULL;
115
+ long val;
116
+ int ret;
117
+
118
+ ret = get_sysfs_str_val(st, attribute, &str);
119
+ if (ret < 0) {
120
+ return ret;
121
+ }
122
+
123
+ /* get_sysfs_str_val() has already stripped the trailing '\n'. */
124
+ ret = qemu_strtol(str, &end, 10, &val);
125
+ if (ret == 0 && end && *end == '\0') {
126
+ ret = val;
127
+ }
128
+ return ret;
129
+}
130
+#endif
131
+
132
+static int hdev_get_max_segments(int fd, struct stat *st)
133
+{
134
+#ifdef CONFIG_LINUX
135
int ret;
136
- int sysfd = -1;
137
- long max_segments;
138
139
if (S_ISCHR(st->st_mode)) {
140
if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
141
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
142
}
143
return -ENOTSUP;
144
}
145
-
146
- if (!S_ISBLK(st->st_mode)) {
147
- return -ENOTSUP;
148
- }
149
-
150
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
151
- major(st->st_rdev), minor(st->st_rdev));
152
- sysfd = open(sysfspath, O_RDONLY);
153
- if (sysfd == -1) {
154
- ret = -errno;
155
- goto out;
156
- }
157
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
158
- if (ret < 0) {
159
- ret = -errno;
160
- goto out;
161
- } else if (ret == 0) {
162
- ret = -EIO;
163
- goto out;
164
- }
165
- buf[ret] = 0;
166
- /* The file is ended with '\n', pass 'end' to accept that. */
167
- ret = qemu_strtol(buf, &end, 10, &max_segments);
168
- if (ret == 0 && end && *end == '\n') {
169
- ret = max_segments;
170
- }
171
-
172
-out:
173
- if (sysfd != -1) {
174
- close(sysfd);
175
- }
176
- g_free(sysfspath);
177
- return ret;
178
+ return get_sysfs_long_val(st, "max_segments");
179
#else
180
return -ENOTSUP;
181
#endif
182
}
183
184
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
185
+ Error **errp)
186
+{
187
+ BlockZoneModel zoned;
188
+ int ret;
189
+
190
+ bs->bl.zoned = BLK_Z_NONE;
191
+
192
+ ret = get_sysfs_zoned_model(st, &zoned);
193
+ if (ret < 0 || zoned == BLK_Z_NONE) {
194
+ return;
195
+ }
196
+ bs->bl.zoned = zoned;
197
+}
198
+
199
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
200
{
201
BDRVRawState *s = bs->opaque;
202
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
203
bs->bl.max_hw_iov = ret;
204
}
205
}
206
+
207
+ raw_refresh_zoned_limits(bs, &st, errp);
208
}
209
210
static int check_for_dasd(int fd)
211
--
212
2.40.1
213
214
1
From: Akihiko Odaki <akihiko.odaki@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
This commit introduces "punch hole" operation and optimizes transfer
3
Add zoned device option to host_device BlockDriver. It will be presented only
4
block size for macOS.
4
for zoned host block devices. By adding zone management operations to the
5
host_device BlockDriver, users can use the new block layer APIs
6
including Report Zone and four zone management operations
7
(open, close, finish, reset).
5
8
6
Thanks to Konstantin Nazarov for detailed analysis of a flaw in an
9
qemu-io uses the new APIs to issue zoned storage commands to the device:
7
old version of this change:
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
8
https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667
11
zone_finish(zf).
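
As a rough sketch of how the four management operations and their byte
ranges fit together, the fragment below maps each operation to its qemu-io
command name and checks that a range is zone-aligned. It is not the code in
this patch: ZoneOp, zone_op_cmd_name() and check_zone_mgmt_range() are
hypothetical, and accepting an unaligned length only when the range ends at
the device capacity is an assumption about devices whose last zone is
shorter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the four BlockZoneOp values. */
typedef enum ZoneOp {
    ZONE_OP_OPEN,
    ZONE_OP_CLOSE,
    ZONE_OP_FINISH,
    ZONE_OP_RESET,
} ZoneOp;

/* qemu-io short command name for each operation, as listed above. */
static const char *zone_op_cmd_name(ZoneOp op)
{
    switch (op) {
    case ZONE_OP_OPEN:   return "zo";
    case ZONE_OP_CLOSE:  return "zc";
    case ZONE_OP_FINISH: return "zf";
    case ZONE_OP_RESET:  return "zrs";
    }
    return "?";
}

/*
 * Validate a byte range for a zone management command.
 * Returns 0 if the range is acceptable, -EINVAL otherwise.
 */
static int check_zone_mgmt_range(int64_t offset, int64_t len,
                                 int64_t zone_size, int64_t capacity)
{
    if (zone_size <= 0 || capacity <= 0 || offset < 0 || len <= 0) {
        return -EINVAL;
    }
    /* The range must lie inside the device (written to avoid overflow). */
    if (offset >= capacity || len > capacity - offset) {
        return -EINVAL;
    }
    /* The range must start on a zone boundary. */
    if (offset % zone_size != 0) {
        return -EINVAL;
    }
    /* Assumption: only a range ending at capacity may have a partial length. */
    if (len % zone_size != 0 && offset + len != capacity) {
        return -EINVAL;
    }
    return 0;
}
```

Checking the range before issuing the command lets a caller reject a
misaligned request without involving the device.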
9
12
10
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
13
For example, to test zone_report, use the following command:
11
Message-id: 20210705130458.97642-1-akihiko.odaki@gmail.com
14
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
15
-c "zrp offset nr_zones"
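
The offset passed to zrp is a byte offset, while the Linux zone report
interface works in 512-byte sectors. The unit conversion can be sketched as
follows. SectorZone, ByteZone and both helpers are hypothetical, trimmed-down
stand-ins used only for illustration; they are not the structures or
functions from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 512-byte sectors, the same shift QEMU uses as BDRV_SECTOR_BITS. */
#define SECTOR_BITS 9

/* Hypothetical view of a kernel zone record; all fields are in sectors. */
typedef struct SectorZone {
    uint64_t start;
    uint64_t len;
    uint64_t wp;
} SectorZone;

/* Hypothetical byte-based descriptor; all fields are in bytes. */
typedef struct ByteZone {
    uint64_t start;
    uint64_t length;
    uint64_t wp;
} ByteZone;

/* Convert one zone record from sector units to byte units. */
static void zone_sectors_to_bytes(const SectorZone *in, ByteZone *out)
{
    assert(in != NULL && out != NULL);
    out->start = in->start << SECTOR_BITS;
    out->length = in->len << SECTOR_BITS;
    out->wp = in->wp << SECTOR_BITS;
}

/* Convert the byte offset given to a report command to a starting sector. */
static uint64_t byte_offset_to_sector(uint64_t offset)
{
    return offset >> SECTOR_BITS;
}
```

Keeping every position and size in bytes above the driver boundary means
callers never need to know the sector size used by the host kernel.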
16
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
Message-id: 20230508045533.175575-4-faithilikerun@gmail.com
24
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
25
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
26
<philmd@linaro.org> and remove spurious ret = -errno in
27
raw_co_zone_mgmt().
28
--Stefan]
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
29
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
30
---
14
block/file-posix.c | 27 +++++++++++++++++++++++++--
31
meson.build | 5 +
15
1 file changed, 25 insertions(+), 2 deletions(-)
32
include/block/block-io.h | 9 +
33
include/block/block_int-common.h | 21 ++
34
include/block/raw-aio.h | 6 +-
35
include/sysemu/block-backend-io.h | 18 ++
36
block/block-backend.c | 137 +++++++++++++
37
block/file-posix.c | 313 +++++++++++++++++++++++++++++-
38
block/io.c | 41 ++++
39
qemu-io-cmds.c | 149 ++++++++++++++
40
9 files changed, 696 insertions(+), 3 deletions(-)
16
41
42
diff --git a/meson.build b/meson.build
43
index XXXXXXX..XXXXXXX 100644
44
--- a/meson.build
45
+++ b/meson.build
46
@@ -XXX,XX +XXX,XX @@ if rdma.found()
47
endif
48
49
# has_header_symbol
50
+config_host_data.set('CONFIG_BLKZONED',
51
+ cc.has_header_symbol('linux/blkzoned.h', 'BLKOPENZONE'))
52
config_host_data.set('CONFIG_EPOLL_CREATE1',
53
cc.has_header_symbol('sys/epoll.h', 'epoll_create1'))
54
config_host_data.set('CONFIG_FALLOCATE_PUNCH_HOLE',
55
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
56
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
57
cc.has_member('struct stat', 'st_atim',
58
prefix: '#include <sys/stat.h>'))
59
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
60
+ cc.has_member('struct blk_zone', 'capacity',
61
+ prefix: '#include <linux/blkzoned.h>'))
62
63
# has_type
64
config_host_data.set('CONFIG_IOVEC',
65
diff --git a/include/block/block-io.h b/include/block/block-io.h
66
index XXXXXXX..XXXXXXX 100644
67
--- a/include/block/block-io.h
68
+++ b/include/block/block-io.h
69
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
70
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
71
int64_t bytes);
72
73
+/* Report zone information of zone block device. */
74
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
75
+ int64_t offset,
76
+ unsigned int *nr_zones,
77
+ BlockZoneDescriptor *zones);
78
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
79
+ BlockZoneOp op,
80
+ int64_t offset, int64_t len);
81
+
82
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
83
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
84
int64_t bytes, int64_t *pnum, int64_t *map,
85
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
86
index XXXXXXX..XXXXXXX 100644
87
--- a/include/block/block_int-common.h
88
+++ b/include/block/block_int-common.h
89
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
90
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
91
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
92
93
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
94
+ int64_t offset, unsigned int *nr_zones,
95
+ BlockZoneDescriptor *zones);
96
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
97
+ int64_t offset, int64_t len);
98
+
99
/* removable device specific */
100
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
101
BlockDriverState *bs);
102
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
103
104
/* device zone model */
105
BlockZoneModel zoned;
106
+
107
+ /* zone size expressed in bytes */
108
+ uint32_t zone_size;
109
+
110
+ /* total number of zones */
111
+ uint32_t nr_zones;
112
+
113
+ /* maximum sectors of a zone append write operation */
114
+ uint32_t max_append_sectors;
115
+
116
+ /* maximum number of open zones */
117
+ uint32_t max_open_zones;
118
+
119
+ /* maximum number of active zones */
120
+ uint32_t max_active_zones;
121
} BlockLimits;
122
123
typedef struct BdrvOpBlocker BdrvOpBlocker;
124
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
125
index XXXXXXX..XXXXXXX 100644
126
--- a/include/block/raw-aio.h
127
+++ b/include/block/raw-aio.h
128
@@ -XXX,XX +XXX,XX @@
129
#define QEMU_AIO_WRITE_ZEROES 0x0020
130
#define QEMU_AIO_COPY_RANGE 0x0040
131
#define QEMU_AIO_TRUNCATE 0x0080
132
+#define QEMU_AIO_ZONE_REPORT 0x0100
133
+#define QEMU_AIO_ZONE_MGMT 0x0200
134
#define QEMU_AIO_TYPE_MASK \
135
(QEMU_AIO_READ | \
136
QEMU_AIO_WRITE | \
137
@@ -XXX,XX +XXX,XX @@
138
QEMU_AIO_DISCARD | \
139
QEMU_AIO_WRITE_ZEROES | \
140
QEMU_AIO_COPY_RANGE | \
141
- QEMU_AIO_TRUNCATE)
142
+ QEMU_AIO_TRUNCATE | \
143
+ QEMU_AIO_ZONE_REPORT | \
144
+ QEMU_AIO_ZONE_MGMT)
145
146
/* AIO flags */
147
#define QEMU_AIO_MISALIGNED 0x1000
148
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
149
index XXXXXXX..XXXXXXX 100644
150
--- a/include/sysemu/block-backend-io.h
151
+++ b/include/sysemu/block-backend-io.h
152
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
153
BlockCompletionFunc *cb, void *opaque);
154
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
155
BlockCompletionFunc *cb, void *opaque);
156
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
157
+ unsigned int *nr_zones,
158
+ BlockZoneDescriptor *zones,
159
+ BlockCompletionFunc *cb, void *opaque);
160
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
161
+ int64_t offset, int64_t len,
162
+ BlockCompletionFunc *cb, void *opaque);
163
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
164
BlockCompletionFunc *cb, void *opaque);
165
void blk_aio_cancel_async(BlockAIOCB *acb);
166
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
167
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
168
int64_t bytes, BdrvRequestFlags flags);
169
170
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
171
+ unsigned int *nr_zones,
172
+ BlockZoneDescriptor *zones);
173
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
174
+ unsigned int *nr_zones,
175
+ BlockZoneDescriptor *zones);
176
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
177
+ int64_t offset, int64_t len);
178
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
179
+ int64_t offset, int64_t len);
180
+
181
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
182
int64_t bytes);
183
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
184
diff --git a/block/block-backend.c b/block/block-backend.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/block/block-backend.c
187
+++ b/block/block-backend.c
188
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
189
return ret;
190
}
191
192
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
193
+{
194
+ BlkAioEmAIOCB *acb = opaque;
195
+ BlkRwCo *rwco = &acb->rwco;
196
+
197
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
198
+ (unsigned int*)(uintptr_t)acb->bytes,
199
+ rwco->iobuf);
200
+ blk_aio_complete(acb);
201
+}
202
+
203
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
204
+ unsigned int *nr_zones,
205
+ BlockZoneDescriptor *zones,
206
+ BlockCompletionFunc *cb, void *opaque)
207
+{
208
+ BlkAioEmAIOCB *acb;
209
+ Coroutine *co;
210
+ IO_CODE();
211
+
212
+ blk_inc_in_flight(blk);
213
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
214
+ acb->rwco = (BlkRwCo) {
215
+ .blk = blk,
216
+ .offset = offset,
217
+ .iobuf = zones,
218
+ .ret = NOT_DONE,
219
+ };
220
+ acb->bytes = (int64_t)(uintptr_t)nr_zones;
221
+ acb->has_returned = false;
222
+
223
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
224
+ aio_co_enter(blk_get_aio_context(blk), co);
225
+
226
+ acb->has_returned = true;
227
+ if (acb->rwco.ret != NOT_DONE) {
228
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
229
+ blk_aio_complete_bh, acb);
230
+ }
231
+
232
+ return &acb->common;
233
+}
234
+
235
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
236
+{
237
+ BlkAioEmAIOCB *acb = opaque;
238
+ BlkRwCo *rwco = &acb->rwco;
239
+
240
+ rwco->ret = blk_co_zone_mgmt(rwco->blk,
241
+ (BlockZoneOp)(uintptr_t)rwco->iobuf,
242
+ rwco->offset, acb->bytes);
243
+ blk_aio_complete(acb);
244
+}
245
+
246
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
247
+ int64_t offset, int64_t len,
248
+ BlockCompletionFunc *cb, void *opaque) {
249
+ BlkAioEmAIOCB *acb;
250
+ Coroutine *co;
251
+ IO_CODE();
252
+
253
+ blk_inc_in_flight(blk);
254
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
255
+ acb->rwco = (BlkRwCo) {
256
+ .blk = blk,
257
+ .offset = offset,
258
+ .iobuf = (void *)(uintptr_t)op,
259
+ .ret = NOT_DONE,
260
+ };
261
+ acb->bytes = len;
262
+ acb->has_returned = false;
263
+
264
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
265
+ aio_co_enter(blk_get_aio_context(blk), co);
266
+
267
+ acb->has_returned = true;
268
+ if (acb->rwco.ret != NOT_DONE) {
269
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
270
+ blk_aio_complete_bh, acb);
271
+ }
272
+
273
+ return &acb->common;
274
+}
275
+
276
+/*
277
+ * Send a zone_report command.
278
+ * offset is a byte offset from the start of the device. No alignment
279
+ * required for offset.
280
+ * nr_zones represents IN maximum and OUT actual.
281
+ */
282
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
283
+ unsigned int *nr_zones,
284
+ BlockZoneDescriptor *zones)
285
+{
286
+ int ret;
287
+ IO_CODE();
288
+
289
+ blk_inc_in_flight(blk); /* increase before waiting */
290
+ blk_wait_while_drained(blk);
291
+ GRAPH_RDLOCK_GUARD();
292
+ if (!blk_is_available(blk)) {
293
+ blk_dec_in_flight(blk);
294
+ return -ENOMEDIUM;
295
+ }
296
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
297
+ blk_dec_in_flight(blk);
298
+ return ret;
299
+}
300
+
301
+/*
302
+ * Send a zone_management command.
303
+ * op is the zone operation;
304
+ * offset is the byte offset from the start of the zoned device;
305
+ * len is the maximum number of bytes the command should operate on. It
306
+ * should be aligned with the device zone size.
307
+ */
308
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
309
+ int64_t offset, int64_t len)
310
+{
311
+ int ret;
312
+ IO_CODE();
313
+
314
+ blk_inc_in_flight(blk);
315
+ blk_wait_while_drained(blk);
316
+ GRAPH_RDLOCK_GUARD();
317
+
318
+ ret = blk_check_byte_request(blk, offset, len);
319
+ if (ret < 0) {
320
+ blk_dec_in_flight(blk);
321
+ return ret;
322
+ }
323
+
324
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
325
+ blk_dec_in_flight(blk);
326
+ return ret;
327
+}
328
+
329
void blk_drain(BlockBackend *blk)
330
{
331
BlockDriverState *bs = blk_bs(blk);
17
diff --git a/block/file-posix.c b/block/file-posix.c
332
diff --git a/block/file-posix.c b/block/file-posix.c
18
index XXXXXXX..XXXXXXX 100644
333
index XXXXXXX..XXXXXXX 100644
19
--- a/block/file-posix.c
334
--- a/block/file-posix.c
20
+++ b/block/file-posix.c
335
+++ b/block/file-posix.c
21
@@ -XXX,XX +XXX,XX @@
336
@@ -XXX,XX +XXX,XX @@
22
#if defined(HAVE_HOST_BLOCK_DEVICE)
23
#include <paths.h>
24
#include <sys/param.h>
337
#include <sys/param.h>
25
+#include <sys/mount.h>
338
#include <sys/syscall.h>
26
#include <IOKit/IOKitLib.h>
339
#include <sys/vfs.h>
27
#include <IOKit/IOBSD.h>
340
+#if defined(CONFIG_BLKZONED)
28
#include <IOKit/storage/IOMediaBSDClient.h>
341
+#include <linux/blkzoned.h>
29
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
342
+#endif
343
#include <linux/cdrom.h>
344
#include <linux/fd.h>
345
#include <linux/fs.h>
346
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
347
PreallocMode prealloc;
348
Error **errp;
349
} truncate;
350
+ struct {
351
+ unsigned int *nr_zones;
352
+ BlockZoneDescriptor *zones;
353
+ } zone_report;
354
+ struct {
355
+ unsigned long op;
356
+ } zone_mgmt;
357
};
358
} RawPosixAIOData;
359
360
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
361
}
362
#endif
363
364
+#if defined(CONFIG_BLKZONED)
365
static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
366
{
367
g_autofree char *val = NULL;
368
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
369
}
370
return 0;
371
}
372
+#endif /* defined(CONFIG_BLKZONED) */
373
374
/*
375
* Get a sysfs attribute value as a long integer.
376
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
377
#endif
378
}
379
380
+#if defined(CONFIG_BLKZONED)
381
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
382
Error **errp)
383
{
384
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
30
return;
385
return;
31
}
386
}
32
387
bs->bl.zoned = zoned;
33
+#if defined(__APPLE__) && (__MACH__)
388
+
34
+ struct statfs buf;
389
+ ret = get_sysfs_long_val(st, "max_open_zones");
35
+
390
+ if (ret >= 0) {
36
+ if (!fstatfs(s->fd, &buf)) {
391
+ bs->bl.max_open_zones = ret;
37
+ bs->bl.opt_transfer = buf.f_iosize;
392
+ }
38
+ bs->bl.pdiscard_alignment = buf.f_bsize;
393
+
39
+ }
394
+ ret = get_sysfs_long_val(st, "max_active_zones");
395
+ if (ret >= 0) {
396
+ bs->bl.max_active_zones = ret;
397
+ }
398
+
399
+ /*
400
+ * The zoned device must at least have zone size and nr_zones fields.
401
+ */
402
+ ret = get_sysfs_long_val(st, "chunk_sectors");
403
+ if (ret < 0) {
404
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
405
+ "sysfs attribute");
406
+ return;
407
+ } else if (!ret) {
408
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
409
+ return;
410
+ }
411
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
412
+
413
+ ret = get_sysfs_long_val(st, "nr_zones");
414
+ if (ret < 0) {
415
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
416
+ "sysfs attribute");
417
+ return;
418
+ } else if (!ret) {
419
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
420
+ return;
421
+ }
422
+ bs->bl.nr_zones = ret;
423
+
424
+ ret = get_sysfs_long_val(st, "zone_append_max_bytes");
425
+ if (ret > 0) {
426
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
427
+ }
428
}
429
+#else /* !defined(CONFIG_BLKZONED) */
430
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
431
+ Error **errp)
432
+{
433
+ bs->bl.zoned = BLK_Z_NONE;
434
+}
435
+#endif /* !defined(CONFIG_BLKZONED) */
436
437
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
438
{
439
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
440
BDRVRawState *s = bs->opaque;
441
int ret;
442
443
- /* If DASD, get blocksizes */
444
+ /* If DASD or zoned devices, get blocksizes */
445
if (check_for_dasd(s->fd) < 0) {
446
- return -ENOTSUP;
447
+ /* zoned devices are not DASD */
448
+ if (bs->bl.zoned == BLK_Z_NONE) {
449
+ return -ENOTSUP;
450
+ }
451
}
452
ret = probe_logical_blocksize(s->fd, &bsz->log);
453
if (ret < 0) {
454
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
455
}
456
#endif
457
458
+/*
459
+ * parse_zone - Fill a zone descriptor
460
+ */
461
+#if defined(CONFIG_BLKZONED)
462
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
463
+ const struct blk_zone *blkz) {
464
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
465
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
466
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
467
+
468
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
469
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
470
+#else
471
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
40
+#endif
472
+#endif
41
+
473
+
42
if (bs->sg || S_ISBLK(st.st_mode)) {
474
+ switch (blkz->type) {
43
int ret = hdev_get_max_hw_transfer(s->fd, &st);
475
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
44
476
+ zone->type = BLK_ZT_SWR;
45
@@ -XXX,XX +XXX,XX @@ out:
477
+ break;
478
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
479
+ zone->type = BLK_ZT_SWP;
480
+ break;
481
+ case BLK_ZONE_TYPE_CONVENTIONAL:
482
+ zone->type = BLK_ZT_CONV;
483
+ break;
484
+ default:
485
+ error_report("Unsupported zone type: 0x%x", blkz->type);
486
+ return -ENOTSUP;
487
+ }
488
+
489
+ switch (blkz->cond) {
490
+ case BLK_ZONE_COND_NOT_WP:
491
+ zone->state = BLK_ZS_NOT_WP;
492
+ break;
493
+ case BLK_ZONE_COND_EMPTY:
494
+ zone->state = BLK_ZS_EMPTY;
495
+ break;
496
+ case BLK_ZONE_COND_IMP_OPEN:
497
+ zone->state = BLK_ZS_IOPEN;
498
+ break;
499
+ case BLK_ZONE_COND_EXP_OPEN:
500
+ zone->state = BLK_ZS_EOPEN;
501
+ break;
502
+ case BLK_ZONE_COND_CLOSED:
503
+ zone->state = BLK_ZS_CLOSED;
504
+ break;
505
+ case BLK_ZONE_COND_READONLY:
506
+ zone->state = BLK_ZS_RDONLY;
507
+ break;
508
+ case BLK_ZONE_COND_FULL:
509
+ zone->state = BLK_ZS_FULL;
510
+ break;
511
+ case BLK_ZONE_COND_OFFLINE:
512
+ zone->state = BLK_ZS_OFFLINE;
513
+ break;
514
+ default:
515
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
516
+ return -ENOTSUP;
517
+ }
518
+ return 0;
519
+}
520
+#endif
521
+
522
+#if defined(CONFIG_BLKZONED)
523
+static int handle_aiocb_zone_report(void *opaque)
524
+{
525
+ RawPosixAIOData *aiocb = opaque;
526
+ int fd = aiocb->aio_fildes;
527
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
528
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
529
+ /* zoned block devices use 512-byte sectors */
530
+ uint64_t sector = aiocb->aio_offset / 512;
531
+
532
+ struct blk_zone *blkz;
533
+ size_t rep_size;
534
+ unsigned int nrz;
535
+ int ret;
536
+ unsigned int n = 0, i = 0;
537
+
538
+ nrz = *nr_zones;
539
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
540
+ g_autofree struct blk_zone_report *rep = NULL;
541
+ rep = g_malloc(rep_size);
542
+
543
+ blkz = (struct blk_zone *)(rep + 1);
544
+ while (n < nrz) {
545
+ memset(rep, 0, rep_size);
546
+ rep->sector = sector;
547
+ rep->nr_zones = nrz - n;
548
+
549
+ do {
550
+ ret = ioctl(fd, BLKREPORTZONE, rep);
551
+ } while (ret != 0 && errno == EINTR);
552
+ if (ret != 0) {
553
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
554
+ fd, sector, errno);
555
+ return -errno;
556
+ }
557
+
558
+ if (!rep->nr_zones) {
559
+ break;
560
+ }
561
+
562
+ for (i = 0; i < rep->nr_zones; i++, n++) {
563
+ ret = parse_zone(&zones[n], &blkz[i]);
564
+ if (ret != 0) {
565
+ return ret;
566
+ }
567
+
568
+ /* The next report should start after the last zone reported */
569
+ sector = blkz[i].start + blkz[i].len;
570
+ }
571
+ }
572
+
573
+ *nr_zones = n;
574
+ return 0;
575
+}
576
+#endif
577
+
578
+#if defined(CONFIG_BLKZONED)
579
+static int handle_aiocb_zone_mgmt(void *opaque)
580
+{
581
+ RawPosixAIOData *aiocb = opaque;
582
+ int fd = aiocb->aio_fildes;
583
+ uint64_t sector = aiocb->aio_offset / 512;
584
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
585
+ struct blk_zone_range range;
586
+ int ret;
587
+
588
+ /* Execute the operation */
589
+ range.sector = sector;
590
+ range.nr_sectors = nr_sectors;
591
+ do {
592
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
593
+ } while (ret != 0 && errno == EINTR);
594
+
595
+ return ret;
596
+}
597
+#endif
598
+
599
static int handle_aiocb_copy_range(void *opaque)
600
{
601
RawPosixAIOData *aiocb = opaque;
602
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
46
}
603
}
47
}
604
}
48
605
49
+#if defined(CONFIG_FALLOCATE) || defined(BLKZEROOUT) || defined(BLKDISCARD)
606
+/*
50
static int translate_err(int err)
607
+ * zone report - Get a zone block device's information in the form
608
+ * of an array of zone descriptors.
609
+ * zones is an array of zone descriptors to hold zone information on reply;
610
+ * offset can be any byte within the entire size of the device;
611
+ * nr_zones is the maximum number of zones the command should operate on.
612
+ */
613
+#if defined(CONFIG_BLKZONED)
614
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
615
+ unsigned int *nr_zones,
616
+ BlockZoneDescriptor *zones) {
617
+ BDRVRawState *s = bs->opaque;
618
+ RawPosixAIOData acb = (RawPosixAIOData) {
619
+ .bs = bs,
620
+ .aio_fildes = s->fd,
621
+ .aio_type = QEMU_AIO_ZONE_REPORT,
622
+ .aio_offset = offset,
623
+ .zone_report = {
624
+ .nr_zones = nr_zones,
625
+ .zones = zones,
626
+ },
627
+ };
628
+
629
+ return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
630
+}
631
+#endif
632
+
633
+/*
634
+ * zone management operations - Execute an operation on a zone
635
+ */
636
+#if defined(CONFIG_BLKZONED)
637
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
638
+ int64_t offset, int64_t len) {
639
+ BDRVRawState *s = bs->opaque;
640
+ RawPosixAIOData acb;
641
+ int64_t zone_size, zone_size_mask;
642
+ const char *op_name;
643
+ unsigned long zo;
644
+ int ret;
645
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
646
+
647
+ zone_size = bs->bl.zone_size;
648
+ zone_size_mask = zone_size - 1;
649
+ if (offset & zone_size_mask) {
650
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
651
+ "%" PRId64 "", offset / 512, zone_size / 512);
652
+ return -EINVAL;
653
+ }
654
+
655
+ if (((offset + len) < capacity && len & zone_size_mask) ||
656
+ offset + len > capacity) {
657
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
658
+ " %" PRId64 "", len / 512, zone_size / 512);
659
+ return -EINVAL;
660
+ }
661
+
662
+ switch (op) {
663
+ case BLK_ZO_OPEN:
664
+ op_name = "BLKOPENZONE";
665
+ zo = BLKOPENZONE;
666
+ break;
667
+ case BLK_ZO_CLOSE:
668
+ op_name = "BLKCLOSEZONE";
669
+ zo = BLKCLOSEZONE;
670
+ break;
671
+ case BLK_ZO_FINISH:
672
+ op_name = "BLKFINISHZONE";
673
+ zo = BLKFINISHZONE;
674
+ break;
675
+ case BLK_ZO_RESET:
676
+ op_name = "BLKRESETZONE";
677
+ zo = BLKRESETZONE;
678
+ break;
679
+ default:
680
+ error_report("Unsupported zone op: 0x%x", op);
681
+ return -ENOTSUP;
682
+ }
683
+
684
+ acb = (RawPosixAIOData) {
685
+ .bs = bs,
686
+ .aio_fildes = s->fd,
687
+ .aio_type = QEMU_AIO_ZONE_MGMT,
688
+ .aio_offset = offset,
689
+ .aio_nbytes = len,
690
+ .zone_mgmt = {
691
+ .op = zo,
692
+ },
693
+ };
694
+
695
+ ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
696
+ if (ret != 0) {
697
+ error_report("ioctl %s failed %d", op_name, ret);
698
+ }
699
+
700
+ return ret;
701
+}
702
+#endif
703
+
704
static coroutine_fn int
705
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
706
bool blkdev)
707
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
708
#ifdef __linux__
709
.bdrv_co_ioctl = hdev_co_ioctl,
710
#endif
711
+
712
+ /* zoned device */
713
+#if defined(CONFIG_BLKZONED)
714
+ /* zone management operations */
715
+ .bdrv_co_zone_report = raw_co_zone_report,
716
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
717
+#endif
718
};
719
720
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
721
diff --git a/block/io.c b/block/io.c
722
index XXXXXXX..XXXXXXX 100644
723
--- a/block/io.c
724
+++ b/block/io.c
725
@@ -XXX,XX +XXX,XX @@ out:
726
return co.ret;
727
}
728
729
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
730
+ unsigned int *nr_zones,
731
+ BlockZoneDescriptor *zones)
732
+{
733
+ BlockDriver *drv = bs->drv;
734
+ CoroutineIOCompletion co = {
735
+ .coroutine = qemu_coroutine_self(),
736
+ };
737
+ IO_CODE();
738
+
739
+ bdrv_inc_in_flight(bs);
740
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
741
+ co.ret = -ENOTSUP;
742
+ goto out;
743
+ }
744
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
745
+out:
746
+ bdrv_dec_in_flight(bs);
747
+ return co.ret;
748
+}
749
+
750
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
751
+ int64_t offset, int64_t len)
752
+{
753
+ BlockDriver *drv = bs->drv;
754
+ CoroutineIOCompletion co = {
755
+ .coroutine = qemu_coroutine_self(),
756
+ };
757
+ IO_CODE();
758
+
759
+ bdrv_inc_in_flight(bs);
760
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
761
+ co.ret = -ENOTSUP;
762
+ goto out;
763
+ }
764
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
765
+out:
766
+ bdrv_dec_in_flight(bs);
767
+ return co.ret;
768
+}
769
+
770
void *qemu_blockalign(BlockDriverState *bs, size_t size)
51
{
771
{
52
if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
772
IO_CODE();
53
@@ -XXX,XX +XXX,XX @@ static int translate_err(int err)
773
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
54
}
774
index XXXXXXX..XXXXXXX 100644
55
return err;
775
--- a/qemu-io-cmds.c
56
}
776
+++ b/qemu-io-cmds.c
57
+#endif
777
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
58
778
.oneline = "flush all in-core file state to disk",
59
#ifdef CONFIG_FALLOCATE
779
};
60
static int do_fallocate(int fd, int mode, off_t offset, off_t len)
780
61
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_discard(void *opaque)
781
+static inline int64_t tosector(int64_t bytes)
62
}
782
+{
63
} while (errno == EINTR);
783
+ return bytes >> BDRV_SECTOR_BITS;
64
784
+}
65
- ret = -errno;
785
+
66
+ ret = translate_err(-errno);
786
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
67
#endif
787
+{
68
} else {
788
+ int ret;
69
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
789
+ int64_t offset;
70
ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
790
+ unsigned int nr_zones;
71
aiocb->aio_offset, aiocb->aio_nbytes);
791
+
72
+ ret = translate_err(-errno);
792
+ ++optind;
73
+#elif defined(__APPLE__) && (__MACH__)
793
+ offset = cvtnum(argv[optind]);
74
+ fpunchhole_t fpunchhole;
794
+ ++optind;
75
+ fpunchhole.fp_flags = 0;
795
+ nr_zones = cvtnum(argv[optind]);
76
+ fpunchhole.reserved = 0;
796
+
77
+ fpunchhole.fp_offset = aiocb->aio_offset;
797
+ g_autofree BlockZoneDescriptor *zones = NULL;
78
+ fpunchhole.fp_length = aiocb->aio_nbytes;
798
+ zones = g_new(BlockZoneDescriptor, nr_zones);
79
+ if (fcntl(s->fd, F_PUNCHHOLE, &fpunchhole) == -1) {
799
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
80
+ ret = errno == ENODEV ? -ENOTSUP : -errno;
800
+ if (ret < 0) {
81
+ } else {
801
+ printf("zone report failed: %s\n", strerror(-ret));
82
+ ret = 0;
802
+ } else {
803
+ for (int i = 0; i < nr_zones; ++i) {
804
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
805
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
806
+ "zcond:%u, [type: %u]\n",
807
+ tosector(zones[i].start), tosector(zones[i].length),
808
+ tosector(zones[i].cap), tosector(zones[i].wp),
809
+ zones[i].state, zones[i].type);
83
+ }
810
+ }
84
#endif
811
+ }
85
}
812
+ return ret;
86
813
+}
87
- ret = translate_err(ret);
814
+
88
if (ret == -ENOTSUP) {
815
+static const cmdinfo_t zone_report_cmd = {
89
s->has_discard = false;
816
+ .name = "zone_report",
90
}
817
+ .altname = "zrp",
818
+ .cfunc = zone_report_f,
819
+ .argmin = 2,
820
+ .argmax = 2,
821
+ .args = "offset number",
822
+ .oneline = "report zone information",
823
+};
824
+
825
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
826
+{
827
+ int ret;
828
+ int64_t offset, len;
829
+ ++optind;
830
+ offset = cvtnum(argv[optind]);
831
+ ++optind;
832
+ len = cvtnum(argv[optind]);
833
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
834
+ if (ret < 0) {
835
+ printf("zone open failed: %s\n", strerror(-ret));
836
+ }
837
+ return ret;
838
+}
839
+
840
+static const cmdinfo_t zone_open_cmd = {
841
+ .name = "zone_open",
842
+ .altname = "zo",
843
+ .cfunc = zone_open_f,
844
+ .argmin = 2,
845
+ .argmax = 2,
846
+ .args = "offset len",
847
+ .oneline = "explicitly open a range of zones in a zoned block device",
848
+};
849
+
850
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
851
+{
852
+ int ret;
853
+ int64_t offset, len;
854
+ ++optind;
855
+ offset = cvtnum(argv[optind]);
856
+ ++optind;
857
+ len = cvtnum(argv[optind]);
858
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
859
+ if (ret < 0) {
860
+ printf("zone close failed: %s\n", strerror(-ret));
861
+ }
862
+ return ret;
863
+}
864
+
865
+static const cmdinfo_t zone_close_cmd = {
866
+ .name = "zone_close",
867
+ .altname = "zc",
868
+ .cfunc = zone_close_f,
869
+ .argmin = 2,
870
+ .argmax = 2,
871
+ .args = "offset len",
872
+ .oneline = "close a range of zones in a zoned block device",
873
+};
874
+
875
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
876
+{
877
+ int ret;
878
+ int64_t offset, len;
879
+ ++optind;
880
+ offset = cvtnum(argv[optind]);
881
+ ++optind;
882
+ len = cvtnum(argv[optind]);
883
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
884
+ if (ret < 0) {
885
+ printf("zone finish failed: %s\n", strerror(-ret));
886
+ }
887
+ return ret;
888
+}
889
+
890
+static const cmdinfo_t zone_finish_cmd = {
891
+ .name = "zone_finish",
892
+ .altname = "zf",
893
+ .cfunc = zone_finish_f,
894
+ .argmin = 2,
895
+ .argmax = 2,
896
+ .args = "offset len",
897
+ .oneline = "finish a range of zones in a zoned block device",
898
+};
899
+
900
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
901
+{
902
+ int ret;
903
+ int64_t offset, len;
904
+ ++optind;
905
+ offset = cvtnum(argv[optind]);
906
+ ++optind;
907
+ len = cvtnum(argv[optind]);
908
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
909
+ if (ret < 0) {
910
+ printf("zone reset failed: %s\n", strerror(-ret));
911
+ }
912
+ return ret;
913
+}
914
+
915
+static const cmdinfo_t zone_reset_cmd = {
916
+ .name = "zone_reset",
917
+ .altname = "zrs",
918
+ .cfunc = zone_reset_f,
919
+ .argmin = 2,
920
+ .argmax = 2,
921
+ .args = "offset len",
922
+ .oneline = "reset a zone write pointer in a zoned block device",
923
+};
924
+
925
static int truncate_f(BlockBackend *blk, int argc, char **argv);
926
static const cmdinfo_t truncate_cmd = {
927
.name = "truncate",
928
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
929
qemuio_add_command(&aio_write_cmd);
930
qemuio_add_command(&aio_flush_cmd);
931
qemuio_add_command(&flush_cmd);
932
+ qemuio_add_command(&zone_report_cmd);
933
+ qemuio_add_command(&zone_open_cmd);
934
+ qemuio_add_command(&zone_close_cmd);
935
+ qemuio_add_command(&zone_finish_cmd);
936
+ qemuio_add_command(&zone_reset_cmd);
937
qemuio_add_command(&truncate_cmd);
938
qemuio_add_command(&length_cmd);
939
qemuio_add_command(&info_cmd);
91
--
940
--
92
2.31.1
941
2.40.1
93
942
943
diff view generated by jsdifflib
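As an illustration of the range checks that raw_co_zone_mgmt performs before
issuing the ioctl, here is a minimal standalone sketch. The helper name
zone_range_is_valid is hypothetical and not part of the patch, and it assumes
the zone size is a power of two, which the mask-based test in the patch also
relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): returns true when the byte range
 * [offset, offset + len) is acceptable for a zone management operation.
 * All values are in bytes. Assumes zone_size is a power of two.
 */
static bool zone_range_is_valid(int64_t offset, int64_t len,
                                int64_t zone_size, int64_t capacity)
{
    assert(zone_size > 0 && (zone_size & (zone_size - 1)) == 0);
    int64_t mask = zone_size - 1;

    if (offset & mask) {
        return false;   /* start is not on a zone boundary */
    }
    if (offset + len > capacity) {
        return false;   /* range runs past the end of the device */
    }
    if (offset + len < capacity && (len & mask)) {
        return false;   /* length is not a whole number of zones */
    }
    return true;
}
```

A range that ends exactly at the device capacity is accepted even when its
length is not a multiple of the zone size, which mirrors the condition in the
patch. The sketch separates the overrun case, which the patch reports under
the same "not aligned to zone size" message.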
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
The raw-format driver usually sits on top of the file-posix driver. It needs to
4
pass through requests of zone commands.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230508045533.175575-5-faithilikerun@gmail.com
14
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
15
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
<philmd@linaro.org>.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
block/raw-format.c | 17 +++++++++++++++++
21
1 file changed, 17 insertions(+)
22
23
diff --git a/block/raw-format.c b/block/raw-format.c
24
index XXXXXXX..XXXXXXX 100644
25
--- a/block/raw-format.c
26
+++ b/block/raw-format.c
27
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
28
return bdrv_co_pdiscard(bs->file, offset, bytes);
29
}
30
31
+static int coroutine_fn GRAPH_RDLOCK
32
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
33
+ unsigned int *nr_zones,
34
+ BlockZoneDescriptor *zones)
35
+{
36
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
37
+}
38
+
39
+static int coroutine_fn GRAPH_RDLOCK
40
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
41
+ int64_t offset, int64_t len)
42
+{
43
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
44
+}
45
+
46
static int64_t coroutine_fn GRAPH_RDLOCK
47
raw_co_getlength(BlockDriverState *bs)
48
{
49
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
50
.bdrv_co_pwritev = &raw_co_pwritev,
51
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
52
.bdrv_co_pdiscard = &raw_co_pdiscard,
53
+ .bdrv_co_zone_report = &raw_co_zone_report,
54
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
55
.bdrv_co_block_status = &raw_co_block_status,
56
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
57
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
58
--
59
2.40.1
60
61
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
4
allowed.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-6-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org> and clarify that the check is about zoned
16
BlockDrivers.
17
--Stefan]
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
include/block/block_int-common.h | 5 +++++
21
block.c | 19 +++++++++++++++++++
22
block/file-posix.c | 12 ++++++++++++
23
block/raw-format.c | 1 +
24
4 files changed, 37 insertions(+)
25
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
27
index XXXXXXX..XXXXXXX 100644
28
--- a/include/block/block_int-common.h
29
+++ b/include/block/block_int-common.h
30
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
31
*/
32
bool is_format;
33
34
+ /*
35
+ * Set to true if the BlockDriver supports zoned children.
36
+ */
37
+ bool supports_zoned_children;
38
+
39
/*
40
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
41
* this field set to true, except ones that are defined only by their
42
diff --git a/block.c b/block.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/block.c
45
+++ b/block.c
46
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
47
return;
48
}
49
50
+ /*
51
+ * Non-zoned block drivers do not follow zoned storage constraints
52
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
53
+ * drivers in a graph.
54
+ */
55
+ if (!parent_bs->drv->supports_zoned_children &&
56
+ child_bs->bl.zoned == BLK_Z_HM) {
57
+ /*
58
+ * The host-aware model allows zoned storage constraints and random
59
+ * write. Allow mixing host-aware and non-zoned drivers. Using
60
+ * host-aware device as a regular device.
61
+ */
62
+ error_setg(errp, "Cannot add a %s child to a %s parent",
63
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
64
+ parent_bs->drv->supports_zoned_children ?
65
+ "support zoned children" : "not support zoned children");
66
+ return;
67
+ }
68
+
69
if (!QLIST_EMPTY(&child_bs->parents)) {
70
error_setg(errp, "The node %s already has a parent",
71
child_bs->node_name);
72
diff --git a/block/file-posix.c b/block/file-posix.c
73
index XXXXXXX..XXXXXXX 100644
74
--- a/block/file-posix.c
75
+++ b/block/file-posix.c
76
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
77
goto fail;
78
}
79
}
80
+#ifdef CONFIG_BLKZONED
81
+ /*
82
+ * The kernel page cache does not reliably work for writes to SWR zones
83
+ * of a zoned block device because it cannot guarantee the order of writes.
84
+ */
85
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
86
+ (!(s->open_flags & O_DIRECT))) {
87
+ error_setg(errp, "The driver supports zoned devices, and it requires "
88
+ "cache.direct=on, which was not specified.");
89
+ return -EINVAL; /* No host kernel page cache */
90
+ }
91
+#endif
92
93
if (S_ISBLK(st.st_mode)) {
94
#ifdef __linux__
95
diff --git a/block/raw-format.c b/block/raw-format.c
96
index XXXXXXX..XXXXXXX 100644
97
--- a/block/raw-format.c
98
+++ b/block/raw-format.c
99
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
100
BlockDriver bdrv_raw = {
101
.format_name = "raw",
102
.instance_size = sizeof(BDRVRawState),
103
+ .supports_zoned_children = true,
104
.bdrv_probe = &raw_probe,
105
.bdrv_reopen_prepare = &raw_reopen_prepare,
106
.bdrv_reopen_commit = &raw_reopen_commit,
107
--
108
2.40.1
109
110
diff view generated by jsdifflib
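The rule that the bdrv_add_child() hunk above enforces can be summarised in a
few lines. The following is only a sketch with stand-in names (ZonedModel,
MODEL_* and can_add_child are illustrative and do not exist in QEMU); it
assumes the check depends only on the parent's supports_zoned_children flag
and the child's zoned model, as the hunk suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the zoned models named in the patch (illustrative names). */
typedef enum {
    MODEL_NONE,   /* stands for BLK_Z_NONE: regular or drive-managed device */
    MODEL_HM,     /* stands for BLK_Z_HM: host-managed, sequential writes only */
    MODEL_HA      /* stands for BLK_Z_HA: host-aware, random writes allowed */
} ZonedModel;

/*
 * Returns true when a child with the given model may be attached to a
 * parent driver. Only a host-managed child under a parent that does not
 * support zoned children is refused.
 */
static bool can_add_child(bool parent_supports_zoned_children,
                          ZonedModel child)
{
    assert(child == MODEL_NONE || child == MODEL_HM || child == MODEL_HA);
    return parent_supports_zoned_children || child != MODEL_HM;
}
```

Host-aware children are accepted under any parent because, per the comment in
the patch, they can be used as regular devices.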
1
BHs must be deleted before the AioContext is finalized. If not, it's a
1
From: Sam Li <faithilikerun@gmail.com>
2
bug and probably indicates that some part of the program still expects
3
the BH to run in the future. That can lead to memory leaks, inconsistent
4
state, or just hangs.
5
2
6
Unfortunately the assert(flags & BH_DELETED) call in aio_ctx_finalize()
3
The new block layer APIs of zoned block devices can be tested by:
7
is difficult to debug because the assertion failure contains no
4
$ tests/qemu-iotests/check zoned
8
information about the BH!
5
Run each zone operation on a newly created null_blk device
6
and see whether it outputs the same zone information.
9
7
10
Use the QEMUBH name field added in the previous patch to show a useful
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
error when a leaked BH is detected.
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-7-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
15
<philmd@linaro.org>.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
20
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
21
2 files changed, 142 insertions(+)
22
create mode 100755 tests/qemu-iotests/tests/zoned
23
create mode 100644 tests/qemu-iotests/tests/zoned.out
12
24
13
Suggested-by: Eric Ernst <eric.g.ernst@gmail.com>
25
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
26
new file mode 100755
15
Message-Id: <20210414200247.917496-3-stefanha@redhat.com>
27
index XXXXXXX..XXXXXXX
16
---
28
--- /dev/null
17
util/async.c | 16 ++++++++++++++--
29
+++ b/tests/qemu-iotests/tests/zoned
18
1 file changed, 14 insertions(+), 2 deletions(-)
30
@@ -XXX,XX +XXX,XX @@
31
+#!/usr/bin/env bash
32
+#
33
+# Test zone management operations.
34
+#
35
+
36
+seq="$(basename $0)"
37
+echo "QA output created by $seq"
38
+status=1 # failure is the default!
39
+
40
+_cleanup()
41
+{
42
+ _cleanup_test_img
43
+ sudo -n rmmod null_blk
44
+}
45
+trap "_cleanup; exit \$status" 0 1 2 3 15
46
+
47
+# get standard environment, filters and checks
48
+. ../common.rc
49
+. ../common.filter
50
+. ../common.qemu
51
+
52
+# This test only runs on Linux hosts with raw image files.
53
+_supported_fmt raw
54
+_supported_proto file
55
+_supported_os Linux
56
+
57
+sudo -n true || \
58
+ _notrun 'Password-less sudo required'
59
+
60
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
61
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
62
+
63
+echo "Testing a null_blk device:"
64
+echo "case 1: if the operations work"
65
+sudo -n modprobe null_blk nr_devices=1 zoned=1
66
+sudo -n chmod 0666 /dev/nullb0
67
+
68
+echo "(1) report the first zone:"
69
+$QEMU_IO $IMG -c "zrp 0 1"
70
+echo
71
+echo "report the first 10 zones"
72
+$QEMU_IO $IMG -c "zrp 0 10"
73
+echo
74
+echo "report the last zone:"
75
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
76
+echo
77
+echo
78
+echo "(2) opening the first zone"
79
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
80
+echo "report after:"
81
+$QEMU_IO $IMG -c "zrp 0 1"
82
+echo
83
+echo "opening the second zone"
84
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
85
+echo "report after:"
86
+$QEMU_IO $IMG -c "zrp 268435456 1"
87
+echo
88
+echo "opening the last zone"
89
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
90
+echo "report after:"
91
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
92
+echo
93
+echo
94
+echo "(3) closing the first zone"
95
+$QEMU_IO $IMG -c "zc 0 268435456"
96
+echo "report after:"
97
+$QEMU_IO $IMG -c "zrp 0 1"
98
+echo
99
+echo "closing the last zone"
100
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
101
+echo "report after:"
102
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
103
+echo
104
+echo
105
+echo "(4) finishing the second zone"
106
+$QEMU_IO $IMG -c "zf 268435456 268435456"
107
+echo "After finishing a zone:"
108
+$QEMU_IO $IMG -c "zrp 268435456 1"
109
+echo
110
+echo
111
+echo "(5) resetting the second zone"
112
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
113
+echo "After resetting a zone:"
114
+$QEMU_IO $IMG -c "zrp 268435456 1"
115
+
116
+# success, all done
117
+echo "*** done"
118
+rm -f $seq.full
119
+status=0
120
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
121
new file mode 100644
122
index XXXXXXX..XXXXXXX
123
--- /dev/null
124
+++ b/tests/qemu-iotests/tests/zoned.out
125
@@ -XXX,XX +XXX,XX @@
126
+QA output created by zoned
127
+Testing a null_blk device:
128
+case 1: if the operations work
129
+(1) report the first zone:
130
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
131
+
132
+report the first 10 zones
133
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
134
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
135
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
136
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
137
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
138
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
139
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
140
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
141
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
142
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
143
+
144
+report the last zone:
145
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
146
+
147
+
148
+(2) opening the first zone
149
+report after:
150
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
151
+
152
+opening the second zone
153
+report after:
154
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
155
+
156
+opening the last zone
157
+report after:
158
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
159
+
160
+
161
+(3) closing the first zone
162
+report after:
163
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
164
+
165
+closing the last zone
166
+report after:
167
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
168
+
169
+
170
+(4) finishing the second zone
171
+After finishing a zone:
172
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
173
+
174
+
175
+(5) resetting the second zone
176
+After resetting a zone:
177
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
178
+*** done
179
--
180
2.40.1
19
181
20
diff --git a/util/async.c b/util/async.c
21
index XXXXXXX..XXXXXXX 100644
22
--- a/util/async.c
23
+++ b/util/async.c
24
@@ -XXX,XX +XXX,XX @@ aio_ctx_finalize(GSource *source)
25
assert(QSIMPLEQ_EMPTY(&ctx->bh_slice_list));
26
27
while ((bh = aio_bh_dequeue(&ctx->bh_list, &flags))) {
28
- /* qemu_bh_delete() must have been called on BHs in this AioContext */
29
- assert(flags & BH_DELETED);
30
+ /*
31
+ * qemu_bh_delete() must have been called on BHs in this AioContext. In
32
+ * many cases memory leaks, hangs, or inconsistent state occur when a
33
+ * BH is leaked because something still expects it to run.
34
+ *
35
+ * If you hit this, fix the lifecycle of the BH so that
36
+ * qemu_bh_delete() and any associated cleanup is called before the
37
+ * AioContext is finalized.
38
+ */
39
+ if (unlikely(!(flags & BH_DELETED))) {
40
+ fprintf(stderr, "%s: BH '%s' leaked, aborting...\n",
41
+ __func__, bh->name);
42
+ abort();
43
+ }
44
45
g_free(bh);
46
}
47
--
48
2.31.1
49
182
diff view generated by jsdifflib
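The zoned iotest passes byte offsets and lengths to qemu-io, while the
expected output prints 512-byte sectors. The conversion noted in the test
comments can be checked with a small sketch; bytes_to_sectors and SECTOR_BITS
are illustrative names standing in for tosector() and BDRV_SECTOR_BITS.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9   /* 512-byte sectors; stands in for BDRV_SECTOR_BITS */

/* Convert a byte count or byte offset to 512-byte sectors. */
static int64_t bytes_to_sectors(int64_t bytes)
{
    assert(bytes >= 0);
    return bytes >> SECTOR_BITS;
}
```

With these definitions, bytes_to_sectors(268435456) gives 0x80000 (524288),
the zone length shown in zoned.out, and bytes_to_sectors(0x3e70000000) gives
0x1f380000, the start of the last zone.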
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20230508045533.175575-8-faithilikerun@gmail.com
9
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
block/file-posix.c | 3 +++
13
block/trace-events | 2 ++
14
2 files changed, 5 insertions(+)
15
16
diff --git a/block/file-posix.c b/block/file-posix.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/file-posix.c
19
+++ b/block/file-posix.c
20
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
21
},
22
};
23
24
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
25
return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
26
}
27
#endif
28
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
29
},
30
};
31
32
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
33
+ len >> BDRV_SECTOR_BITS);
34
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
35
if (ret != 0) {
36
error_report("ioctl %s failed %d", op_name, ret);
37
diff --git a/block/trace-events b/block/trace-events
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/trace-events
40
+++ b/block/trace-events
41
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
42
file_setup_cdrom(const char *partition) "Using %s as optical disc"
43
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
44
file_flush_fdatasync_failed(int err) "errno %d"
45
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
46
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
47
48
# ssh.c
49
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
50
--
51
2.40.1
diff view generated by jsdifflib
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Add the documentation about the zoned device support to virtio-blk
4
emulation.
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 20230508045533.175575-9-faithilikerun@gmail.com
13
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
14
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
15
any toctree" error and fix pre-formatted code syntax.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
docs/devel/index-api.rst | 1 +
20
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
21
docs/system/qemu-block-drivers.rst.inc | 6 ++++
22
3 files changed, 50 insertions(+)
23
create mode 100644 docs/devel/zoned-storage.rst
24
25
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
26
index XXXXXXX..XXXXXXX 100644
27
--- a/docs/devel/index-api.rst
28
+++ b/docs/devel/index-api.rst
29
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
30
memory
31
modules
32
ui
33
+ zoned-storage
34
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
35
new file mode 100644
36
index XXXXXXX..XXXXXXX
37
--- /dev/null
38
+++ b/docs/devel/zoned-storage.rst
39
@@ -XXX,XX +XXX,XX @@
40
+=============
41
+zoned-storage
42
+=============
43
+
44
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
45
+that are larger than the LBA size. Zones only allow sequential writes, which
46
+can reduce write amplification in SSDs, and potentially lead to higher
47
+throughput and increased capacity. More details about ZBDs can be found at:
48
+
49
+https://zonedstorage.io/docs/introduction/zoned-storage
50
+
51
+1. Block layer APIs for zoned storage
52
+-------------------------------------
53
+The QEMU block layer supports three zoned storage models:
54
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
55
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
56
+manage the zones of a device.
57
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
58
+zones, making it backward compatible with regular block devices.
59
+- BLK_Z_NONE: The non-zoned model has no zone support. It includes both
60
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
61
+supported.
62
+
63
+The block device information resides inside BlockDriverState. QEMU uses
64
+the BlockLimits struct (BlockDriverState::bl), which is continuously accessed by the
65
+block layer while processing I/O requests. A BlockBackend has a root pointer to
66
+a BlockDriverState graph (for example, raw format on top of file-posix). The
67
+zoned storage information can be propagated from the leaf BlockDriverState all
68
+the way up to the BlockBackend. If the zoned storage model in file-posix is
69
+set to BLK_Z_HM, then block drivers will declare support for zoned host devices.
70
+
71
+The block layer APIs support commands needed for zoned storage devices,
72
+including report zones, four zone operations, and zone append.
73
+
74
+2. Emulating zoned storage controllers
75
+--------------------------------------
76
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
77
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
78
+APIs for zoned storage emulation or testing.
79
+
80
+For example, zone_report can be tested on a null_blk device using qemu-io::
81
+
82
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
83
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
84
index XXXXXXX..XXXXXXX 100644
85
--- a/docs/system/qemu-block-drivers.rst.inc
86
+++ b/docs/system/qemu-block-drivers.rst.inc
87
@@ -XXX,XX +XXX,XX @@ Hard disks
88
you may corrupt your host data (use the ``-snapshot`` command
89
line option or modify the device permissions accordingly).
90
91
+Zoned block devices
92
+ Zoned block devices can be passed through to the guest if the emulated storage
93
+ controller supports zoned storage. Use ``--blockdev host_device,
94
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
95
+ ``/dev/nullb0`` as ``drive0``.
96
+
97
Windows
98
^^^^^^^
99
100
--
101
2.40.1
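
The three zoned models described in the documentation above can be
illustrated with a small standalone sketch. The enum and helper names
below are hypothetical and are not the QEMU definitions; the sketch only
assumes what the text states: host-managed devices require sequential
writes, and the non-zoned model does not support ZBD-specific commands.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the models named BLK_Z_NONE, BLK_Z_HM, BLK_Z_HA. */
typedef enum {
    ZONED_NONE, /* regular or drive-managed device */
    ZONED_HM,   /* host-managed: sequential writes only */
    ZONED_HA,   /* host-aware: random writes tolerated */
} ZonedModel;

/* Zone commands (report, management, append) need a zoned model. */
static int zoned_cmd_allowed(ZonedModel model)
{
    return model == ZONED_NONE ? -ENOTSUP : 0;
}

/* Only the host-managed model forbids random writes within a zone. */
static bool random_write_allowed(ZonedModel model)
{
    return model != ZONED_HM;
}
```

An emulated controller could consult a check like zoned_cmd_allowed()
before forwarding a zone command to the block layer.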
1
It can be difficult to debug issues with BHs in production environments.
1
From: Sam Li <faithilikerun@gmail.com>
2
Although BHs can usually be identified by looking up their ->cb()
3
function pointer, this requires debug information for the program. It is
4
also not possible to print human-readable diagnostics about BHs because
5
they have no identifier.
6
2
7
This patch adds a name to each BH. The name is not unique per instance
3
Since Linux doesn't have a user API to issue zone append operations to
8
but differentiates between cb() functions, which is usually enough. It's
4
zoned devices from user space, the file-posix driver is modified to add
9
done by changing aio_bh_new() and friends to macros that stringify cb.
5
zone append emulation using regular writes. To do this, the file-posix
6
driver tracks the wp location of all zones of the device. It uses an
7
array of uint64_t. The most significant bit of each wp location indicates
8
whether the zone is a conventional zone.
10
9
11
The next patch will use the name field when reporting leaked BHs.
10
A zone's wp can change due to the following operations:
11
- zone reset: change the wp to the start offset of that zone
12
- zone finish: change the wp to the end location of that zone
13
- write to a zone
14
- zone append
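
The wp transitions listed above can be sketched in isolation as follows.
This is a simplified model, not the file-posix code: the helper names are
hypothetical, every zone is assumed to have the same size, and the only
detail taken from the patch is that bit 63 of a tracked wp marks a
conventional zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of a tracked wp flags a conventional zone. */
#define WP_CONV_FLAG (1ULL << 63)

static bool wp_is_conv(uint64_t wp)
{
    return (wp & WP_CONV_FLAG) != 0;
}

/* zone reset: the wp returns to the start offset of the zone. */
static void wp_reset(uint64_t *wps, uint64_t zone_size, unsigned int idx)
{
    if (!wp_is_conv(wps[idx])) {
        wps[idx] = (uint64_t)idx * zone_size;
    }
}

/* zone finish: the wp moves to the end of the zone. */
static void wp_finish(uint64_t *wps, uint64_t zone_size, unsigned int idx)
{
    if (!wp_is_conv(wps[idx])) {
        wps[idx] = ((uint64_t)idx + 1) * zone_size;
    }
}

/* write: advance the wp only if the write ends beyond it. */
static void wp_after_write(uint64_t *wps, uint64_t zone_size,
                           uint64_t offset, uint64_t bytes)
{
    assert(zone_size != 0);
    uint64_t *wp = &wps[offset / zone_size];

    if (!wp_is_conv(*wp) && offset + bytes > *wp) {
        *wp = offset + bytes;
    }
}
```

Conventional zones are left untouched by all three helpers, since they
have no write pointer to track.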
12
15
16
Signed-off-by: Sam Li <faithilikerun@gmail.com>
17
Message-id: 20230508051510.177850-2-faithilikerun@gmail.com
18
[Fix errno propagation from handle_aiocb_zone_mgmt()
19
--Stefan]
13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
15
Message-Id: <20210414200247.917496-2-stefanha@redhat.com>
16
---
21
---
17
include/block/aio.h | 31 ++++++++++++++++++++++++++++---
22
include/block/block-common.h | 14 +++
18
include/qemu/main-loop.h | 4 +++-
23
include/block/block_int-common.h | 5 +
19
tests/unit/ptimer-test-stubs.c | 2 +-
24
block/file-posix.c | 178 ++++++++++++++++++++++++++++++-
20
util/async.c | 9 +++++++--
25
3 files changed, 193 insertions(+), 4 deletions(-)
21
util/main-loop.c | 4 ++--
22
5 files changed, 41 insertions(+), 9 deletions(-)
23
26
24
diff --git a/include/block/aio.h b/include/block/aio.h
27
diff --git a/include/block/block-common.h b/include/block/block-common.h
25
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
26
--- a/include/block/aio.h
29
--- a/include/block/block-common.h
27
+++ b/include/block/aio.h
30
+++ b/include/block/block-common.h
28
@@ -XXX,XX +XXX,XX @@ void aio_context_acquire(AioContext *ctx);
31
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
29
/* Relinquish ownership of the AioContext. */
32
BlockZoneState state;
30
void aio_context_release(AioContext *ctx);
33
} BlockZoneDescriptor;
31
34
32
+/**
35
+/*
33
+ * aio_bh_schedule_oneshot_full: Allocate a new bottom half structure that will
36
+ * Track write pointers of a zone in bytes.
34
+ * run only once and as soon as possible.
35
+ *
36
+ * @name: A human-readable identifier for debugging purposes.
37
+ */
37
+ */
38
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
38
+typedef struct BlockZoneWps {
39
+ const char *name);
39
+ CoMutex colock;
40
+
40
+ uint64_t wp[];
41
/**
41
+} BlockZoneWps;
42
* aio_bh_schedule_oneshot: Allocate a new bottom half structure that will run
42
+
43
* only once and as soon as possible.
43
typedef struct BlockDriverInfo {
44
+ *
44
/* in bytes, 0 if irrelevant */
45
+ * A convenience wrapper for aio_bh_schedule_oneshot_full() that uses cb as the
45
int cluster_size;
46
+ * name string.
46
@@ -XXX,XX +XXX,XX @@ typedef enum {
47
*/
47
#define BDRV_SECTOR_BITS 9
48
-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
48
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
49
+#define aio_bh_schedule_oneshot(ctx, cb, opaque) \
49
50
+ aio_bh_schedule_oneshot_full((ctx), (cb), (opaque), (stringify(cb)))
50
+/*
51
51
+ * Get the first most significant bit of wp. If it is zero, then
52
/**
52
+ * the zone type is SWR.
53
- * aio_bh_new: Allocate a new bottom half structure.
54
+ * aio_bh_new_full: Allocate a new bottom half structure.
55
*
56
* Bottom halves are lightweight callbacks whose invocation is guaranteed
57
* to be wait-free, thread-safe and signal-safe. The #QEMUBH structure
58
* is opaque and must be allocated prior to its use.
59
+ *
60
+ * @name: A human-readable identifier for debugging purposes.
61
*/
62
-QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
63
+QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
64
+ const char *name);
65
+
66
+/**
67
+ * aio_bh_new: Allocate a new bottom half structure
68
+ *
69
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
70
+ * string.
71
+ */
53
+ */
72
+#define aio_bh_new(ctx, cb, opaque) \
54
+#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63))
73
+ aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
55
+
74
56
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
75
/**
57
INT_MAX >> BDRV_SECTOR_BITS)
76
* aio_notify: Force processing of pending events.
58
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
77
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
59
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
78
index XXXXXXX..XXXXXXX 100644
60
index XXXXXXX..XXXXXXX 100644
79
--- a/include/qemu/main-loop.h
61
--- a/include/block/block_int-common.h
80
+++ b/include/qemu/main-loop.h
62
+++ b/include/block/block_int-common.h
81
@@ -XXX,XX +XXX,XX @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
63
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
82
64
83
void qemu_fd_register(int fd);
65
/* maximum number of active zones */
84
66
uint32_t max_active_zones;
85
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
67
+
86
+#define qemu_bh_new(cb, opaque) \
68
+ uint32_t write_granularity;
87
+ qemu_bh_new_full((cb), (opaque), (stringify(cb)))
69
} BlockLimits;
88
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
70
89
void qemu_bh_schedule_idle(QEMUBH *bh);
71
typedef struct BdrvOpBlocker BdrvOpBlocker;
90
72
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
91
enum {
73
CoMutex bsc_modify_lock;
92
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
74
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
75
BdrvBlockStatusCache *block_status_cache;
76
+
77
+ /* array of write pointers' location of each zone in the zoned device. */
78
+ BlockZoneWps *wps;
79
};
80
81
struct BlockBackendRootState {
82
diff --git a/block/file-posix.c b/block/file-posix.c
93
index XXXXXXX..XXXXXXX 100644
83
index XXXXXXX..XXXXXXX 100644
94
--- a/tests/unit/ptimer-test-stubs.c
84
--- a/block/file-posix.c
95
+++ b/tests/unit/ptimer-test-stubs.c
85
+++ b/block/file-posix.c
96
@@ -XXX,XX +XXX,XX @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
86
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
97
return deadline;
98
}
87
}
99
88
100
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
89
#if defined(CONFIG_BLKZONED)
101
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
90
+/*
91
+ * If the reset_all flag is true, then the wps of zones whose state is
92
+ * not readonly or offline should be all reset to the start sector.
93
+ * Else, take the real wp of the device.
94
+ */
95
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
96
+ unsigned int nrz, bool reset_all)
97
+{
98
+ struct blk_zone *blkz;
99
+ size_t rep_size;
100
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
101
+ BlockZoneWps *wps = bs->wps;
102
+ unsigned int j = offset / bs->bl.zone_size;
103
+ unsigned int n = 0, i = 0;
104
+ int ret;
105
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
106
+ g_autofree struct blk_zone_report *rep = NULL;
107
+
108
+ rep = g_malloc(rep_size);
109
+ blkz = (struct blk_zone *)(rep + 1);
110
+ while (n < nrz) {
111
+ memset(rep, 0, rep_size);
112
+ rep->sector = sector;
113
+ rep->nr_zones = nrz - n;
114
+
115
+ do {
116
+ ret = ioctl(fd, BLKREPORTZONE, rep);
117
+ } while (ret != 0 && errno == EINTR);
118
+ if (ret != 0) {
119
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
120
+ fd, offset, errno);
121
+ return -errno;
122
+ }
123
+
124
+ if (!rep->nr_zones) {
125
+ break;
126
+ }
127
+
128
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
129
+ /*
130
+ * The wp tracking cares only about sequential writes required and
131
+ * sequential write preferred zones so that the wp can advance to
132
+ * the right location.
133
+ * Use the most significant bit of the wp location to indicate the
134
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
135
+ */
136
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
137
+ wps->wp[j] |= 1ULL << 63;
138
+ } else {
139
+ switch(blkz[i].cond) {
140
+ case BLK_ZONE_COND_FULL:
141
+ case BLK_ZONE_COND_READONLY:
142
+ /* Zone not writable */
143
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
144
+ break;
145
+ case BLK_ZONE_COND_OFFLINE:
146
+ /* Zone neither writable nor readable */
147
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
148
+ break;
149
+ default:
150
+ if (reset_all) {
151
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
152
+ } else {
153
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
154
+ }
155
+ break;
156
+ }
157
+ }
158
+ }
159
+ sector = blkz[i - 1].start + blkz[i - 1].len;
160
+ }
161
+
162
+ return 0;
163
+}
164
+
165
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
166
+ unsigned int nrz)
167
+{
168
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
169
+ error_report("update zone wp failed");
170
+ }
171
+}
172
+
173
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
174
Error **errp)
102
{
175
{
103
QEMUBH *bh = g_new(QEMUBH, 1);
176
+ BDRVRawState *s = bs->opaque;
104
177
BlockZoneModel zoned;
105
diff --git a/util/async.c b/util/async.c
178
int ret;
106
index XXXXXXX..XXXXXXX 100644
179
107
--- a/util/async.c
180
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
108
+++ b/util/async.c
181
if (ret > 0) {
109
@@ -XXX,XX +XXX,XX @@ enum {
182
bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
110
183
}
111
struct QEMUBH {
184
+
112
AioContext *ctx;
185
+ ret = get_sysfs_long_val(st, "physical_block_size");
113
+ const char *name;
186
+ if (ret >= 0) {
114
QEMUBHFunc *cb;
187
+ bs->bl.write_granularity = ret;
115
void *opaque;
188
+ }
116
QSLIST_ENTRY(QEMUBH) next;
189
+
117
@@ -XXX,XX +XXX,XX @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
190
+ /* The refresh_limits() function can be called multiple times. */
118
return bh;
191
+ g_free(bs->wps);
192
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
193
+ sizeof(int64_t) * bs->bl.nr_zones);
194
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
195
+ if (ret < 0) {
196
+ error_setg_errno(errp, -ret, "report wps failed");
197
+ bs->wps = NULL;
198
+ return;
199
+ }
200
+ qemu_co_mutex_init(&bs->wps->colock);
119
}
201
}
120
202
#else /* !defined(CONFIG_BLKZONED) */
121
-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
203
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
122
+void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb,
204
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_zone_mgmt(void *opaque)
123
+ void *opaque, const char *name)
205
ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
206
} while (ret != 0 && errno == EINTR);
207
208
- return ret;
209
+ return ret < 0 ? -errno : ret;
210
}
211
#endif
212
213
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
124
{
214
{
125
QEMUBH *bh;
215
BDRVRawState *s = bs->opaque;
126
bh = g_new(QEMUBH, 1);
216
RawPosixAIOData acb;
127
@@ -XXX,XX +XXX,XX @@ void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
217
+ int ret;
128
.ctx = ctx,
218
129
.cb = cb,
219
if (fd_open(bs) < 0)
130
.opaque = opaque,
220
return -EIO;
131
+ .name = name,
221
+#if defined(CONFIG_BLKZONED)
222
+ if (type & QEMU_AIO_WRITE && bs->wps) {
223
+ qemu_co_mutex_lock(&bs->wps->colock);
224
+ }
225
+#endif
226
227
/*
228
* When using O_DIRECT, the request must be aligned to be able to use
229
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
230
#ifdef CONFIG_LINUX_IO_URING
231
} else if (s->use_linux_io_uring) {
232
assert(qiov->size == bytes);
233
- return luring_co_submit(bs, s->fd, offset, qiov, type);
234
+ ret = luring_co_submit(bs, s->fd, offset, qiov, type);
235
+ goto out;
236
#endif
237
#ifdef CONFIG_LINUX_AIO
238
} else if (s->use_linux_aio) {
239
assert(qiov->size == bytes);
240
- return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
241
+ ret = laio_co_submit(s->fd, offset, qiov, type,
242
+ s->aio_max_batch);
243
+ goto out;
244
#endif
245
}
246
247
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
132
};
248
};
133
aio_bh_enqueue(bh, BH_SCHEDULED | BH_ONESHOT);
249
250
assert(qiov->size == bytes);
251
- return raw_thread_pool_submit(handle_aiocb_rw, &acb);
252
+ ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
253
+ goto out; /* Avoid the compiler err of unused label */
254
+
255
+out:
256
+#if defined(CONFIG_BLKZONED)
257
+{
258
+ BlockZoneWps *wps = bs->wps;
259
+ if (ret == 0) {
260
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
261
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
262
+ if (!BDRV_ZT_IS_CONV(*wp)) {
263
+ /* Advance the wp if needed */
264
+ if (offset + bytes > *wp) {
265
+ *wp = offset + bytes;
266
+ }
267
+ }
268
+ }
269
+ } else {
270
+ if (type & QEMU_AIO_WRITE) {
271
+ update_zones_wp(bs, s->fd, 0, 1);
272
+ }
273
+ }
274
+
275
+ if (type & QEMU_AIO_WRITE && wps) {
276
+ qemu_co_mutex_unlock(&wps->colock);
277
+ }
278
+}
279
+#endif
280
+ return ret;
134
}
281
}
135
282
136
-QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
283
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
137
+QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
284
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
138
+ const char *name)
285
BDRVRawState *s = bs->opaque;
139
{
286
140
QEMUBH *bh;
287
if (s->fd >= 0) {
141
bh = g_new(QEMUBH, 1);
288
+#if defined(CONFIG_BLKZONED)
142
@@ -XXX,XX +XXX,XX @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
289
+ g_free(bs->wps);
143
.ctx = ctx,
290
+#endif
144
.cb = cb,
291
qemu_close(s->fd);
145
.opaque = opaque,
292
s->fd = -1;
146
+ .name = name,
293
}
147
};
294
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
148
return bh;
295
const char *op_name;
149
}
296
unsigned long zo;
150
diff --git a/util/main-loop.c b/util/main-loop.c
297
int ret;
151
index XXXXXXX..XXXXXXX 100644
298
+ BlockZoneWps *wps = bs->wps;
152
--- a/util/main-loop.c
299
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
153
+++ b/util/main-loop.c
300
154
@@ -XXX,XX +XXX,XX @@ void main_loop_wait(int nonblocking)
301
zone_size = bs->bl.zone_size;
155
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
156
/* Functions to operate on the main QEMU AioContext. */
303
return -EINVAL;
157
304
}
158
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
305
159
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
306
+ uint32_t i = offset / bs->bl.zone_size;
160
{
307
+ uint32_t nrz = len / bs->bl.zone_size;
161
- return aio_bh_new(qemu_aio_context, cb, opaque);
308
+ uint64_t *wp = &wps->wp[i];
162
+ return aio_bh_new_full(qemu_aio_context, cb, opaque, name);
309
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
163
}
310
+ error_report("zone mgmt operations are not allowed for conventional zones");
164
311
+ return -EIO;
165
/*
312
+ }
313
+
314
switch (op) {
315
case BLK_ZO_OPEN:
316
op_name = "BLKOPENZONE";
317
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
318
len >> BDRV_SECTOR_BITS);
319
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
320
if (ret != 0) {
321
+ update_zones_wp(bs, s->fd, offset, i);
322
error_report("ioctl %s failed %d", op_name, ret);
323
+ return ret;
324
+ }
325
+
326
+ if (zo == BLKRESETZONE && len == capacity) {
327
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
328
+ if (ret < 0) {
329
+ error_report("reporting single wp failed");
330
+ return ret;
331
+ }
332
+ } else if (zo == BLKRESETZONE) {
333
+ for (unsigned int j = 0; j < nrz; ++j) {
334
+ wp[j] = offset + j * zone_size;
335
+ }
336
+ } else if (zo == BLKFINISHZONE) {
337
+ for (unsigned int j = 0; j < nrz; ++j) {
338
+ /* The zoned device allows the last zone smaller than the
339
+ * zone size. */
340
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
341
+ }
342
}
343
344
return ret;
166
--
345
--
167
2.31.1
346
2.40.1
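
The mapping from a reported zone to its tracked wp, as done in
get_zones_wp() above, can be summarized by the standalone sketch below.
The types and enum values are hypothetical stand-ins for the kernel's
struct blk_zone, BLK_ZONE_TYPE_* and BLK_ZONE_COND_*; the sketch also
stores only the flag bit for conventional zones, which is a
simplification.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define WP_CONV_FLAG (1ULL << 63)

/* Hypothetical stand-ins for the kernel zone type and condition values. */
typedef enum { ZT_CONVENTIONAL, ZT_SEQ } ZoneType;
typedef enum { ZC_EMPTY, ZC_OPEN, ZC_FULL, ZC_READONLY, ZC_OFFLINE } ZoneCond;

/* start, len and wp are in 512-byte sectors, as in a zone report. */
typedef struct {
    uint64_t start;
    uint64_t len;
    uint64_t wp;
    ZoneType type;
    ZoneCond cond;
} ZoneReport;

/* Return the byte value to store in the wp array for one zone. */
static uint64_t tracked_wp(const ZoneReport *z, bool reset_all)
{
    if (z->type == ZT_CONVENTIONAL) {
        return WP_CONV_FLAG;
    }
    switch (z->cond) {
    case ZC_FULL:
    case ZC_READONLY:
        /* Not writable: park the wp at the end of the zone. */
        return (z->start + z->len) << SECTOR_BITS;
    case ZC_OFFLINE:
        /* Neither writable nor readable: park the wp at the start. */
        return z->start << SECTOR_BITS;
    default:
        return (reset_all ? z->start : z->wp) << SECTOR_BITS;
    }
}
```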
168
1
From: Akihiko Odaki <akihiko.odaki@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
3
A zone append command is a write operation that specifies the first
4
Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com
4
logical block of a zone as the write position. When writing to a zoned
5
block device using zone append, the byte offset of the call may point at
6
any position within the zone to which the data is being appended. Upon
7
completion the device will respond with the position where the data has
8
been written in the zone.
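
As a rough illustration of these semantics, the sketch below emulates a
zone append on top of a tracked wp array. It is not the file-posix
implementation: ZoneTracker and zone_append_emulated() are hypothetical
names, zones are assumed to be equally sized, the caller is assumed to
pass the zone start offset (as raw_co_zone_append() in this patch
requires), and the -ENOSPC check is an addition of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-device state: one byte-granular wp per zone. */
typedef struct {
    uint64_t zone_size;
    uint64_t nr_zones;
    uint64_t *wp;
} ZoneTracker;

/*
 * On entry *offset is the start of the target zone. On success *offset is
 * updated to the position where the data landed and the wp is advanced.
 */
static int zone_append_emulated(ZoneTracker *t, int64_t *offset, uint64_t len)
{
    if (*offset < 0 || (uint64_t)*offset % t->zone_size != 0) {
        return -EINVAL;
    }

    uint64_t idx = (uint64_t)*offset / t->zone_size;
    if (idx >= t->nr_zones) {
        return -EINVAL;
    }

    uint64_t zone_end = (idx + 1) * t->zone_size;
    uint64_t pos = t->wp[idx];
    if (len > zone_end - pos) {
        return -ENOSPC; /* assumption: reject appends that overflow the zone */
    }

    /* A real driver would issue a regular write of len bytes at pos here. */
    t->wp[idx] = pos + len;
    *offset = (int64_t)pos;
    return 0;
}
```

Two consecutive appends to the same zone therefore report consecutive
positions, even though both callers passed the same zone start offset.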
9
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230508051510.177850-3-faithilikerun@gmail.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
15
---
7
block/io.c | 2 ++
16
include/block/block-io.h | 4 ++
8
1 file changed, 2 insertions(+)
17
include/block/block_int-common.h | 3 ++
18
include/block/raw-aio.h | 4 +-
19
include/sysemu/block-backend-io.h | 9 +++++
20
block/block-backend.c | 61 +++++++++++++++++++++++++++++++
21
block/file-posix.c | 58 +++++++++++++++++++++++++----
22
block/io.c | 27 ++++++++++++++
23
block/io_uring.c | 4 ++
24
block/linux-aio.c | 3 ++
25
block/raw-format.c | 8 ++++
26
10 files changed, 173 insertions(+), 8 deletions(-)
9
27
28
diff --git a/include/block/block-io.h b/include/block/block-io.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block-io.h
31
+++ b/include/block/block-io.h
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
34
BlockZoneOp op,
35
int64_t offset, int64_t len);
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)(uintptr_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ GRAPH_RDLOCK_GUARD();
172
+ if (!blk_is_available(blk)) {
173
+ blk_dec_in_flight(blk);
174
+ return -ENOMEDIUM;
175
+ }
176
+
177
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
178
+ blk_dec_in_flight(blk);
179
+ return ret;
180
+}
181
+
182
void blk_drain(BlockBackend *blk)
183
{
184
BlockDriverState *bs = blk_bs(blk);
185
diff --git a/block/file-posix.c b/block/file-posix.c
186
index XXXXXXX..XXXXXXX 100644
187
--- a/block/file-posix.c
188
+++ b/block/file-posix.c
189
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
190
bool has_write_zeroes:1;
191
bool use_linux_aio:1;
192
bool use_linux_io_uring:1;
193
+ int64_t *offset; /* offset of zone append operation */
194
int page_cache_inconsistent; /* errno from fdatasync failure */
195
bool has_fallocate;
196
bool needs_alignment;
197
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
198
ssize_t len;
199
200
len = RETRY_ON_EINTR(
201
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
202
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
203
qemu_pwritev(aiocb->aio_fildes,
204
aiocb->io.iov,
205
aiocb->io.niov,
206
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
207
ssize_t len;
208
209
while (offset < aiocb->aio_nbytes) {
210
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
211
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
212
len = pwrite(aiocb->aio_fildes,
213
(const char *)buf + offset,
214
aiocb->aio_nbytes - offset,
215
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
216
}
217
218
nbytes = handle_aiocb_rw_linear(aiocb, buf);
219
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
220
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
221
char *p = buf;
222
size_t count = aiocb->aio_nbytes, copy;
223
int i;
224
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
225
if (fd_open(bs) < 0)
226
return -EIO;
227
#if defined(CONFIG_BLKZONED)
228
- if (type & QEMU_AIO_WRITE && bs->wps) {
229
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
230
qemu_co_mutex_lock(&bs->wps->colock);
231
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
232
+ int index = offset / bs->bl.zone_size;
233
+ offset = bs->wps->wp[index];
234
+ }
235
}
236
#endif
237
238
@@ -XXX,XX +XXX,XX @@ out:
239
{
240
BlockZoneWps *wps = bs->wps;
241
if (ret == 0) {
242
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
243
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
244
+ && wps && bs->bl.zone_size) {
245
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
246
if (!BDRV_ZT_IS_CONV(*wp)) {
247
+ if (type & QEMU_AIO_ZONE_APPEND) {
248
+ *s->offset = *wp;
249
+ }
250
/* Advance the wp if needed */
251
if (offset + bytes > *wp) {
252
*wp = offset + bytes;
253
@@ -XXX,XX +XXX,XX @@ out:
254
}
255
}
256
} else {
257
- if (type & QEMU_AIO_WRITE) {
258
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
259
update_zones_wp(bs, s->fd, 0, 1);
260
}
261
}
262
263
- if (type & QEMU_AIO_WRITE && wps) {
264
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
265
qemu_co_mutex_unlock(&wps->colock);
266
}
267
}
268
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
269
}
270
#endif
271
272
+#if defined(CONFIG_BLKZONED)
273
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
274
+ int64_t *offset,
275
+ QEMUIOVector *qiov,
276
+ BdrvRequestFlags flags) {
277
+ assert(flags == 0);
278
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
279
+ int64_t iov_len = 0;
280
+ int64_t len = 0;
281
+ BDRVRawState *s = bs->opaque;
282
+ s->offset = offset;
283
+
284
+ if (*offset & zone_size_mask) {
285
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
286
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
287
+ return -EINVAL;
288
+ }
289
+
290
+ int64_t wg = bs->bl.write_granularity;
291
+ int64_t wg_mask = wg - 1;
292
+ for (int i = 0; i < qiov->niov; i++) {
293
+ iov_len = qiov->iov[i].iov_len;
294
+ if (iov_len & wg_mask) {
295
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
296
+ "block size %" PRId64 "", i, iov_len, wg);
297
+ return -EINVAL;
298
+ }
299
+ len += iov_len;
300
+ }
301
+
302
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
303
+}
304
+#endif
305
+
306
static coroutine_fn int
307
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
308
bool blkdev)
309
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
310
/* zone management operations */
311
.bdrv_co_zone_report = raw_co_zone_report,
312
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
313
+ .bdrv_co_zone_append = raw_co_zone_append,
314
#endif
315
};
316
10
diff --git a/block/io.c b/block/io.c
317
diff --git a/block/io.c b/block/io.c
11
index XXXXXXX..XXXXXXX 100644
318
index XXXXXXX..XXXXXXX 100644
12
--- a/block/io.c
319
--- a/block/io.c
13
+++ b/block/io.c
320
+++ b/block/io.c
14
@@ -XXX,XX +XXX,XX @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
321
@@ -XXX,XX +XXX,XX @@ out:
15
322
return co.ret;
16
static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
323
}
324
325
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
326
+ QEMUIOVector *qiov,
327
+ BdrvRequestFlags flags)
328
+{
329
+ int ret;
330
+ BlockDriver *drv = bs->drv;
331
+ CoroutineIOCompletion co = {
332
+ .coroutine = qemu_coroutine_self(),
333
+ };
334
+ IO_CODE();
335
+
336
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
337
+ if (ret < 0) {
338
+ return ret;
339
+ }
340
+
341
+ bdrv_inc_in_flight(bs);
342
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
343
+ co.ret = -ENOTSUP;
344
+ goto out;
345
+ }
346
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
347
+out:
348
+ bdrv_dec_in_flight(bs);
349
+ return co.ret;
350
+}
351
+
352
void *qemu_blockalign(BlockDriverState *bs, size_t size)
17
{
353
{
18
+ dst->pdiscard_alignment = MAX(dst->pdiscard_alignment,
354
IO_CODE();
19
+ src->pdiscard_alignment);
355
diff --git a/block/io_uring.c b/block/io_uring.c
20
dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
356
index XXXXXXX..XXXXXXX 100644
21
dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
357
--- a/block/io_uring.c
22
dst->max_hw_transfer = MIN_NON_ZERO(dst->max_hw_transfer,
358
+++ b/block/io_uring.c
359
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
360
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
361
luringcb->qiov->niov, offset);
362
break;
363
+ case QEMU_AIO_ZONE_APPEND:
364
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
365
+ luringcb->qiov->niov, offset);
366
+ break;
367
case QEMU_AIO_READ:
368
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
369
luringcb->qiov->niov, offset);
370
diff --git a/block/linux-aio.c b/block/linux-aio.c
371
index XXXXXXX..XXXXXXX 100644
372
--- a/block/linux-aio.c
373
+++ b/block/linux-aio.c
374
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
375
case QEMU_AIO_WRITE:
376
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
377
break;
378
+ case QEMU_AIO_ZONE_APPEND:
379
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
380
+ break;
381
case QEMU_AIO_READ:
382
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
383
break;
384
diff --git a/block/raw-format.c b/block/raw-format.c
385
index XXXXXXX..XXXXXXX 100644
386
--- a/block/raw-format.c
387
+++ b/block/raw-format.c
388
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
389
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
390
}
391
392
+static int coroutine_fn GRAPH_RDLOCK
393
+raw_co_zone_append(BlockDriverState *bs,int64_t *offset, QEMUIOVector *qiov,
394
+ BdrvRequestFlags flags)
395
+{
396
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
397
+}
398
+
399
static int64_t coroutine_fn GRAPH_RDLOCK
400
raw_co_getlength(BlockDriverState *bs)
401
{
402
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
403
.bdrv_co_pdiscard = &raw_co_pdiscard,
404
.bdrv_co_zone_report = &raw_co_zone_report,
405
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
406
+ .bdrv_co_zone_append = &raw_co_zone_append,
407
.bdrv_co_block_status = &raw_co_block_status,
408
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
409
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
23
--
410
--
24
2.31.1
411
2.40.1
25
1
From: Sam Li <faithilikerun@gmail.com>
1
2
This patch tests zone append writes by reporting the zone write pointer (wp)
after the call completes. The "zap -p" option prints the sector offset
returned on completion, which should be the start sector where the append
write began.
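As an illustration of the semantics the test relies on, here is a minimal toy model in C. It is only a sketch: ToyZone, toy_zone_append() and to_sector() are hypothetical names made up for this note and are not QEMU structures or APIs. It assumes a sequential zone whose write pointer advances by the appended length and whose returned offset is the write pointer value before the append.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical toy model of one sequential zone; not a QEMU structure. */
typedef struct {
    int64_t start;  /* byte offset of the first byte in the zone */
    int64_t cap;    /* writable capacity in bytes */
    int64_t wp;     /* write pointer, as a byte offset */
} ToyZone;

/*
 * Append len bytes to the zone. On success, store the byte offset where
 * the data landed (the old write pointer) in *offset, advance the write
 * pointer and return 0. Return -1 if the data does not fit.
 */
static int toy_zone_append(ToyZone *z, int64_t len, int64_t *offset)
{
    assert(len >= 0);
    if (len > z->start + z->cap - z->wp) {
        return -1;
    }
    *offset = z->wp;
    z->wp += len;
    return 0;
}

/* Convert a byte offset to a 512-byte sector number. */
static int64_t to_sector(int64_t bytes)
{
    return bytes / SECTOR_SIZE;
}
```

With two iovecs of 0x1000 and 0x2000 bytes, each append moves the write pointer by 0x3000 bytes, that is 0x18 sectors, which is the step visible in the wptr values of the expected test output.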
7
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20230508051510.177850-4-faithilikerun@gmail.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
14
tests/qemu-iotests/tests/zoned | 16 +++++++
15
tests/qemu-iotests/tests/zoned.out | 16 +++++++
16
3 files changed, 107 insertions(+)
17
18
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
19
index XXXXXXX..XXXXXXX 100644
20
--- a/qemu-io-cmds.c
21
+++ b/qemu-io-cmds.c
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
23
.oneline = "reset a zone write pointer in zone block device",
24
};
25
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
27
+ int64_t *offset, int flags, int *total)
28
+{
29
+ int async_ret = NOT_DONE;
30
+
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
32
+ while (async_ret == NOT_DONE) {
33
+ main_loop_wait(false);
34
+ }
35
+
36
+ *total = qiov->size;
37
+ return async_ret < 0 ? async_ret : 1;
38
+}
39
+
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
41
+{
42
+ int ret;
43
+ bool pflag = false;
44
+ int flags = 0;
45
+ int total = 0;
46
+ int64_t offset;
47
+ char *buf;
48
+ int c, nr_iov;
49
+ int pattern = 0xcd;
50
+ QEMUIOVector qiov;
51
+
52
+ if (optind > argc - 3) {
53
+ return -EINVAL;
54
+ }
55
+
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
57
+ pflag = true;
58
+ }
59
+
60
+ offset = cvtnum(argv[optind]);
61
+ if (offset < 0) {
62
+ print_cvtnum_err(offset, argv[optind]);
63
+ return offset;
64
+ }
65
+ optind++;
66
+ nr_iov = argc - optind;
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
68
+ flags & BDRV_REQ_REGISTERED_BUF);
69
+ if (buf == NULL) {
70
+ return -EINVAL;
71
+ }
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
73
+ if (ret < 0) {
74
+ printf("zone append failed: %s\n", strerror(-ret));
75
+ goto out;
76
+ }
77
+
78
+ if (pflag) {
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
80
+ tosector(offset));
81
+ }
82
+
83
+out:
84
+ qemu_io_free(blk, buf, qiov.size,
85
+ flags & BDRV_REQ_REGISTERED_BUF);
86
+ qemu_iovec_destroy(&qiov);
87
+ return ret;
88
+}
89
+
90
+static const cmdinfo_t zone_append_cmd = {
91
+ .name = "zone_append",
92
+ .altname = "zap",
93
+ .cfunc = zone_append_f,
94
+ .argmin = 3,
95
+ .argmax = 4,
96
+ .args = "offset len [len..]",
97
+ .oneline = "append write a number of bytes at a specified offset",
98
+};
99
+
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
101
static const cmdinfo_t truncate_cmd = {
102
.name = "truncate",
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
104
qemuio_add_command(&zone_close_cmd);
105
qemuio_add_command(&zone_finish_cmd);
106
qemuio_add_command(&zone_reset_cmd);
107
+ qemuio_add_command(&zone_append_cmd);
108
qemuio_add_command(&truncate_cmd);
109
qemuio_add_command(&length_cmd);
110
qemuio_add_command(&info_cmd);
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
112
index XXXXXXX..XXXXXXX 100755
113
--- a/tests/qemu-iotests/tests/zoned
114
+++ b/tests/qemu-iotests/tests/zoned
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
146
+
147
+
148
+(6) append write
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
150
+After zap done, the append sector is 0x0
151
+After appending the first zone firstly:
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
153
+After zap done, the append sector is 0x18
154
+After appending the first zone secondly:
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
156
+After zap done, the append sector is 0x80000
157
+After appending the second zone firstly:
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
159
+After zap done, the append sector is 0x80018
160
+After appending the second zone secondly:
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
162
*** done
163
--
164
2.40.1
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20230508051510.177850-5-faithilikerun@gmail.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
block/file-posix.c | 3 +++
10
block/trace-events | 2 ++
11
2 files changed, 5 insertions(+)
12
13
diff --git a/block/file-posix.c b/block/file-posix.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
16
+++ b/block/file-posix.c
17
@@ -XXX,XX +XXX,XX @@ out:
18
if (!BDRV_ZT_IS_CONV(*wp)) {
19
if (type & QEMU_AIO_ZONE_APPEND) {
20
*s->offset = *wp;
21
+ trace_zbd_zone_append_complete(bs, *s->offset
22
+ >> BDRV_SECTOR_BITS);
23
}
24
/* Advance the wp if needed */
25
if (offset + bytes > *wp) {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
27
len += iov_len;
28
}
29
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
32
}
33
#endif
34
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
37
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
39
file_flush_fdatasync_failed(int err) "errno %d"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
44
45
# ssh.c
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
47
--
48
2.40.1
1
From: Sam Li <faithilikerun@gmail.com>
1
2
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone operations (open,
close, finish, reset), and Zone Append.

The VIRTIO_BLK_F_ZONED feature bit is set only if the host supports zoned
block devices. It is not set for regular block devices (conventional zones).

The guest OS can use blktests or fio to test these commands on zoned devices.
Zone append writes can also be tested with zonefs.
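Before a zoned request is forwarded to the block layer, the guest-supplied offset and length have to be validated. The sketch below shows one way to write such checks in C. It is an assumption-laden illustration, not the code in this patch: toy_range_ok() and toy_append_aligned() are hypothetical helper names, and all values are taken to be byte counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU API: return true if the byte range
 * [offset, offset + len) lies within a device of 'capacity' bytes.
 * The upper bound is written as a subtraction so that offset + len is
 * never computed and cannot overflow.
 */
static bool toy_range_ok(int64_t offset, int64_t len, int64_t capacity)
{
    assert(capacity >= 0);
    if (offset < 0 || len < 0 || len > capacity) {
        return false;
    }
    return offset <= capacity - len;
}

/*
 * Hypothetical helper: an append offset must be a multiple of the write
 * granularity. A granularity of 0 is treated as "no constraint".
 */
static bool toy_append_aligned(int64_t offset, int64_t write_granularity)
{
    if (write_granularity == 0) {
        return true;
    }
    return offset % write_granularity == 0;
}
```

Comparing offset against capacity - len, rather than offset + len against capacity, keeps the check safe even when the guest supplies an offset close to INT64_MAX.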
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
hw/block/virtio-blk-common.c | 2 +
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
21
hw/virtio/virtio-qmp.c | 2 +
22
3 files changed, 393 insertions(+)
23
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/hw/block/virtio-blk-common.c
27
+++ b/hw/block/virtio-blk-common.c
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
33
+ .end = endof(struct virtio_blk_config, zoned)},
34
{}
35
};
36
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/hw/block/virtio-blk.c
40
+++ b/hw/block/virtio-blk.c
41
@@ -XXX,XX +XXX,XX @@
42
#include "qemu/module.h"
43
#include "qemu/error-report.h"
44
#include "qemu/main-loop.h"
45
+#include "block/block_int.h"
46
#include "trace.h"
47
#include "hw/block/block.h"
48
#include "hw/qdev-properties.h"
49
@@ -XXX,XX +XXX,XX @@ err:
50
return err_status;
51
}
52
53
+typedef struct ZoneCmdData {
54
+ VirtIOBlockReq *req;
55
+ struct iovec *in_iov;
56
+ unsigned in_num;
57
+ union {
58
+ struct {
59
+ unsigned int nr_zones;
60
+ BlockZoneDescriptor *zones;
61
+ } zone_report_data;
62
+ struct {
63
+ int64_t offset;
64
+ } zone_append_data;
65
+ };
66
+} ZoneCmdData;
67
+
68
+/*
69
+ * check_zoned_request: error checking before issuing requests. If all checks
70
+ * passed, return true.
71
+ * append: true only if the request is a zone append.
72
+ */
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
74
+ bool append, uint8_t *status) {
75
+ BlockDriverState *bs = blk_bs(s->blk);
76
+ int index;
77
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
79
+ *status = VIRTIO_BLK_S_UNSUPP;
80
+ return false;
81
+ }
82
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
89
+ if (append) {
90
+ if (bs->bl.write_granularity) {
91
+ if ((offset % bs->bl.write_granularity) != 0) {
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size);
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer less than size of "
335
+ "append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
383
{
384
uint32_t type;
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Pass out_iov/out_num and in_iov/in_num explicitly; it is not safe
417
+ * to access req->elem.out_sg directly because it may be
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
462
}
463
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
465
return;
466
}
467
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
469
+ if (bs->bl.zoned != BLK_Z_NONE) {
470
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
471
+ if (bs->bl.zoned == BLK_Z_HM) {
472
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
473
+ }
474
+ }
475
+
476
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
477
(!conf->max_discard_sectors ||
478
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
479
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
480
index XXXXXXX..XXXXXXX 100644
481
--- a/hw/virtio/virtio-qmp.c
482
+++ b/hw/virtio/virtio-qmp.c
483
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
484
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
485
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
486
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
487
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
488
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
489
#ifndef VIRTIO_BLK_NO_LEGACY
490
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
491
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
492
--
493
2.40.1
1
From: Akihiko Odaki <akihiko.odaki@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
The backend_defaults property allows users to control whether default block
3
Taking account of the new zone append write operation for zoned devices,
4
properties should be decided with backend information.
4
the BLOCK_ACCT_ZONE_APPEND enum is introduced alongside the other I/O request types (read,
5
write, flush).
5
6
6
If it is off, any backend information will be discarded, which is
7
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
suitable if you plan to perform live migration to a different disk backend.
8
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
8
9
If it is on, a block device may utilize backend information more
10
aggressively.
11
12
By default, it is auto, which uses backend information for block
13
sizes and ignores the others, which is consistent with the older
14
versions.
15
16
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
17
Message-id: 20210705130458.97642-2-akihiko.odaki@gmail.com
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
10
---
20
include/hw/block/block.h | 3 +++
11
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
21
hw/block/block.c | 42 ++++++++++++++++++++++++++++++++++----
12
qapi/block.json | 4 +++
22
tests/qemu-iotests/172.out | 38 ++++++++++++++++++++++++++++++++++
13
include/block/accounting.h | 1 +
23
3 files changed, 79 insertions(+), 4 deletions(-)
14
block/qapi-sysemu.c | 11 ++++++
15
block/qapi.c | 18 ++++++++++
16
hw/block/virtio-blk.c | 4 +++
17
tests/qemu-iotests/227.out | 18 ++++++++++
18
7 files changed, 113 insertions(+), 11 deletions(-)
24
19
25
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
20
diff --git a/qapi/block-core.json b/qapi/block-core.json
26
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
27
--- a/include/hw/block/block.h
22
--- a/qapi/block-core.json
28
+++ b/include/hw/block/block.h
23
+++ b/qapi/block-core.json
29
@@ -XXX,XX +XXX,XX @@
24
@@ -XXX,XX +XXX,XX @@
30
25
# @min_wr_latency_ns: Minimum latency of write operations in the
31
typedef struct BlockConf {
26
# defined interval, in nanoseconds.
32
BlockBackend *blk;
27
#
33
+ OnOffAuto backend_defaults;
28
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
34
uint32_t physical_block_size;
29
+# in the defined interval, in nanoseconds
35
uint32_t logical_block_size;
30
+# (since 8.1)
36
uint32_t min_io_size;
31
+#
37
@@ -XXX,XX +XXX,XX @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
32
# @min_flush_latency_ns: Minimum latency of flush operations in the
38
}
33
# defined interval, in nanoseconds.
39
34
#
40
#define DEFINE_BLOCK_PROPERTIES_BASE(_state, _conf) \
35
@@ -XXX,XX +XXX,XX @@
41
+ DEFINE_PROP_ON_OFF_AUTO("backend_defaults", _state, \
36
# @max_wr_latency_ns: Maximum latency of write operations in the
42
+ _conf.backend_defaults, ON_OFF_AUTO_AUTO), \
37
# defined interval, in nanoseconds.
43
DEFINE_PROP_BLOCKSIZE("logical_block_size", _state, \
38
#
44
_conf.logical_block_size), \
39
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
45
DEFINE_PROP_BLOCKSIZE("physical_block_size", _state, \
40
+# in the defined interval, in nanoseconds
46
diff --git a/hw/block/block.c b/hw/block/block.c
41
+# (since 8.1)
47
index XXXXXXX..XXXXXXX 100644
42
+#
48
--- a/hw/block/block.c
43
# @max_flush_latency_ns: Maximum latency of flush operations in the
49
+++ b/hw/block/block.c
44
# defined interval, in nanoseconds.
50
@@ -XXX,XX +XXX,XX @@ bool blkconf_blocksizes(BlockConf *conf, Error **errp)
45
#
46
@@ -XXX,XX +XXX,XX @@
47
# @avg_wr_latency_ns: Average latency of write operations in the
48
# defined interval, in nanoseconds.
49
#
50
+# @avg_zone_append_latency_ns: Average latency of zone append operations
51
+# in the defined interval, in nanoseconds
52
+# (since 8.1)
53
+#
54
# @avg_flush_latency_ns: Average latency of flush operations in the
55
# defined interval, in nanoseconds.
56
#
57
@@ -XXX,XX +XXX,XX @@
58
# @avg_wr_queue_depth: Average number of pending write operations in
59
# the defined interval.
60
#
61
+# @avg_zone_append_queue_depth: Average number of pending zone append
62
+# operations in the defined interval
63
+# (since 8.1).
64
+#
65
# Since: 2.5
66
##
67
{ 'struct': 'BlockDeviceTimedStats',
68
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
69
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
70
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
71
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
72
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
73
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
74
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
75
+ 'max_zone_append_latency_ns': 'int',
76
+ 'avg_zone_append_latency_ns': 'int',
77
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
78
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
79
+ 'avg_wr_queue_depth': 'number',
80
+ 'avg_zone_append_queue_depth': 'number' } }
81
82
##
83
# @BlockDeviceStats:
84
@@ -XXX,XX +XXX,XX @@
85
#
86
# @wr_bytes: The number of bytes written by the device.
87
#
88
+# @zone_append_bytes: The number of bytes appended by the zoned devices
89
+# (since 8.1)
90
+#
91
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
92
#
93
# @rd_operations: The number of read operations performed by the
94
@@ -XXX,XX +XXX,XX @@
95
# @wr_operations: The number of write operations performed by the
96
# device.
97
#
98
+# @zone_append_operations: The number of zone append operations performed
99
+# by the zoned devices (since 8.1)
100
+#
101
# @flush_operations: The number of cache flush operations performed by
102
# the device (since 0.15)
103
#
104
@@ -XXX,XX +XXX,XX @@
105
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since
106
# 0.15).
107
#
108
+# @zone_append_total_time_ns: Total time spent on zone append writes
109
+# in nanoseconds (since 8.1)
110
+#
111
# @flush_total_time_ns: Total time spent on cache flushes in
112
# nanoseconds (since 0.15).
113
#
114
@@ -XXX,XX +XXX,XX @@
115
# @wr_merged: Number of write requests that have been merged into
116
# another request (Since 2.3).
117
#
118
+# @zone_append_merged: Number of zone append requests that have been merged
119
+# into another request (since 8.1)
120
+#
121
# @unmap_merged: Number of unmap requests that have been merged into
122
# another request (Since 4.2)
123
#
124
@@ -XXX,XX +XXX,XX @@
125
# @failed_wr_operations: The number of failed write operations
126
# performed by the device (Since 2.5)
127
#
128
+# @failed_zone_append_operations: The number of failed zone append write
129
+# operations performed by the zoned devices
130
+# (since 8.1)
131
+#
132
# @failed_flush_operations: The number of failed flush operations
133
# performed by the device (Since 2.5)
134
#
135
@@ -XXX,XX +XXX,XX @@
136
# @invalid_wr_operations: The number of invalid write operations
137
# performed by the device (Since 2.5)
138
#
139
+# @invalid_zone_append_operations: The number of invalid zone append operations
140
+# performed by the zoned device (since 8.1)
141
+#
142
# @invalid_flush_operations: The number of invalid flush operations
143
# performed by the device (Since 2.5)
144
#
145
@@ -XXX,XX +XXX,XX @@
146
#
147
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
148
#
149
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
150
+#
151
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
152
#
153
# Since: 0.14
154
##
155
{ 'struct': 'BlockDeviceStats',
156
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
157
- 'rd_operations': 'int', 'wr_operations': 'int',
158
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
159
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
160
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
161
'flush_operations': 'int', 'unmap_operations': 'int',
162
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
163
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
164
- 'wr_highest_offset': 'int',
165
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
166
- '*idle_time_ns': 'int',
167
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
168
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
169
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
170
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
171
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
172
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
173
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
174
+ 'failed_zone_append_operations': 'int',
175
+ 'failed_flush_operations': 'int',
176
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
177
+ 'invalid_wr_operations': 'int',
178
+ 'invalid_zone_append_operations': 'int',
179
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
180
'account_invalid': 'bool', 'account_failed': 'bool',
181
'timed_stats': ['BlockDeviceTimedStats'],
182
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
183
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
184
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
185
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
186
187
##
188
diff --git a/qapi/block.json b/qapi/block.json
189
index XXXXXXX..XXXXXXX 100644
190
--- a/qapi/block.json
191
+++ b/qapi/block.json
192
@@ -XXX,XX +XXX,XX @@
193
# @boundaries-write: list of interval boundary values for write
194
# latency histogram.
195
#
196
+# @boundaries-zap: list of interval boundary values for zone append write
197
+# latency histogram.
198
+#
199
# @boundaries-flush: list of interval boundary values for flush
200
# latency histogram.
201
#
202
@@ -XXX,XX +XXX,XX @@
203
'*boundaries': ['uint64'],
204
'*boundaries-read': ['uint64'],
205
'*boundaries-write': ['uint64'],
206
+ '*boundaries-zap': ['uint64'],
207
'*boundaries-flush': ['uint64'] },
208
'allow-preconfig': true }
209
diff --git a/include/block/accounting.h b/include/block/accounting.h
210
index XXXXXXX..XXXXXXX 100644
211
--- a/include/block/accounting.h
212
+++ b/include/block/accounting.h
213
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
214
BLOCK_ACCT_READ,
215
BLOCK_ACCT_WRITE,
216
BLOCK_ACCT_FLUSH,
217
+ BLOCK_ACCT_ZONE_APPEND,
218
BLOCK_ACCT_UNMAP,
219
BLOCK_MAX_IOTYPE,
220
};
221
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
222
index XXXXXXX..XXXXXXX 100644
223
--- a/block/qapi-sysemu.c
224
+++ b/block/qapi-sysemu.c
225
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
226
bool has_boundaries, uint64List *boundaries,
227
bool has_boundaries_read, uint64List *boundaries_read,
228
bool has_boundaries_write, uint64List *boundaries_write,
229
+ bool has_boundaries_append, uint64List *boundaries_append,
230
bool has_boundaries_flush, uint64List *boundaries_flush,
231
Error **errp)
51
{
232
{
52
BlockBackend *blk = conf->blk;
233
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
53
BlockSizes blocksizes;
54
- int backend_ret;
55
+ BlockDriverState *bs;
56
+ bool use_blocksizes;
57
+ bool use_bs;
58
+
59
+ switch (conf->backend_defaults) {
60
+ case ON_OFF_AUTO_AUTO:
61
+ use_blocksizes = !blk_probe_blocksizes(blk, &blocksizes);
62
+ use_bs = false;
63
+ break;
64
+
65
+ case ON_OFF_AUTO_ON:
66
+ use_blocksizes = !blk_probe_blocksizes(blk, &blocksizes);
67
+ bs = blk_bs(blk);
68
+ use_bs = bs;
69
+ break;
70
+
71
+ case ON_OFF_AUTO_OFF:
72
+ use_blocksizes = false;
73
+ use_bs = false;
74
+ break;
75
+
76
+ default:
77
+ abort();
78
+ }
79
80
- backend_ret = blk_probe_blocksizes(blk, &blocksizes);
81
/* fill in detected values if they are not defined via qemu command line */
82
if (!conf->physical_block_size) {
83
- if (!backend_ret) {
84
+ if (use_blocksizes) {
85
conf->physical_block_size = blocksizes.phys;
86
} else {
87
conf->physical_block_size = BDRV_SECTOR_SIZE;
88
}
234
}
89
}
235
}
90
if (!conf->logical_block_size) {
236
91
- if (!backend_ret) {
237
+ if (has_boundaries || has_boundaries_append) {
92
+ if (use_blocksizes) {
238
+ ret = block_latency_histogram_set(
93
conf->logical_block_size = blocksizes.log;
239
+ stats, BLOCK_ACCT_ZONE_APPEND,
94
} else {
240
+ has_boundaries_append ? boundaries_append : boundaries);
95
conf->logical_block_size = BDRV_SECTOR_SIZE;
241
+ if (ret) {
96
}
242
+ error_setg(errp, "Device '%s' set append write boundaries fail", id);
97
}
243
+ return;
98
+ if (use_bs) {
99
+ if (!conf->opt_io_size) {
100
+ conf->opt_io_size = bs->bl.opt_transfer;
101
+ }
102
+ if (conf->discard_granularity == -1) {
103
+ if (bs->bl.pdiscard_alignment) {
104
+ conf->discard_granularity = bs->bl.pdiscard_alignment;
105
+ } else if (bs->bl.request_alignment != 1) {
106
+ conf->discard_granularity = bs->bl.request_alignment;
107
+ }
108
+ }
244
+ }
109
+ }
245
+ }
110
246
+
111
if (conf->logical_block_size > conf->physical_block_size) {
247
if (has_boundaries || has_boundaries_flush) {
112
error_setg(errp,
248
ret = block_latency_histogram_set(
113
diff --git a/tests/qemu-iotests/172.out b/tests/qemu-iotests/172.out
249
stats, BLOCK_ACCT_FLUSH,
114
index XXXXXXX..XXXXXXX 100644
250
diff --git a/block/qapi.c b/block/qapi.c
115
--- a/tests/qemu-iotests/172.out
251
index XXXXXXX..XXXXXXX 100644
116
+++ b/tests/qemu-iotests/172.out
252
--- a/block/qapi.c
117
@@ -XXX,XX +XXX,XX @@ Testing:
253
+++ b/block/qapi.c
118
dev: floppy, id ""
254
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
119
unit = 0 (0x0)
255
120
drive = "floppy0"
256
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
121
+ backend_defaults = "auto"
257
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
122
logical_block_size = 512 (512 B)
258
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
123
physical_block_size = 512 (512 B)
259
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
124
min_io_size = 0 (0 B)
260
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
125
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2
261
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
126
dev: floppy, id ""
262
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
127
unit = 0 (0x0)
263
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
128
drive = "floppy0"
264
129
+ backend_defaults = "auto"
265
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
130
logical_block_size = 512 (512 B)
266
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
131
physical_block_size = 512 (512 B)
267
+ ds->failed_zone_append_operations =
132
min_io_size = 0 (0 B)
268
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
133
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2
269
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
134
dev: floppy, id ""
270
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
135
unit = 1 (0x1)
271
136
drive = "floppy1"
272
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
137
+ backend_defaults = "auto"
273
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
138
logical_block_size = 512 (512 B)
274
+ ds->invalid_zone_append_operations =
139
physical_block_size = 512 (512 B)
275
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
140
min_io_size = 0 (0 B)
276
ds->invalid_flush_operations =
141
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2
277
stats->invalid_ops[BLOCK_ACCT_FLUSH];
142
dev: floppy, id ""
278
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
143
unit = 0 (0x0)
279
144
drive = "floppy0"
280
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
145
+ backend_defaults = "auto"
281
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
146
logical_block_size = 512 (512 B)
282
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
147
physical_block_size = 512 (512 B)
283
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
148
min_io_size = 0 (0 B)
284
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
149
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -fdb TEST_DIR/t.qcow2.2
285
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
150
dev: floppy, id ""
286
+ ds->zone_append_total_time_ns =
151
unit = 1 (0x1)
287
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
152
drive = "floppy1"
288
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
153
+ backend_defaults = "auto"
289
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
154
logical_block_size = 512 (512 B)
290
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
155
physical_block_size = 512 (512 B)
291
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
156
min_io_size = 0 (0 B)
292
157
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -fdb TEST_DIR/t.qcow2.2
293
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
158
dev: floppy, id ""
294
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
159
unit = 0 (0x0)
295
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
160
drive = "floppy0"
296
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
161
+ backend_defaults = "auto"
297
162
logical_block_size = 512 (512 B)
298
dev_stats->interval_length = ts->interval_length;
163
physical_block_size = 512 (512 B)
299
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
164
min_io_size = 0 (0 B)
300
dev_stats->max_wr_latency_ns = timed_average_max(wr);
165
@@ -XXX,XX +XXX,XX @@ Testing: -fdb
301
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
166
dev: floppy, id ""
302
167
unit = 1 (0x1)
303
+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
168
drive = "floppy1"
304
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
169
+ backend_defaults = "auto"
305
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
170
logical_block_size = 512 (512 B)
306
+
171
physical_block_size = 512 (512 B)
307
dev_stats->min_flush_latency_ns = timed_average_min(fl);
172
min_io_size = 0 (0 B)
308
dev_stats->max_flush_latency_ns = timed_average_max(fl);
173
@@ -XXX,XX +XXX,XX @@ Testing: -fdb
309
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
174
dev: floppy, id ""
310
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
175
unit = 0 (0x0)
311
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
176
drive = "floppy0"
312
dev_stats->avg_wr_queue_depth =
177
+ backend_defaults = "auto"
313
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
178
logical_block_size = 512 (512 B)
314
+ dev_stats->avg_zone_append_queue_depth =
179
physical_block_size = 512 (512 B)
315
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);
180
min_io_size = 0 (0 B)
316
181
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2
317
QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
182
dev: floppy, id ""
318
}
183
unit = 0 (0x0)
319
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
184
drive = "floppy0"
320
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
185
+ backend_defaults = "auto"
321
ds->wr_latency_histogram
186
logical_block_size = 512 (512 B)
322
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
187
physical_block_size = 512 (512 B)
323
+ ds->zone_append_latency_histogram
188
min_io_size = 0 (0 B)
324
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
189
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2,index=1
325
ds->flush_latency_histogram
190
dev: floppy, id ""
326
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
191
unit = 1 (0x1)
327
}
192
drive = "floppy1"
328
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
193
+ backend_defaults = "auto"
329
index XXXXXXX..XXXXXXX 100644
194
logical_block_size = 512 (512 B)
330
--- a/hw/block/virtio-blk.c
195
physical_block_size = 512 (512 B)
331
+++ b/hw/block/virtio-blk.c
196
min_io_size = 0 (0 B)
332
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
197
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2,index=1
333
data->in_num = in_num;
198
dev: floppy, id ""
334
data->zone_append_data.offset = offset;
199
unit = 0 (0x0)
335
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
200
drive = "floppy0"
336
+
201
+ backend_defaults = "auto"
337
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
202
logical_block_size = 512 (512 B)
338
+ BLOCK_ACCT_ZONE_APPEND);
203
physical_block_size = 512 (512 B)
339
+
204
min_io_size = 0 (0 B)
340
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
205
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=floppy,file=TEST_DIR/t
341
virtio_blk_zone_append_complete, data);
206
dev: floppy, id ""
342
return 0;
207
unit = 1 (0x1)
343
diff --git a/tests/qemu-iotests/227.out b/tests/qemu-iotests/227.out
208
drive = "floppy1"
344
index XXXXXXX..XXXXXXX 100644
209
+ backend_defaults = "auto"
345
--- a/tests/qemu-iotests/227.out
210
logical_block_size = 512 (512 B)
346
+++ b/tests/qemu-iotests/227.out
211
physical_block_size = 512 (512 B)
347
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
212
min_io_size = 0 (0 B)
348
"stats": {
213
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=floppy,file=TEST_DIR/t
349
"unmap_operations": 0,
214
dev: floppy, id ""
350
"unmap_merged": 0,
215
unit = 0 (0x0)
351
+ "failed_zone_append_operations": 0,
216
drive = "floppy0"
352
"flush_total_time_ns": 0,
217
+ backend_defaults = "auto"
353
"wr_highest_offset": 0,
218
logical_block_size = 512 (512 B)
354
"wr_total_time_ns": 0,
219
physical_block_size = 512 (512 B)
355
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
220
min_io_size = 0 (0 B)
356
"timed_stats": [
221
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0
357
],
222
dev: floppy, id ""
358
"failed_unmap_operations": 0,
223
unit = 0 (0x0)
359
+ "zone_append_merged": 0,
224
drive = "none0"
360
"failed_flush_operations": 0,
225
+ backend_defaults = "auto"
361
"account_invalid": true,
226
logical_block_size = 512 (512 B)
362
"rd_total_time_ns": 0,
227
physical_block_size = 512 (512 B)
363
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
228
min_io_size = 0 (0 B)
364
"unmap_total_time_ns": 0,
229
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,unit=1
365
"invalid_flush_operations": 0,
230
dev: floppy, id ""
366
"account_failed": true,
231
unit = 1 (0x1)
367
+ "zone_append_total_time_ns": 0,
232
drive = "none0"
368
+ "zone_append_operations": 0,
233
+ backend_defaults = "auto"
369
"rd_operations": 0,
234
logical_block_size = 512 (512 B)
370
+ "zone_append_bytes": 0,
235
physical_block_size = 512 (512 B)
371
+ "invalid_zone_append_operations": 0,
236
min_io_size = 0 (0 B)
372
"invalid_wr_operations": 0,
237
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qco
373
"invalid_rd_operations": 0
238
dev: floppy, id ""
374
},
239
unit = 1 (0x1)
375
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
240
drive = "none1"
376
"stats": {
241
+ backend_defaults = "auto"
377
"unmap_operations": 0,
242
logical_block_size = 512 (512 B)
378
"unmap_merged": 0,
243
physical_block_size = 512 (512 B)
379
+ "failed_zone_append_operations": 0,
244
min_io_size = 0 (0 B)
380
"flush_total_time_ns": 0,
245
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qco
381
"wr_highest_offset": 0,
246
dev: floppy, id ""
382
"wr_total_time_ns": 0,
247
unit = 0 (0x0)
383
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
248
drive = "none0"
384
"timed_stats": [
249
+ backend_defaults = "auto"
385
],
250
logical_block_size = 512 (512 B)
386
"failed_unmap_operations": 0,
251
physical_block_size = 512 (512 B)
387
+ "zone_append_merged": 0,
252
min_io_size = 0 (0 B)
388
"failed_flush_operations": 0,
253
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
389
"account_invalid": true,
254
dev: floppy, id ""
390
"rd_total_time_ns": 0,
255
unit = 1 (0x1)
391
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
256
drive = "none0"
392
"unmap_total_time_ns": 0,
257
+ backend_defaults = "auto"
393
"invalid_flush_operations": 0,
258
logical_block_size = 512 (512 B)
394
"account_failed": true,
259
physical_block_size = 512 (512 B)
395
+ "zone_append_total_time_ns": 0,
260
min_io_size = 0 (0 B)
396
+ "zone_append_operations": 0,
261
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
397
"rd_operations": 0,
262
dev: floppy, id ""
398
+ "zone_append_bytes": 0,
263
unit = 0 (0x0)
399
+ "invalid_zone_append_operations": 0,
264
drive = "floppy0"
400
"invalid_wr_operations": 0,
265
+ backend_defaults = "auto"
401
"invalid_rd_operations": 0
266
logical_block_size = 512 (512 B)
402
},
267
physical_block_size = 512 (512 B)
403
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
268
min_io_size = 0 (0 B)
404
"stats": {
269
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
405
"unmap_operations": 0,
270
dev: floppy, id ""
406
"unmap_merged": 0,
271
unit = 1 (0x1)
407
+ "failed_zone_append_operations": 0,
272
drive = "none0"
408
"flush_total_time_ns": 0,
273
+ backend_defaults = "auto"
409
"wr_highest_offset": 0,
274
logical_block_size = 512 (512 B)
410
"wr_total_time_ns": 0,
275
physical_block_size = 512 (512 B)
411
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
276
min_io_size = 0 (0 B)
412
"timed_stats": [
277
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
413
],
278
dev: floppy, id ""
414
"failed_unmap_operations": 0,
279
unit = 0 (0x0)
415
+ "zone_append_merged": 0,
280
drive = "floppy0"
416
"failed_flush_operations": 0,
281
+ backend_defaults = "auto"
417
"account_invalid": true,
282
logical_block_size = 512 (512 B)
418
"rd_total_time_ns": 0,
283
physical_block_size = 512 (512 B)
419
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
284
min_io_size = 0 (0 B)
420
"unmap_total_time_ns": 0,
285
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
421
"invalid_flush_operations": 0,
286
dev: floppy, id ""
422
"account_failed": true,
287
unit = 0 (0x0)
423
+ "zone_append_total_time_ns": 0,
288
drive = "none0"
424
+ "zone_append_operations": 0,
289
+ backend_defaults = "auto"
425
"rd_operations": 0,
290
logical_block_size = 512 (512 B)
426
+ "zone_append_bytes": 0,
291
physical_block_size = 512 (512 B)
427
+ "invalid_zone_append_operations": 0,
292
min_io_size = 0 (0 B)
428
"invalid_wr_operations": 0,
293
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
429
"invalid_rd_operations": 0
294
dev: floppy, id ""
430
},
295
unit = 1 (0x1)
296
drive = "floppy1"
297
+ backend_defaults = "auto"
298
logical_block_size = 512 (512 B)
299
physical_block_size = 512 (512 B)
300
min_io_size = 0 (0 B)
301
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
302
dev: floppy, id ""
303
unit = 0 (0x0)
304
drive = "none0"
305
+ backend_defaults = "auto"
306
logical_block_size = 512 (512 B)
307
physical_block_size = 512 (512 B)
308
min_io_size = 0 (0 B)
309
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
310
dev: floppy, id ""
311
unit = 1 (0x1)
312
drive = "floppy1"
313
+ backend_defaults = "auto"
314
logical_block_size = 512 (512 B)
315
physical_block_size = 512 (512 B)
316
min_io_size = 0 (0 B)
317
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
318
dev: floppy, id ""
319
unit = 1 (0x1)
320
drive = "none0"
321
+ backend_defaults = "auto"
322
logical_block_size = 512 (512 B)
323
physical_block_size = 512 (512 B)
324
min_io_size = 0 (0 B)
325
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
326
dev: floppy, id ""
327
unit = 0 (0x0)
328
drive = "floppy0"
329
+ backend_defaults = "auto"
330
logical_block_size = 512 (512 B)
331
physical_block_size = 512 (512 B)
332
min_io_size = 0 (0 B)
333
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
334
dev: floppy, id ""
335
unit = 1 (0x1)
336
drive = "none0"
337
+ backend_defaults = "auto"
338
logical_block_size = 512 (512 B)
339
physical_block_size = 512 (512 B)
340
min_io_size = 0 (0 B)
341
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
342
dev: floppy, id ""
343
unit = 0 (0x0)
344
drive = "floppy0"
345
+ backend_defaults = "auto"
346
logical_block_size = 512 (512 B)
347
physical_block_size = 512 (512 B)
348
min_io_size = 0 (0 B)
349
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -global floppy.drive=none0 -device
350
dev: floppy, id ""
351
unit = 0 (0x0)
352
drive = "none0"
353
+ backend_defaults = "auto"
354
logical_block_size = 512 (512 B)
355
physical_block_size = 512 (512 B)
356
min_io_size = 0 (0 B)
357
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy
358
dev: floppy, id ""
359
unit = 0 (0x0)
360
drive = ""
361
+ backend_defaults = "auto"
362
logical_block_size = 512 (512 B)
363
physical_block_size = 512 (512 B)
364
min_io_size = 0 (0 B)
365
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy,drive-type=120
366
dev: floppy, id ""
367
unit = 0 (0x0)
368
drive = ""
369
+ backend_defaults = "auto"
370
logical_block_size = 512 (512 B)
371
physical_block_size = 512 (512 B)
372
min_io_size = 0 (0 B)
373
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy,drive-type=144
374
dev: floppy, id ""
375
unit = 0 (0x0)
376
drive = ""
377
+ backend_defaults = "auto"
378
logical_block_size = 512 (512 B)
379
physical_block_size = 512 (512 B)
380
min_io_size = 0 (0 B)
381
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy,drive-type=288
382
dev: floppy, id ""
383
unit = 0 (0x0)
384
drive = ""
385
+ backend_defaults = "auto"
386
logical_block_size = 512 (512 B)
387
physical_block_size = 512 (512 B)
388
min_io_size = 0 (0 B)
389
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,drive-t
390
dev: floppy, id ""
391
unit = 0 (0x0)
392
drive = "none0"
393
+ backend_defaults = "auto"
394
logical_block_size = 512 (512 B)
395
physical_block_size = 512 (512 B)
396
min_io_size = 0 (0 B)
397
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,drive-t
398
dev: floppy, id ""
399
unit = 0 (0x0)
400
drive = "none0"
401
+ backend_defaults = "auto"
402
logical_block_size = 512 (512 B)
403
physical_block_size = 512 (512 B)
404
min_io_size = 0 (0 B)
405
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,logical
406
dev: floppy, id ""
407
unit = 0 (0x0)
408
drive = "none0"
409
+ backend_defaults = "auto"
410
logical_block_size = 512 (512 B)
411
physical_block_size = 512 (512 B)
412
min_io_size = 0 (0 B)
413
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,physica
414
dev: floppy, id ""
415
unit = 0 (0x0)
416
drive = "none0"
417
+ backend_defaults = "auto"
418
logical_block_size = 512 (512 B)
419
physical_block_size = 512 (512 B)
420
min_io_size = 0 (0 B)
421
--
431
--
422
2.31.1
432
2.40.1
423
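The accounting change above amounts to adding one more slot to a set of per-type counter arrays. A minimal sketch of that pattern follows, assuming simplified stand-in types: every Sketch* and sketch_* name is hypothetical and is not QEMU's BlockAcctStats or block_acct_done(). Only the enum order mirrors the hunk in include/block/accounting.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for enum BlockAcctType; the order follows the hunk. */
enum SketchAcctType {
    SKETCH_ACCT_READ,
    SKETCH_ACCT_WRITE,
    SKETCH_ACCT_FLUSH,
    SKETCH_ACCT_ZONE_APPEND,   /* the new entry */
    SKETCH_ACCT_UNMAP,
    SKETCH_MAX_IOTYPE,
};

/* Hypothetical, reduced stats structure: one counter per I/O type. */
typedef struct SketchAcctStats {
    uint64_t nr_bytes[SKETCH_MAX_IOTYPE];
    uint64_t nr_ops[SKETCH_MAX_IOTYPE];
    uint64_t failed_ops[SKETCH_MAX_IOTYPE];
} SketchAcctStats;

/* Record one finished request of the given type. */
static void sketch_acct_done(SketchAcctStats *stats, enum SketchAcctType type,
                             uint64_t bytes, bool failed)
{
    assert(type < SKETCH_MAX_IOTYPE);
    if (failed) {
        /* Failed requests only bump the failure counter in this sketch. */
        stats->failed_ops[type]++;
        return;
    }
    stats->nr_bytes[type] += bytes;
    stats->nr_ops[type]++;
}
```

Because every counter is indexed by the I/O type, a new type needs no new code path in the accounting core; the reporting side only has to copy the new slot out, as bdrv_query_blk_stats() does for zone_append_bytes and the related fields.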
New patch
1
From: Sam Li <faithilikerun@gmail.com>
1
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20230508051916.178322-4-faithilikerun@gmail.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
8
hw/block/virtio-blk.c | 12 ++++++++++++
9
hw/block/trace-events | 7 +++++++
10
2 files changed, 19 insertions(+)
11
12
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/hw/block/virtio-blk.c
15
+++ b/hw/block/virtio-blk.c
16
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
17
int64_t nz = data->zone_report_data.nr_zones;
18
int8_t err_status = VIRTIO_BLK_S_OK;
19
20
+ trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
21
if (ret) {
22
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
23
goto out;
24
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
25
nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
26
sizeof(struct virtio_blk_zone_report)) /
27
sizeof(struct virtio_blk_zone_descriptor);
28
+ trace_virtio_blk_handle_zone_report(vdev, req,
29
+ offset >> BDRV_SECTOR_BITS, nr_zones);
30
31
zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
32
data = g_malloc(sizeof(ZoneCmdData));
33
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
34
{
35
VirtIOBlockReq *req = opaque;
36
VirtIOBlock *s = req->dev;
37
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
38
int8_t err_status = VIRTIO_BLK_S_OK;
39
+ trace_virtio_blk_zone_mgmt_complete(vdev, req, ret);
40
41
if (ret) {
42
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
43
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
44
/* Entire drive capacity */
45
offset = 0;
46
len = capacity;
47
+ trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
48
+ bs->total_sectors);
49
} else {
50
if (bs->bl.zone_size > capacity - offset) {
51
/* The zoned device allows the last smaller zone. */
52
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
53
} else {
54
len = bs->bl.zone_size;
55
}
56
+ trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
57
+ offset >> BDRV_SECTOR_BITS,
58
+ len >> BDRV_SECTOR_BITS);
59
}
60
61
if (!check_zoned_request(s, offset, len, false, &err_status)) {
62
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
63
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
64
goto out;
65
}
66
+ trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
67
68
out:
69
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
     int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
     int64_t len = iov_size(out_iov, out_num);

+    trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
     if (!check_zoned_request(s, offset, len, true, &err_status)) {
         goto out;
     }
diff --git a/hw/block/trace-events b/hw/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
 # virtio-blk.c
 virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
 virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
 virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""

 # hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
--
2.40.1
New patch
From: Sam Li <faithilikerun@gmail.com>

Add documentation with an example of using the virtio-blk driver
to pass zoned block devices through to the guest.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-5-faithilikerun@gmail.com
[Fix pre-formatted code syntax
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/devel/zoned-storage.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/zoned-storage.rst
+++ b/docs/devel/zoned-storage.rst
@@ -XXX,XX +XXX,XX @@ APIs for zoned storage emulation or testing.
 For example, to test zone_report on a null_blk device using qemu-io is::

     $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
+
+To expose the host's zoned block device through virtio-blk, the command line
+can be (using the -blockdev and -device parameters)::
+
+    -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
+    -device virtio-blk-pci,drive=drive0
+
+Or use only the -drive parameter::
+
+    -drive driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
+
+Additionally, QEMU has several ways of supporting zoned storage, including:
+(1) Using virtio-scsi: --device scsi-block passes SCSI ZBC devices through to
+the guest, which allows ZBC or ZAC HDDs to be attached to QEMU.
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
+purposes, it cannot yet pass through a zoned device from the host. To pass an
+NVMe ZNS device to the guest, use VFIO PCI to pass the entire NVMe PCI adapter
+through to the guest. Likewise, an HDD HBA can be passed through to the guest
+together with all HDDs attached to the HBA.
--
2.40.1
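
The two command-line forms documented in the patch above could be assembled programmatically, for example by a test script that launches QEMU. The following is only a minimal sketch: the helper names `zoned_virtio_blk_args` and `zoned_drive_args` are hypothetical and not part of QEMU, and `/dev/nullb0` assumes that a zoned null_blk device already exists on the host.

```python
import shlex


def zoned_virtio_blk_args(host_dev, node_name="drive0"):
    """Build the -blockdev/-device form (hypothetical helper, not a QEMU API)."""
    blockdev = ",".join([
        f"node-name={node_name}",
        "driver=host_device",
        f"filename={host_dev}",
        "cache.direct=on",
    ])
    # The virtio-blk-pci device refers to the block node by its node-name.
    return ["-blockdev", blockdev, "-device", f"virtio-blk-pci,drive={node_name}"]


def zoned_drive_args(host_dev):
    """Build the single -drive form (hypothetical helper, not a QEMU API)."""
    return ["-drive", f"driver=host_device,file={host_dev},if=virtio,cache.direct=on"]


if __name__ == "__main__":
    # Print both fragments so they can be appended to a QEMU invocation.
    print(shlex.join(zoned_virtio_blk_args("/dev/nullb0")))
    print(shlex.join(zoned_drive_args("/dev/nullb0")))
```

Returning argument lists rather than a single string keeps quoting out of the helpers, so the result can be handed directly to `subprocess.run` after the rest of the QEMU command line.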