The following changes since commit 8844bb8d896595ee1d25d21c770e6e6f29803097:

  Merge tag 'or1k-pull-request-20230513' of https://github.com/stffrdhrn/qemu into staging (2023-05-13 11:23:14 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 01562fee5f3ad4506d57dbcf4b1903b565eceec7:

  docs/zoned-storage:add zoned emulation use case (2023-05-15 08:19:04 -0400)

----------------------------------------------------------------
Pull request

This pull request contains Sam Li's zoned storage support in the QEMU block
layer and virtio-blk emulation.

v2:
- Sam fixed the CI failures. CI passes for me now. [Richard]

----------------------------------------------------------------

Sam Li (16):
      block/block-common: add zoned device structs
      block/file-posix: introduce helper functions for sysfs attributes
      block/block-backend: add block layer APIs resembling Linux
            ZonedBlockDevice ioctls
      block/raw-format: add zone operations to pass through requests
      block: add zoned BlockDriver check to block layer
      iotests: test new zone operations
      block: add some trace events for new block layer APIs
      docs/zoned-storage: add zoned device documentation
      file-posix: add tracking of the zone write pointers
      block: introduce zone append write for zoned devices
      qemu-iotests: test zone append operation
      block: add some trace events for zone append
      virtio-blk: add zoned storage emulation for zoned devices
      block: add accounting for zone append operation
      virtio-blk: add some trace events for zoned emulation
      docs/zoned-storage:add zoned emulation use case

 docs/devel/index-api.rst | 1 +
 docs/devel/zoned-storage.rst | 62 +++
 qapi/block-core.json | 68 ++-
 qapi/block.json | 4 +
 meson.build | 5 +
 include/block/accounting.h | 1 +
 include/block/block-common.h | 57 ++
 include/block/block-io.h | 13 +
 include/block/block_int-common.h | 37 ++
 include/block/raw-aio.h | 8 +-
 include/sysemu/block-backend-io.h | 27 +
 block.c | 19 +
 block/block-backend.c | 198 +++++++
 block/file-posix.c | 692 +++++++++++++++++++++++--
 block/io.c | 68 +++
 block/io_uring.c | 4 +
 block/linux-aio.c | 3 +
 block/qapi-sysemu.c | 11 +
 block/qapi.c | 18 +
 block/raw-format.c | 26 +
 hw/block/virtio-blk-common.c | 2 +
 hw/block/virtio-blk.c | 405 +++++++++++++++
 hw/virtio/virtio-qmp.c | 2 +
 qemu-io-cmds.c | 224 ++++++++
 block/trace-events | 4 +
 docs/system/qemu-block-drivers.rst.inc | 6 +
 hw/block/trace-events | 7 +
 tests/qemu-iotests/227.out | 18 +
 tests/qemu-iotests/tests/zoned | 105 ++++
 tests/qemu-iotests/tests/zoned.out | 69 +++
 30 files changed, 2106 insertions(+), 58 deletions(-)
 create mode 100644 docs/devel/zoned-storage.rst
 create mode 100755 tests/qemu-iotests/tests/zoned
 create mode 100644 tests/qemu-iotests/tests/zoned.out

--
2.40.1
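
To fetch and review the tag locally, the usual request-pull workflow applies; a
minimal sketch using the commit hashes quoted above:

$ git fetch https://gitlab.com/stefanha/qemu.git tags/block-pull-request
$ git log --oneline 8844bb8d896595ee1d25d21c770e6e6f29803097..FETCH_HEAD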
1
Allow the number of queues to be configured using --export
1
From: Sam Li <faithilikerun@gmail.com>
2
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
3
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
4
its own value if the vhost-user-blk backend offers fewer queues than
5
QEMU.
6
2
7
The vhost-user-blk-server.c code is already capable of multi-queue. All
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
8
virtqueue processing runs in the same AioContext. No new locking is
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
needed.
5
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
10
6
Reviewed-by: Hannes Reinecke <hare@suse.de>
11
Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
7
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
12
Note that the feature bit only announces the presence of the num_queues
8
Acked-by: Kevin Wolf <kwolf@redhat.com>
13
configuration space field. It does not promise that there is more than 1
14
virtqueue, so we can set it unconditionally.
15
16
I tested multi-queue by running a random read fio test with numjobs=4 on
17
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
18
file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/
19
directory shows that Linux blk-mq has 4 queues configured.
20
21
An automated test is included in the next commit.
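
For illustration only (not part of this patch), a sketch of how the two
num-queues settings can be kept in sync; the exact qemu-storage-daemon and
QEMU option spellings below are assumptions based on their documentation and
may differ between versions:

$ qemu-storage-daemon \
      --blockdev driver=file,filename=disk.img,node-name=disk0 \
      --export type=vhost-user-blk,id=export0,node-name=disk0,writable=on,num-queues=4,addr.type=unix,addr.path=/tmp/vub.sock
$ qemu-system-x86_64 -m 4G -smp 4 \
      -object memory-backend-memfd,id=mem,size=4G,share=on -numa node,memdev=mem \
      -chardev socket,id=char0,path=/tmp/vub.sock \
      -device vhost-user-blk-pci,chardev=char0,num-queues=4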
22
23
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
24
Acked-by: Markus Armbruster <armbru@redhat.com>
10
Message-id: 20230508045533.175575-2-faithilikerun@gmail.com
25
Message-id: 20201001144604.559733-2-stefanha@redhat.com
11
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
26
[Fixed accidental tab characters as suggested by Markus Armbruster
12
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
13
<philmd@linaro.org>.
27
--Stefan]
14
--Stefan]
28
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
29
---
16
---
30
qapi/block-export.json | 10 +++++++---
17
include/block/block-common.h | 43 ++++++++++++++++++++++++++++++++++++
31
block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------
18
1 file changed, 43 insertions(+)
32
2 files changed, 25 insertions(+), 9 deletions(-)
33
19
34
diff --git a/qapi/block-export.json b/qapi/block-export.json
20
diff --git a/include/block/block-common.h b/include/block/block-common.h
35
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
36
--- a/qapi/block-export.json
22
--- a/include/block/block-common.h
37
+++ b/qapi/block-export.json
23
+++ b/include/block/block-common.h
38
@@ -XXX,XX +XXX,XX @@
24
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDriver BlockDriver;
39
# SocketAddress types are supported. Passed fds must be UNIX domain
25
typedef struct BdrvChild BdrvChild;
40
# sockets.
26
typedef struct BdrvChildClass BdrvChildClass;
41
# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
27
42
+# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults
28
+typedef enum BlockZoneOp {
43
+# to 1.
29
+ BLK_ZO_OPEN,
44
#
30
+ BLK_ZO_CLOSE,
45
# Since: 5.2
31
+ BLK_ZO_FINISH,
46
##
32
+ BLK_ZO_RESET,
47
{ 'struct': 'BlockExportOptionsVhostUserBlk',
33
+} BlockZoneOp;
48
- 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
49
+ 'data': { 'addr': 'SocketAddress',
50
+     '*logical-block-size': 'size',
51
+ '*num-queues': 'uint16'} }
52
53
##
54
# @NbdServerAddOptions:
55
@@ -XXX,XX +XXX,XX @@
56
{ 'union': 'BlockExportOptions',
57
'base': { 'type': 'BlockExportType',
58
'id': 'str',
59
-     '*fixed-iothread': 'bool',
60
-     '*iothread': 'str',
61
+ '*fixed-iothread': 'bool',
62
+ '*iothread': 'str',
63
'node-name': 'str',
64
'*writable': 'bool',
65
'*writethrough': 'bool' },
66
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/block/export/vhost-user-blk-server.c
69
+++ b/block/export/vhost-user-blk-server.c
70
@@ -XXX,XX +XXX,XX @@
71
#include "util/block-helpers.h"
72
73
enum {
74
- VHOST_USER_BLK_MAX_QUEUES = 1,
75
+ VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
76
};
77
struct virtio_blk_inhdr {
78
unsigned char status;
79
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev)
80
1ull << VIRTIO_BLK_F_DISCARD |
81
1ull << VIRTIO_BLK_F_WRITE_ZEROES |
82
1ull << VIRTIO_BLK_F_CONFIG_WCE |
83
+ 1ull << VIRTIO_BLK_F_MQ |
84
1ull << VIRTIO_F_VERSION_1 |
85
1ull << VIRTIO_RING_F_INDIRECT_DESC |
86
1ull << VIRTIO_RING_F_EVENT_IDX |
87
@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)
88
89
static void
90
vu_blk_initialize_config(BlockDriverState *bs,
91
- struct virtio_blk_config *config, uint32_t blk_size)
92
+ struct virtio_blk_config *config,
93
+ uint32_t blk_size,
94
+ uint16_t num_queues)
95
{
96
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
97
config->blk_size = blk_size;
98
@@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs,
99
config->seg_max = 128 - 2;
100
config->min_io_size = 1;
101
config->opt_io_size = 1;
102
- config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
103
+ config->num_queues = num_queues;
104
config->max_discard_sectors = 32768;
105
config->max_discard_seg = 1;
106
config->discard_sector_alignment = config->blk_size >> 9;
107
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
108
BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
109
Error *local_err = NULL;
110
uint64_t logical_block_size;
111
+ uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;
112
113
vexp->writable = opts->writable;
114
vexp->blkcfg.wce = 0;
115
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
116
}
117
vexp->blk_size = logical_block_size;
118
blk_set_guest_block_size(exp->blk, logical_block_size);
119
+
34
+
120
+ if (vu_opts->has_num_queues) {
35
+typedef enum BlockZoneModel {
121
+ num_queues = vu_opts->num_queues;
36
+ BLK_Z_NONE = 0x0, /* Regular block device */
122
+ }
37
+ BLK_Z_HM = 0x1, /* Host-managed zoned block device */
123
+ if (num_queues == 0) {
38
+ BLK_Z_HA = 0x2, /* Host-aware zoned block device */
124
+ error_setg(errp, "num-queues must be greater than 0");
39
+} BlockZoneModel;
125
+ return -EINVAL;
126
+ }
127
+
40
+
128
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
41
+typedef enum BlockZoneState {
129
- logical_block_size);
42
+ BLK_ZS_NOT_WP = 0x0,
130
+ logical_block_size, num_queues);
43
+ BLK_ZS_EMPTY = 0x1,
131
44
+ BLK_ZS_IOPEN = 0x2,
132
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
45
+ BLK_ZS_EOPEN = 0x3,
133
vexp);
46
+ BLK_ZS_CLOSED = 0x4,
134
47
+ BLK_ZS_RDONLY = 0xD,
135
if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
48
+ BLK_ZS_FULL = 0xE,
136
- VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
49
+ BLK_ZS_OFFLINE = 0xF,
137
- errp)) {
50
+} BlockZoneState;
138
+ num_queues, &vu_blk_iface, errp)) {
51
+
139
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
52
+typedef enum BlockZoneType {
140
blk_aio_detach, vexp);
53
+ BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
141
return -EADDRNOTAVAIL;
54
+ BLK_ZT_SWR = 0x2, /* Sequential writes required */
55
+ BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
56
+} BlockZoneType;
57
+
58
+/*
59
+ * Zone descriptor data structure.
60
+ * Provides information on a zone with all position and size values in bytes.
61
+ */
62
+typedef struct BlockZoneDescriptor {
63
+ uint64_t start;
64
+ uint64_t length;
65
+ uint64_t cap;
66
+ uint64_t wp;
67
+ BlockZoneType type;
68
+ BlockZoneState state;
69
+} BlockZoneDescriptor;
70
+
71
typedef struct BlockDriverInfo {
72
/* in bytes, 0 if irrelevant */
73
int cluster_size;
142
--
74
--
143
2.26.2
75
2.40.1
144
76
77
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
bdrv_co_block_status_above has several design problems with handling
3
Use get_sysfs_str_val() to get the string value of device
4
short backing files:
4
zoned model. Then get_sysfs_zoned_model() can convert it to
QEMU's BlockZoneModel type.
6
1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
6
7
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
7
Use get_sysfs_long_val() to get the long value of zoned device
8
which produces these after-EOF zeros is inside requested backing
8
information.
9
sequence.
9
10
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
11
2. With want_zero=false, it may return pnum=0 prior to actual EOF,
11
Reviewed-by: Hannes Reinecke <hare@suse.de>
12
because of EOF of short backing file.
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
13
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
14
Fix these things, making logic about short backing files clearer.
14
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
15
15
Acked-by: Kevin Wolf <kwolf@redhat.com>
16
With fixed bdrv_block_status_above we also have to improve is_zero in
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
qcow2 code, otherwise iotest 154 will fail, because with this patch we
17
Message-id: 20230508045533.175575-3-faithilikerun@gmail.com
18
stop to merge zeros of different types (produced by fully unallocated
18
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
19
in the whole backing chain regions vs produced by short backing files).
19
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
20
20
<philmd@linaro.org>.
21
Note also, that this patch leaves for another day the general problem
22
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
23
vs go-to-backing.
24
25
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
26
Reviewed-by: Alberto Garcia <berto@igalia.com>
27
Reviewed-by: Eric Blake <eblake@redhat.com>
28
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
29
[Fix s/comes/come/ as suggested by Eric Blake
30
--Stefan]
21
--Stefan]
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
---
23
---
33
block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++-----------
24
include/block/block_int-common.h | 3 +
34
block/qcow2.c | 16 ++++++++++--
25
block/file-posix.c | 135 ++++++++++++++++++++++---------
35
2 files changed, 68 insertions(+), 16 deletions(-)
26
2 files changed, 100 insertions(+), 38 deletions(-)
36
27
37
diff --git a/block/io.c b/block/io.c
28
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
38
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
39
--- a/block/io.c
30
--- a/include/block/block_int-common.h
40
+++ b/block/io.c
31
+++ b/include/block/block_int-common.h
41
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
32
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
42
int64_t *map,
33
* an explicit monitor command to load the disk inside the guest).
43
BlockDriverState **file)
34
*/
35
bool has_variable_length;
36
+
37
+ /* device zone model */
38
+ BlockZoneModel zoned;
39
} BlockLimits;
40
41
typedef struct BdrvOpBlocker BdrvOpBlocker;
42
diff --git a/block/file-posix.c b/block/file-posix.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/block/file-posix.c
45
+++ b/block/file-posix.c
46
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
47
#endif
48
}
49
50
-static int hdev_get_max_segments(int fd, struct stat *st)
51
+/*
52
+ * Get a sysfs attribute value as character string.
53
+ */
54
+#ifdef CONFIG_LINUX
55
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
56
+ char **val) {
57
+ g_autofree char *sysfspath = NULL;
58
+ int ret;
59
+ size_t len;
60
+
61
+ if (!S_ISBLK(st->st_mode)) {
62
+ return -ENOTSUP;
63
+ }
64
+
65
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
66
+ major(st->st_rdev), minor(st->st_rdev),
67
+ attribute);
68
+ ret = g_file_get_contents(sysfspath, val, &len, NULL);
69
+ if (ret == -1) {
70
+ return -ENOENT;
71
+ }
72
+
73
+ /* The file is ended with '\n' */
74
+ char *p;
75
+ p = *val;
76
+ if (*(p + len - 1) == '\n') {
77
+ *(p + len - 1) = '\0';
78
+ }
79
+ return ret;
80
+}
81
+#endif
82
+
83
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
44
{
84
{
45
+ int ret;
85
+ g_autofree char *val = NULL;
46
BlockDriverState *p;
86
+ int ret;
47
- int ret = 0;
87
+
48
- bool first = true;
88
+ ret = get_sysfs_str_val(st, "zoned", &val);
49
+ int64_t eof = 0;
89
+ if (ret < 0) {
50
51
assert(bs != base);
52
- for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
53
+
54
+ ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
55
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
56
+ return ret;
90
+ return ret;
57
+ }
91
+ }
58
+
92
+
59
+ if (ret & BDRV_BLOCK_EOF) {
93
+ if (strcmp(val, "host-managed") == 0) {
60
+ eof = offset + *pnum;
94
+ *zoned = BLK_Z_HM;
61
+ }
95
+ } else if (strcmp(val, "host-aware") == 0) {
62
+
96
+ *zoned = BLK_Z_HA;
63
+ assert(*pnum <= bytes);
97
+ } else if (strcmp(val, "none") == 0) {
64
+ bytes = *pnum;
98
+ *zoned = BLK_Z_NONE;
65
+
99
+ } else {
66
+ for (p = bdrv_filter_or_cow_bs(bs); p != base;
100
+ return -ENOTSUP;
67
+ p = bdrv_filter_or_cow_bs(p))
101
+ }
68
+ {
102
+ return 0;
69
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
103
+}
70
file);
104
+
71
if (ret < 0) {
105
+/*
72
- break;
106
+ * Get a sysfs attribute value as a long integer.
73
+ return ret;
107
+ */
108
#ifdef CONFIG_LINUX
109
- char buf[32];
110
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
111
+{
112
+ g_autofree char *str = NULL;
113
const char *end;
114
- char *sysfspath = NULL;
115
+ long val;
116
+ int ret;
117
+
118
+ ret = get_sysfs_str_val(st, attribute, &str);
119
+ if (ret < 0) {
120
+ return ret;
121
+ }
122
+
123
+ /* The file is ended with '\n', pass 'end' to accept that. */
124
+ ret = qemu_strtol(str, &end, 10, &val);
125
+ if (ret == 0 && end && *end == '\0') {
126
+ ret = val;
127
+ }
128
+ return ret;
129
+}
130
+#endif
131
+
132
+static int hdev_get_max_segments(int fd, struct stat *st)
133
+{
134
+#ifdef CONFIG_LINUX
135
int ret;
136
- int sysfd = -1;
137
- long max_segments;
138
139
if (S_ISCHR(st->st_mode)) {
140
if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
141
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
74
}
142
}
75
- if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
143
return -ENOTSUP;
76
+ if (*pnum == 0) {
144
}
77
/*
145
-
78
- * Reading beyond the end of the file continues to read
146
- if (!S_ISBLK(st->st_mode)) {
79
- * zeroes, but we can only widen the result to the
147
- return -ENOTSUP;
80
- * unallocated length we learned from an earlier
148
- }
81
- * iteration.
149
-
82
+ * The top layer deferred to this layer, and because this layer is
150
- sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
83
+ * short, any zeroes that we synthesize beyond EOF behave as if they
151
- major(st->st_rdev), minor(st->st_rdev));
84
+ * were allocated at this layer.
152
- sysfd = open(sysfspath, O_RDONLY);
85
+ *
153
- if (sysfd == -1) {
86
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
154
- ret = -errno;
87
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
155
- goto out;
88
+ * below.
156
- }
89
*/
157
- ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
90
+ assert(ret & BDRV_BLOCK_EOF);
158
- if (ret < 0) {
91
*pnum = bytes;
159
- ret = -errno;
92
+ if (file) {
160
- goto out;
93
+ *file = p;
161
- } else if (ret == 0) {
94
+ }
162
- ret = -EIO;
95
+ ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
163
- goto out;
96
+ break;
164
- }
165
- buf[ret] = 0;
166
- /* The file is ended with '\n', pass 'end' to accept that. */
167
- ret = qemu_strtol(buf, &end, 10, &max_segments);
168
- if (ret == 0 && end && *end == '\n') {
169
- ret = max_segments;
170
- }
171
-
172
-out:
173
- if (sysfd != -1) {
174
- close(sysfd);
175
- }
176
- g_free(sysfspath);
177
- return ret;
178
+ return get_sysfs_long_val(st, "max_segments");
179
#else
180
return -ENOTSUP;
181
#endif
182
}
183
184
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
185
+ Error **errp)
186
+{
187
+ BlockZoneModel zoned;
188
+ int ret;
189
+
190
+ bs->bl.zoned = BLK_Z_NONE;
191
+
192
+ ret = get_sysfs_zoned_model(st, &zoned);
193
+ if (ret < 0 || zoned == BLK_Z_NONE) {
194
+ return;
195
+ }
196
+ bs->bl.zoned = zoned;
197
+}
198
+
199
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
200
{
201
BDRVRawState *s = bs->opaque;
202
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
203
bs->bl.max_hw_iov = ret;
97
}
204
}
98
- if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
99
+ if (ret & BDRV_BLOCK_ALLOCATED) {
100
+ /*
101
+ * We've found the node and the status, we must break.
102
+ *
103
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
104
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
105
+ * below.
106
+ */
107
+ ret &= ~BDRV_BLOCK_EOF;
108
break;
109
}
110
- /* [offset, pnum] unallocated on this layer, which could be only
111
- * the first part of [offset, bytes]. */
112
- bytes = MIN(bytes, *pnum);
113
- first = false;
114
+
115
+ /*
116
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
117
+ * let's continue the diving.
118
+ */
119
+ assert(*pnum <= bytes);
120
+ bytes = *pnum;
121
+ }
122
+
123
+ if (offset + *pnum == eof) {
124
+ ret |= BDRV_BLOCK_EOF;
125
}
205
}
126
+
206
+
127
return ret;
207
+ raw_refresh_zoned_limits(bs, &st, errp);
128
}
208
}
129
209
130
diff --git a/block/qcow2.c b/block/qcow2.c
210
static int check_for_dasd(int fd)
131
index XXXXXXX..XXXXXXX 100644
132
--- a/block/qcow2.c
133
+++ b/block/qcow2.c
134
@@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
135
if (!bytes) {
136
return true;
137
}
138
- res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
139
- return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
140
+
141
+ /*
142
+ * bdrv_block_status_above doesn't merge different types of zeros, for
143
+ * example, zeros which come from the region which is unallocated in
144
+ * the whole backing chain, and zeros which come because of a short
145
+ * backing file. So, we need a loop.
146
+ */
147
+ do {
148
+ res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
149
+ offset += nr;
150
+ bytes -= nr;
151
+ } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);
152
+
153
+ return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0;
154
}
155
156
static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
157
--
211
--
158
2.26.2
212
2.40.1
159
213
214
1
Introduce libblkdev.fa to avoid recompiling blockdev_ss twice.
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
3
Add zoned device option to host_device BlockDriver. It will be presented only
4
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
4
for zoned host block devices. By adding zone management operations to the
5
host_block_device BlockDriver, users can use the new block layer APIs
6
including Report Zone and four zone management operations
7
(open, close, finish, reset, reset_all).
8
9
Qemu-io uses the new APIs to perform zoned storage commands of the device:
10
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
11
zone_finish(zf).
12
13
For example, to test zone_report, use the following command:
14
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
15
-c "zrp offset nr_zones"
16
17
Signed-off-by: Sam Li <faithilikerun@gmail.com>
18
Reviewed-by: Hannes Reinecke <hare@suse.de>
19
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
21
Acked-by: Kevin Wolf <kwolf@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200929125516.186715-3-stefanha@redhat.com
23
Message-id: 20230508045533.175575-4-faithilikerun@gmail.com
24
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
25
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
26
<philmd@linaro.org> and remove spurious ret = -errno in
27
raw_co_zone_mgmt().
28
--Stefan]
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
29
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
30
---
9
meson.build | 12 ++++++++++--
31
meson.build | 5 +
10
storage-daemon/meson.build | 3 +--
32
include/block/block-io.h | 9 +
11
2 files changed, 11 insertions(+), 4 deletions(-)
33
include/block/block_int-common.h | 21 ++
34
include/block/raw-aio.h | 6 +-
35
include/sysemu/block-backend-io.h | 18 ++
36
block/block-backend.c | 137 +++++++++++++
37
block/file-posix.c | 313 +++++++++++++++++++++++++++++-
38
block/io.c | 41 ++++
39
qemu-io-cmds.c | 149 ++++++++++++++
40
9 files changed, 696 insertions(+), 3 deletions(-)
12
41
13
diff --git a/meson.build b/meson.build
42
diff --git a/meson.build b/meson.build
14
index XXXXXXX..XXXXXXX 100644
43
index XXXXXXX..XXXXXXX 100644
15
--- a/meson.build
44
--- a/meson.build
16
+++ b/meson.build
45
+++ b/meson.build
17
@@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files(
46
@@ -XXX,XX +XXX,XX @@ if rdma.found()
18
# os-win32.c does not
47
endif
19
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
48
20
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
49
# has_header_symbol
21
-softmmu_ss.add_all(blockdev_ss)
50
+config_host_data.set('CONFIG_BLKZONED',
22
51
+ cc.has_header_symbol('linux/blkzoned.h', 'BLKOPENZONE'))
23
common_ss.add(files('cpus-common.c'))
52
config_host_data.set('CONFIG_EPOLL_CREATE1',
24
53
cc.has_header_symbol('sys/epoll.h', 'epoll_create1'))
25
@@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock],
54
config_host_data.set('CONFIG_FALLOCATE_PUNCH_HOLE',
26
link_args: '@block.syms',
55
@@ -XXX,XX +XXX,XX @@ config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
27
dependencies: [crypto, io])
56
config_host_data.set('HAVE_STRUCT_STAT_ST_ATIM',
28
57
cc.has_member('struct stat', 'st_atim',
29
+blockdev_ss = blockdev_ss.apply(config_host, strict: false)
58
prefix: '#include <sys/stat.h>'))
30
+libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
59
+config_host_data.set('HAVE_BLK_ZONE_REP_CAPACITY',
31
+ dependencies: blockdev_ss.dependencies(),
60
+ cc.has_member('struct blk_zone', 'capacity',
32
+ name_suffix: 'fa',
61
+ prefix: '#include <linux/blkzoned.h>'))
33
+ build_by_default: false)
62
34
+
63
# has_type
35
+blockdev = declare_dependency(link_whole: [libblockdev],
64
config_host_data.set('CONFIG_IOVEC',
36
+ dependencies: [block])
65
diff --git a/include/block/block-io.h b/include/block/block-io.h
37
+
38
qmp_ss = qmp_ss.apply(config_host, strict: false)
39
libqmp = static_library('qmp', qmp_ss.sources() + genh,
40
dependencies: qmp_ss.dependencies(),
41
@@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods
42
install_dir: config_host['qemu_moddir'])
43
endforeach
44
45
-softmmu_ss.add(authz, block, chardev, crypto, io, qmp)
46
+softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp)
47
common_ss.add(qom, qemuutil)
48
49
common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss])
50
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
51
index XXXXXXX..XXXXXXX 100644
66
index XXXXXXX..XXXXXXX 100644
52
--- a/storage-daemon/meson.build
67
--- a/include/block/block-io.h
53
+++ b/storage-daemon/meson.build
68
+++ b/include/block/block-io.h
69
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs);
70
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
71
int64_t bytes);
72
73
+/* Report zone information of zone block device. */
74
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
75
+ int64_t offset,
76
+ unsigned int *nr_zones,
77
+ BlockZoneDescriptor *zones);
78
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
79
+ BlockZoneOp op,
80
+ int64_t offset, int64_t len);
81
+
82
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
83
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
84
int64_t bytes, int64_t *pnum, int64_t *map,
85
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
86
index XXXXXXX..XXXXXXX 100644
87
--- a/include/block/block_int-common.h
88
+++ b/include/block/block_int-common.h
89
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
90
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_load_vmstate)(
91
BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
92
93
+ int coroutine_fn (*bdrv_co_zone_report)(BlockDriverState *bs,
94
+ int64_t offset, unsigned int *nr_zones,
95
+ BlockZoneDescriptor *zones);
96
+ int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
97
+ int64_t offset, int64_t len);
98
+
99
/* removable device specific */
100
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
101
BlockDriverState *bs);
102
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
103
104
/* device zone model */
105
BlockZoneModel zoned;
106
+
107
+ /* zone size expressed in bytes */
108
+ uint32_t zone_size;
109
+
110
+ /* total number of zones */
111
+ uint32_t nr_zones;
112
+
113
+ /* maximum sectors of a zone append write operation */
114
+ uint32_t max_append_sectors;
115
+
116
+ /* maximum number of open zones */
117
+ uint32_t max_open_zones;
118
+
119
+ /* maximum number of active zones */
120
+ uint32_t max_active_zones;
121
} BlockLimits;
122
123
typedef struct BdrvOpBlocker BdrvOpBlocker;
124
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
125
index XXXXXXX..XXXXXXX 100644
126
--- a/include/block/raw-aio.h
127
+++ b/include/block/raw-aio.h
54
@@ -XXX,XX +XXX,XX @@
128
@@ -XXX,XX +XXX,XX @@
55
qsd_ss = ss.source_set()
129
#define QEMU_AIO_WRITE_ZEROES 0x0020
56
qsd_ss.add(files('qemu-storage-daemon.c'))
130
#define QEMU_AIO_COPY_RANGE 0x0040
57
-qsd_ss.add(block, chardev, qmp, qom, qemuutil)
131
#define QEMU_AIO_TRUNCATE 0x0080
58
-qsd_ss.add_all(blockdev_ss)
132
+#define QEMU_AIO_ZONE_REPORT 0x0100
59
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)
133
+#define QEMU_AIO_ZONE_MGMT 0x0200
60
134
#define QEMU_AIO_TYPE_MASK \
61
subdir('qapi')
135
(QEMU_AIO_READ | \
62
136
QEMU_AIO_WRITE | \
137
@@ -XXX,XX +XXX,XX @@
138
QEMU_AIO_DISCARD | \
139
QEMU_AIO_WRITE_ZEROES | \
140
QEMU_AIO_COPY_RANGE | \
141
- QEMU_AIO_TRUNCATE)
142
+ QEMU_AIO_TRUNCATE | \
143
+ QEMU_AIO_ZONE_REPORT | \
144
+ QEMU_AIO_ZONE_MGMT)
145
146
/* AIO flags */
147
#define QEMU_AIO_MISALIGNED 0x1000
148
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
149
index XXXXXXX..XXXXXXX 100644
150
--- a/include/sysemu/block-backend-io.h
151
+++ b/include/sysemu/block-backend-io.h
152
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
153
BlockCompletionFunc *cb, void *opaque);
154
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
155
BlockCompletionFunc *cb, void *opaque);
156
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
157
+ unsigned int *nr_zones,
158
+ BlockZoneDescriptor *zones,
159
+ BlockCompletionFunc *cb, void *opaque);
160
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
161
+ int64_t offset, int64_t len,
162
+ BlockCompletionFunc *cb, void *opaque);
163
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
164
BlockCompletionFunc *cb, void *opaque);
165
void blk_aio_cancel_async(BlockAIOCB *acb);
166
@@ -XXX,XX +XXX,XX @@ int co_wrapper_mixed blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
167
int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
168
int64_t bytes, BdrvRequestFlags flags);
169
170
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
171
+ unsigned int *nr_zones,
172
+ BlockZoneDescriptor *zones);
173
+int co_wrapper_mixed blk_zone_report(BlockBackend *blk, int64_t offset,
174
+ unsigned int *nr_zones,
175
+ BlockZoneDescriptor *zones);
176
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
177
+ int64_t offset, int64_t len);
178
+int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
179
+ int64_t offset, int64_t len);
180
+
181
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
182
int64_t bytes);
183
int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset,
184
diff --git a/block/block-backend.c b/block/block-backend.c
185
index XXXXXXX..XXXXXXX 100644
186
--- a/block/block-backend.c
187
+++ b/block/block-backend.c
188
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
189
return ret;
190
}
191
192
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
193
+{
194
+ BlkAioEmAIOCB *acb = opaque;
195
+ BlkRwCo *rwco = &acb->rwco;
196
+
197
+ rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
198
+ (unsigned int*)(uintptr_t)acb->bytes,
199
+ rwco->iobuf);
200
+ blk_aio_complete(acb);
201
+}
202
+
203
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
204
+ unsigned int *nr_zones,
205
+ BlockZoneDescriptor *zones,
206
+ BlockCompletionFunc *cb, void *opaque)
207
+{
208
+ BlkAioEmAIOCB *acb;
209
+ Coroutine *co;
210
+ IO_CODE();
211
+
212
+ blk_inc_in_flight(blk);
213
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
214
+ acb->rwco = (BlkRwCo) {
215
+ .blk = blk,
216
+ .offset = offset,
217
+ .iobuf = zones,
218
+ .ret = NOT_DONE,
219
+ };
220
+ acb->bytes = (int64_t)(uintptr_t)nr_zones,
221
+ acb->has_returned = false;
222
+
223
+ co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
224
+ aio_co_enter(blk_get_aio_context(blk), co);
225
+
226
+ acb->has_returned = true;
227
+ if (acb->rwco.ret != NOT_DONE) {
228
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
229
+ blk_aio_complete_bh, acb);
230
+ }
231
+
232
+ return &acb->common;
233
+}
234
+
235
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
236
+{
237
+ BlkAioEmAIOCB *acb = opaque;
238
+ BlkRwCo *rwco = &acb->rwco;
239
+
240
+ rwco->ret = blk_co_zone_mgmt(rwco->blk,
241
+ (BlockZoneOp)(uintptr_t)rwco->iobuf,
242
+ rwco->offset, acb->bytes);
243
+ blk_aio_complete(acb);
244
+}
245
+
246
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
247
+ int64_t offset, int64_t len,
248
+ BlockCompletionFunc *cb, void *opaque) {
249
+ BlkAioEmAIOCB *acb;
250
+ Coroutine *co;
251
+ IO_CODE();
252
+
253
+ blk_inc_in_flight(blk);
254
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
255
+ acb->rwco = (BlkRwCo) {
256
+ .blk = blk,
257
+ .offset = offset,
258
+ .iobuf = (void *)(uintptr_t)op,
259
+ .ret = NOT_DONE,
260
+ };
261
+ acb->bytes = len;
262
+ acb->has_returned = false;
263
+
264
+ co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
265
+ aio_co_enter(blk_get_aio_context(blk), co);
266
+
267
+ acb->has_returned = true;
268
+ if (acb->rwco.ret != NOT_DONE) {
269
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
270
+ blk_aio_complete_bh, acb);
271
+ }
272
+
273
+ return &acb->common;
274
+}
275
+
276
+/*
277
+ * Send a zone_report command.
278
+ * offset is a byte offset from the start of the device. No alignment
279
+ * required for offset.
280
+ * nr_zones represents IN maximum and OUT actual.
281
+ */
282
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
283
+ unsigned int *nr_zones,
284
+ BlockZoneDescriptor *zones)
285
+{
286
+ int ret;
287
+ IO_CODE();
288
+
289
+ blk_inc_in_flight(blk); /* increase before waiting */
290
+ blk_wait_while_drained(blk);
291
+ GRAPH_RDLOCK_GUARD();
292
+ if (!blk_is_available(blk)) {
293
+ blk_dec_in_flight(blk);
294
+ return -ENOMEDIUM;
295
+ }
296
+ ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
297
+ blk_dec_in_flight(blk);
298
+ return ret;
299
+}
300
+
301
+/*
302
+ * Send a zone_management command.
303
+ * op is the zone operation;
304
+ * offset is the byte offset from the start of the zoned device;
305
+ * len is the maximum number of bytes the command should operate on. It
306
+ * should be aligned with the device zone size.
307
+ */
308
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
309
+ int64_t offset, int64_t len)
310
+{
311
+ int ret;
312
+ IO_CODE();
313
+
314
+ blk_inc_in_flight(blk);
315
+ blk_wait_while_drained(blk);
316
+ GRAPH_RDLOCK_GUARD();
317
+
318
+ ret = blk_check_byte_request(blk, offset, len);
319
+ if (ret < 0) {
320
+ blk_dec_in_flight(blk);
321
+ return ret;
322
+ }
323
+
324
+ ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
325
+ blk_dec_in_flight(blk);
326
+ return ret;
327
+}
328
+
329
void blk_drain(BlockBackend *blk)
330
{
331
BlockDriverState *bs = blk_bs(blk);
332
diff --git a/block/file-posix.c b/block/file-posix.c
333
index XXXXXXX..XXXXXXX 100644
334
--- a/block/file-posix.c
335
+++ b/block/file-posix.c
336
@@ -XXX,XX +XXX,XX @@
337
#include <sys/param.h>
338
#include <sys/syscall.h>
339
#include <sys/vfs.h>
340
+#if defined(CONFIG_BLKZONED)
341
+#include <linux/blkzoned.h>
342
+#endif
343
#include <linux/cdrom.h>
344
#include <linux/fd.h>
345
#include <linux/fs.h>
346
@@ -XXX,XX +XXX,XX @@ typedef struct RawPosixAIOData {
347
PreallocMode prealloc;
348
Error **errp;
349
} truncate;
350
+ struct {
351
+ unsigned int *nr_zones;
352
+ BlockZoneDescriptor *zones;
353
+ } zone_report;
354
+ struct {
355
+ unsigned long op;
356
+ } zone_mgmt;
357
};
358
} RawPosixAIOData;
359
360
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
361
}
362
#endif
363
364
+#if defined(CONFIG_BLKZONED)
365
static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
366
{
367
g_autofree char *val = NULL;
368
@@ -XXX,XX +XXX,XX @@ static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
369
}
370
return 0;
371
}
372
+#endif /* defined(CONFIG_BLKZONED) */
373
374
/*
375
* Get a sysfs attribute value as a long integer.
376
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
377
#endif
378
}
379
380
+#if defined(CONFIG_BLKZONED)
381
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
382
Error **errp)
383
{
384
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
385
return;
386
}
387
bs->bl.zoned = zoned;
388
+
389
+ ret = get_sysfs_long_val(st, "max_open_zones");
390
+ if (ret >= 0) {
391
+ bs->bl.max_open_zones = ret;
392
+ }
393
+
394
+ ret = get_sysfs_long_val(st, "max_active_zones");
395
+ if (ret >= 0) {
396
+ bs->bl.max_active_zones = ret;
397
+ }
398
+
399
+ /*
400
+ * The zoned device must at least have zone size and nr_zones fields.
401
+ */
402
+ ret = get_sysfs_long_val(st, "chunk_sectors");
403
+ if (ret < 0) {
404
+ error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
405
+ "sysfs attribute");
406
+ return;
407
+ } else if (!ret) {
408
+ error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
409
+ return;
410
+ }
411
+ bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
412
+
413
+ ret = get_sysfs_long_val(st, "nr_zones");
414
+ if (ret < 0) {
415
+ error_setg_errno(errp, -ret, "Unable to read nr_zones "
416
+ "sysfs attribute");
417
+ return;
418
+ } else if (!ret) {
419
+ error_setg(errp, "Read 0 from nr_zones sysfs attribute");
420
+ return;
421
+ }
422
+ bs->bl.nr_zones = ret;
423
+
424
+ ret = get_sysfs_long_val(st, "zone_append_max_bytes");
425
+ if (ret > 0) {
426
+ bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
427
+ }
428
}
429
+#else /* !defined(CONFIG_BLKZONED) */
430
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
431
+ Error **errp)
432
+{
433
+ bs->bl.zoned = BLK_Z_NONE;
434
+}
435
+#endif /* !defined(CONFIG_BLKZONED) */
436
437
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
438
{
439
@@ -XXX,XX +XXX,XX @@ static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
440
BDRVRawState *s = bs->opaque;
441
int ret;
442
443
- /* If DASD, get blocksizes */
444
+ /* If DASD or zoned devices, get blocksizes */
445
if (check_for_dasd(s->fd) < 0) {
446
- return -ENOTSUP;
447
+ /* zoned devices are not DASD */
448
+ if (bs->bl.zoned == BLK_Z_NONE) {
449
+ return -ENOTSUP;
450
+ }
451
}
452
ret = probe_logical_blocksize(s->fd, &bsz->log);
453
if (ret < 0) {
454
@@ -XXX,XX +XXX,XX @@ static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
455
}
456
#endif
457
458
+/*
459
+ * parse_zone - Fill a zone descriptor
460
+ */
461
+#if defined(CONFIG_BLKZONED)
462
+static inline int parse_zone(struct BlockZoneDescriptor *zone,
463
+ const struct blk_zone *blkz) {
464
+ zone->start = blkz->start << BDRV_SECTOR_BITS;
465
+ zone->length = blkz->len << BDRV_SECTOR_BITS;
466
+ zone->wp = blkz->wp << BDRV_SECTOR_BITS;
467
+
468
+#ifdef HAVE_BLK_ZONE_REP_CAPACITY
469
+ zone->cap = blkz->capacity << BDRV_SECTOR_BITS;
470
+#else
471
+ zone->cap = blkz->len << BDRV_SECTOR_BITS;
472
+#endif
473
+
474
+ switch (blkz->type) {
475
+ case BLK_ZONE_TYPE_SEQWRITE_REQ:
476
+ zone->type = BLK_ZT_SWR;
477
+ break;
478
+ case BLK_ZONE_TYPE_SEQWRITE_PREF:
479
+ zone->type = BLK_ZT_SWP;
480
+ break;
481
+ case BLK_ZONE_TYPE_CONVENTIONAL:
482
+ zone->type = BLK_ZT_CONV;
483
+ break;
484
+ default:
485
+ error_report("Unsupported zone type: 0x%x", blkz->type);
486
+ return -ENOTSUP;
487
+ }
488
+
489
+ switch (blkz->cond) {
490
+ case BLK_ZONE_COND_NOT_WP:
491
+ zone->state = BLK_ZS_NOT_WP;
492
+ break;
493
+ case BLK_ZONE_COND_EMPTY:
494
+ zone->state = BLK_ZS_EMPTY;
495
+ break;
496
+ case BLK_ZONE_COND_IMP_OPEN:
497
+ zone->state = BLK_ZS_IOPEN;
498
+ break;
499
+ case BLK_ZONE_COND_EXP_OPEN:
500
+ zone->state = BLK_ZS_EOPEN;
501
+ break;
502
+ case BLK_ZONE_COND_CLOSED:
503
+ zone->state = BLK_ZS_CLOSED;
504
+ break;
505
+ case BLK_ZONE_COND_READONLY:
506
+ zone->state = BLK_ZS_RDONLY;
507
+ break;
508
+ case BLK_ZONE_COND_FULL:
509
+ zone->state = BLK_ZS_FULL;
510
+ break;
511
+ case BLK_ZONE_COND_OFFLINE:
512
+ zone->state = BLK_ZS_OFFLINE;
513
+ break;
514
+ default:
515
+ error_report("Unsupported zone state: 0x%x", blkz->cond);
516
+ return -ENOTSUP;
517
+ }
518
+ return 0;
519
+}
520
+#endif
521
+
522
+#if defined(CONFIG_BLKZONED)
523
+static int handle_aiocb_zone_report(void *opaque)
524
+{
525
+ RawPosixAIOData *aiocb = opaque;
526
+ int fd = aiocb->aio_fildes;
527
+ unsigned int *nr_zones = aiocb->zone_report.nr_zones;
528
+ BlockZoneDescriptor *zones = aiocb->zone_report.zones;
529
+ /* zoned block devices use 512-byte sectors */
530
+ uint64_t sector = aiocb->aio_offset / 512;
531
+
532
+ struct blk_zone *blkz;
533
+ size_t rep_size;
534
+ unsigned int nrz;
535
+ int ret;
536
+ unsigned int n = 0, i = 0;
537
+
538
+ nrz = *nr_zones;
539
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
540
+ g_autofree struct blk_zone_report *rep = NULL;
541
+ rep = g_malloc(rep_size);
542
+
543
+ blkz = (struct blk_zone *)(rep + 1);
544
+ while (n < nrz) {
545
+ memset(rep, 0, rep_size);
546
+ rep->sector = sector;
547
+ rep->nr_zones = nrz - n;
548
+
549
+ do {
550
+ ret = ioctl(fd, BLKREPORTZONE, rep);
551
+ } while (ret != 0 && errno == EINTR);
552
+ if (ret != 0) {
553
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
554
+ fd, sector, errno);
555
+ return -errno;
556
+ }
557
+
558
+ if (!rep->nr_zones) {
559
+ break;
560
+ }
561
+
562
+ for (i = 0; i < rep->nr_zones; i++, n++) {
563
+ ret = parse_zone(&zones[n], &blkz[i]);
564
+ if (ret != 0) {
565
+ return ret;
566
+ }
567
+
568
+ /* The next report should start after the last zone reported */
569
+ sector = blkz[i].start + blkz[i].len;
570
+ }
571
+ }
572
+
573
+ *nr_zones = n;
574
+ return 0;
575
+}
576
+#endif
577
+
578
+#if defined(CONFIG_BLKZONED)
579
+static int handle_aiocb_zone_mgmt(void *opaque)
580
+{
581
+ RawPosixAIOData *aiocb = opaque;
582
+ int fd = aiocb->aio_fildes;
583
+ uint64_t sector = aiocb->aio_offset / 512;
584
+ int64_t nr_sectors = aiocb->aio_nbytes / 512;
585
+ struct blk_zone_range range;
586
+ int ret;
587
+
588
+ /* Execute the operation */
589
+ range.sector = sector;
590
+ range.nr_sectors = nr_sectors;
591
+ do {
592
+ ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
593
+ } while (ret != 0 && errno == EINTR);
594
+
595
+ return ret;
596
+}
597
+#endif
598
+
599
static int handle_aiocb_copy_range(void *opaque)
600
{
601
RawPosixAIOData *aiocb = opaque;
602
@@ -XXX,XX +XXX,XX @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
603
}
604
}
605
606
+/*
607
+ * zone report - Get a zone block device's information in the form
608
+ * of an array of zone descriptors.
609
+ * zones is an array of zone descriptors to hold zone information on reply;
610
+ * offset can be any byte within the entire size of the device;
611
+ * nr_zones is the maxium number of sectors the command should operate on.
612
+ */
613
+#if defined(CONFIG_BLKZONED)
614
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
615
+ unsigned int *nr_zones,
616
+ BlockZoneDescriptor *zones) {
617
+ BDRVRawState *s = bs->opaque;
618
+ RawPosixAIOData acb = (RawPosixAIOData) {
619
+ .bs = bs,
620
+ .aio_fildes = s->fd,
621
+ .aio_type = QEMU_AIO_ZONE_REPORT,
622
+ .aio_offset = offset,
623
+ .zone_report = {
624
+ .nr_zones = nr_zones,
625
+ .zones = zones,
626
+ },
627
+ };
628
+
629
+ return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
630
+}
631
+#endif
632
+
633
+/*
634
+ * zone management operations - Execute an operation on a zone
635
+ */
636
+#if defined(CONFIG_BLKZONED)
637
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
638
+ int64_t offset, int64_t len) {
639
+ BDRVRawState *s = bs->opaque;
640
+ RawPosixAIOData acb;
641
+ int64_t zone_size, zone_size_mask;
642
+ const char *op_name;
643
+ unsigned long zo;
644
+ int ret;
645
+ int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
646
+
647
+ zone_size = bs->bl.zone_size;
648
+ zone_size_mask = zone_size - 1;
649
+ if (offset & zone_size_mask) {
650
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
651
+ "%" PRId64 "", offset / 512, zone_size / 512);
652
+ return -EINVAL;
653
+ }
654
+
655
+ if (((offset + len) < capacity && len & zone_size_mask) ||
656
+ offset + len > capacity) {
657
+ error_report("number of sectors %" PRId64 " is not aligned to zone size"
658
+ " %" PRId64 "", len / 512, zone_size / 512);
659
+ return -EINVAL;
660
+ }
661
+
662
+ switch (op) {
663
+ case BLK_ZO_OPEN:
664
+ op_name = "BLKOPENZONE";
665
+ zo = BLKOPENZONE;
666
+ break;
667
+ case BLK_ZO_CLOSE:
668
+ op_name = "BLKCLOSEZONE";
669
+ zo = BLKCLOSEZONE;
670
+ break;
671
+ case BLK_ZO_FINISH:
672
+ op_name = "BLKFINISHZONE";
673
+ zo = BLKFINISHZONE;
674
+ break;
675
+ case BLK_ZO_RESET:
676
+ op_name = "BLKRESETZONE";
677
+ zo = BLKRESETZONE;
678
+ break;
679
+ default:
680
+ error_report("Unsupported zone op: 0x%x", op);
681
+ return -ENOTSUP;
682
+ }
683
+
684
+ acb = (RawPosixAIOData) {
685
+ .bs = bs,
686
+ .aio_fildes = s->fd,
687
+ .aio_type = QEMU_AIO_ZONE_MGMT,
688
+ .aio_offset = offset,
689
+ .aio_nbytes = len,
690
+ .zone_mgmt = {
691
+ .op = zo,
692
+ },
693
+ };
694
+
695
+ ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
696
+ if (ret != 0) {
697
+ error_report("ioctl %s failed %d", op_name, ret);
698
+ }
699
+
700
+ return ret;
701
+}
702
+#endif
703
+
704
static coroutine_fn int
705
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
706
bool blkdev)
707
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
708
#ifdef __linux__
709
.bdrv_co_ioctl = hdev_co_ioctl,
710
#endif
711
+
712
+ /* zoned device */
713
+#if defined(CONFIG_BLKZONED)
714
+ /* zone management operations */
715
+ .bdrv_co_zone_report = raw_co_zone_report,
716
+ .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
717
+#endif
718
};
719
720
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
721
diff --git a/block/io.c b/block/io.c
722
index XXXXXXX..XXXXXXX 100644
723
--- a/block/io.c
724
+++ b/block/io.c
725
@@ -XXX,XX +XXX,XX @@ out:
726
return co.ret;
727
}
728
729
+int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset,
730
+ unsigned int *nr_zones,
731
+ BlockZoneDescriptor *zones)
732
+{
733
+ BlockDriver *drv = bs->drv;
734
+ CoroutineIOCompletion co = {
735
+ .coroutine = qemu_coroutine_self(),
736
+ };
737
+ IO_CODE();
738
+
739
+ bdrv_inc_in_flight(bs);
740
+ if (!drv || !drv->bdrv_co_zone_report || bs->bl.zoned == BLK_Z_NONE) {
741
+ co.ret = -ENOTSUP;
742
+ goto out;
743
+ }
744
+ co.ret = drv->bdrv_co_zone_report(bs, offset, nr_zones, zones);
745
+out:
746
+ bdrv_dec_in_flight(bs);
747
+ return co.ret;
748
+}
749
+
750
+int coroutine_fn bdrv_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
751
+ int64_t offset, int64_t len)
752
+{
753
+ BlockDriver *drv = bs->drv;
754
+ CoroutineIOCompletion co = {
755
+ .coroutine = qemu_coroutine_self(),
756
+ };
757
+ IO_CODE();
758
+
759
+ bdrv_inc_in_flight(bs);
760
+ if (!drv || !drv->bdrv_co_zone_mgmt || bs->bl.zoned == BLK_Z_NONE) {
761
+ co.ret = -ENOTSUP;
762
+ goto out;
763
+ }
764
+ co.ret = drv->bdrv_co_zone_mgmt(bs, op, offset, len);
765
+out:
766
+ bdrv_dec_in_flight(bs);
767
+ return co.ret;
768
+}
769
+
770
void *qemu_blockalign(BlockDriverState *bs, size_t size)
771
{
772
IO_CODE();
773
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
774
index XXXXXXX..XXXXXXX 100644
775
--- a/qemu-io-cmds.c
776
+++ b/qemu-io-cmds.c
777
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t flush_cmd = {
778
.oneline = "flush all in-core file state to disk",
779
};
780
781
+static inline int64_t tosector(int64_t bytes)
782
+{
783
+ return bytes >> BDRV_SECTOR_BITS;
784
+}
785
+
786
+static int zone_report_f(BlockBackend *blk, int argc, char **argv)
787
+{
788
+ int ret;
789
+ int64_t offset;
790
+ unsigned int nr_zones;
791
+
792
+ ++optind;
793
+ offset = cvtnum(argv[optind]);
794
+ ++optind;
795
+ nr_zones = cvtnum(argv[optind]);
796
+
797
+ g_autofree BlockZoneDescriptor *zones = NULL;
798
+ zones = g_new(BlockZoneDescriptor, nr_zones);
799
+ ret = blk_zone_report(blk, offset, &nr_zones, zones);
800
+ if (ret < 0) {
801
+ printf("zone report failed: %s\n", strerror(-ret));
802
+ } else {
803
+ for (int i = 0; i < nr_zones; ++i) {
804
+ printf("start: 0x%" PRIx64 ", len 0x%" PRIx64 ", "
805
+ "cap"" 0x%" PRIx64 ", wptr 0x%" PRIx64 ", "
806
+ "zcond:%u, [type: %u]\n",
807
+ tosector(zones[i].start), tosector(zones[i].length),
808
+ tosector(zones[i].cap), tosector(zones[i].wp),
809
+ zones[i].state, zones[i].type);
810
+ }
811
+ }
812
+ return ret;
813
+}
814
+
815
+static const cmdinfo_t zone_report_cmd = {
816
+ .name = "zone_report",
817
+ .altname = "zrp",
818
+ .cfunc = zone_report_f,
819
+ .argmin = 2,
820
+ .argmax = 2,
821
+ .args = "offset number",
822
+ .oneline = "report zone information",
823
+};
824
+
825
+static int zone_open_f(BlockBackend *blk, int argc, char **argv)
826
+{
827
+ int ret;
828
+ int64_t offset, len;
829
+ ++optind;
830
+ offset = cvtnum(argv[optind]);
831
+ ++optind;
832
+ len = cvtnum(argv[optind]);
833
+ ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
834
+ if (ret < 0) {
835
+ printf("zone open failed: %s\n", strerror(-ret));
836
+ }
837
+ return ret;
838
+}
839
+
840
+static const cmdinfo_t zone_open_cmd = {
841
+ .name = "zone_open",
842
+ .altname = "zo",
843
+ .cfunc = zone_open_f,
844
+ .argmin = 2,
845
+ .argmax = 2,
846
+ .args = "offset len",
847
+ .oneline = "explicit open a range of zones in zone block device",
848
+};
849
+
850
+static int zone_close_f(BlockBackend *blk, int argc, char **argv)
851
+{
852
+ int ret;
853
+ int64_t offset, len;
854
+ ++optind;
855
+ offset = cvtnum(argv[optind]);
856
+ ++optind;
857
+ len = cvtnum(argv[optind]);
858
+ ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
859
+ if (ret < 0) {
860
+ printf("zone close failed: %s\n", strerror(-ret));
861
+ }
862
+ return ret;
863
+}
864
+
865
+static const cmdinfo_t zone_close_cmd = {
866
+ .name = "zone_close",
867
+ .altname = "zc",
868
+ .cfunc = zone_close_f,
869
+ .argmin = 2,
870
+ .argmax = 2,
871
+ .args = "offset len",
872
+ .oneline = "close a range of zones in zone block device",
873
+};
874
+
875
+static int zone_finish_f(BlockBackend *blk, int argc, char **argv)
876
+{
877
+ int ret;
878
+ int64_t offset, len;
879
+ ++optind;
880
+ offset = cvtnum(argv[optind]);
881
+ ++optind;
882
+ len = cvtnum(argv[optind]);
883
+ ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
884
+ if (ret < 0) {
885
+ printf("zone finish failed: %s\n", strerror(-ret));
886
+ }
887
+ return ret;
888
+}
889
+
890
+static const cmdinfo_t zone_finish_cmd = {
891
+ .name = "zone_finish",
892
+ .altname = "zf",
893
+ .cfunc = zone_finish_f,
894
+ .argmin = 2,
895
+ .argmax = 2,
896
+ .args = "offset len",
897
+ .oneline = "finish a range of zones in zone block device",
898
+};
899
+
900
+static int zone_reset_f(BlockBackend *blk, int argc, char **argv)
901
+{
902
+ int ret;
903
+ int64_t offset, len;
904
+ ++optind;
905
+ offset = cvtnum(argv[optind]);
906
+ ++optind;
907
+ len = cvtnum(argv[optind]);
908
+ ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
909
+ if (ret < 0) {
910
+ printf("zone reset failed: %s\n", strerror(-ret));
911
+ }
912
+ return ret;
913
+}
914
+
915
+static const cmdinfo_t zone_reset_cmd = {
916
+ .name = "zone_reset",
917
+ .altname = "zrs",
918
+ .cfunc = zone_reset_f,
919
+ .argmin = 2,
920
+ .argmax = 2,
921
+ .args = "offset len",
922
+ .oneline = "reset a zone write pointer in zone block device",
923
+};
924
+
925
static int truncate_f(BlockBackend *blk, int argc, char **argv);
926
static const cmdinfo_t truncate_cmd = {
927
.name = "truncate",
928
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
929
qemuio_add_command(&aio_write_cmd);
930
qemuio_add_command(&aio_flush_cmd);
931
qemuio_add_command(&flush_cmd);
932
+ qemuio_add_command(&zone_report_cmd);
933
+ qemuio_add_command(&zone_open_cmd);
934
+ qemuio_add_command(&zone_close_cmd);
935
+ qemuio_add_command(&zone_finish_cmd);
936
+ qemuio_add_command(&zone_reset_cmd);
937
qemuio_add_command(&truncate_cmd);
938
qemuio_add_command(&length_cmd);
939
qemuio_add_command(&info_cmd);
--
2.26.2
--
2.40.1
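For hands-on testing, the new qemu-io zone commands can be exercised against a
zoned null_blk device in roughly the following way (a minimal sketch: the
device name /dev/nullb0, the module parameters and the 268435456-byte zone
length are assumptions borrowed from the iotest added later in this series):

  # Create a zoned null_blk device and make it accessible.
  $ sudo modprobe null_blk nr_devices=1 zoned=1
  $ sudo chmod 0666 /dev/nullb0

  # Report the first 10 zones, then open, close and reset the first zone.
  $ qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp 0 10"
  $ qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 \
        -c "zo 0 268435456" -c "zrp 0 1" -c "zc 0 268435456" -c "zrs 0 268435456"

  # Clean up.
  $ sudo rmmod null_blk

Offsets and lengths are given in bytes; zrp prints sector values, converted by
the tosector() helper above.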
1
The vu_client_trip() coroutine is leaked during AioContext switching. It
1
From: Sam Li <faithilikerun@gmail.com>
2
is also unsafe to destroy the vu_dev in panic_cb() since its callers
3
still access it in some cases.
4
2
5
Rework the lifecycle to solve these safety issues.
3
The raw-format driver usually sits on top of the file-posix driver. It needs to
pass through zone command requests.
6
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
9
Reviewed-by: Hannes Reinecke <hare@suse.de>
10
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
11
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-10-stefanha@redhat.com
13
Message-id: 20230508045533.175575-5-faithilikerun@gmail.com
14
Message-id: 20230324090605.28361-5-faithilikerun@gmail.com
15
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
16
<philmd@linaro.org>.
17
--Stefan]
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
19
---
11
util/vhost-user-server.h | 29 ++--
20
block/raw-format.c | 17 +++++++++++++++++
12
block/export/vhost-user-blk-server.c | 9 +-
21
1 file changed, 17 insertions(+)
13
util/vhost-user-server.c | 245 +++++++++++++++------------
14
3 files changed, 155 insertions(+), 128 deletions(-)
15
22
16
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
23
diff --git a/block/raw-format.c b/block/raw-format.c
17
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
18
--- a/util/vhost-user-server.h
25
--- a/block/raw-format.c
19
+++ b/util/vhost-user-server.h
26
+++ b/block/raw-format.c
20
@@ -XXX,XX +XXX,XX @@
27
@@ -XXX,XX +XXX,XX @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
21
#include "qapi/error.h"
28
return bdrv_co_pdiscard(bs->file, offset, bytes);
22
#include "standard-headers/linux/virtio_blk.h"
23
24
+/* A kick fd that we monitor on behalf of libvhost-user */
25
typedef struct VuFdWatch {
26
VuDev *vu_dev;
27
int fd; /*kick fd*/
28
void *pvt;
29
vu_watch_cb cb;
30
- bool processing;
31
QTAILQ_ENTRY(VuFdWatch) next;
32
} VuFdWatch;
33
34
-typedef struct VuServer VuServer;
35
-
36
-struct VuServer {
37
+/**
38
+ * VuServer:
39
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
40
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
41
+ * callbacks and virtqueue kicks run in the given AioContext.
42
+ */
43
+typedef struct {
44
QIONetListener *listener;
45
+ QEMUBH *restart_listener_bh;
46
AioContext *ctx;
47
int max_queues;
48
const VuDevIface *vu_iface;
49
+
50
+ /* Protected by ctx lock */
51
VuDev vu_dev;
52
QIOChannel *ioc; /* The I/O channel with the client */
53
QIOChannelSocket *sioc; /* The underlying data channel with the client */
54
- /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
55
- QIOChannel *ioc_slave;
56
- QIOChannelSocket *sioc_slave;
57
- Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
58
QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
59
- /* restart coroutine co_trip if AIOContext is changed */
60
- bool aio_context_changed;
61
- bool processing_msg;
62
-};
63
+
64
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
65
+} VuServer;
66
67
bool vhost_user_server_start(VuServer *server,
68
SocketAddress *unix_socket,
69
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
70
71
void vhost_user_server_stop(VuServer *server);
72
73
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
74
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
75
+void vhost_user_server_detach_aio_context(VuServer *server);
76
77
#endif /* VHOST_USER_SERVER_H */
78
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
79
index XXXXXXX..XXXXXXX 100644
80
--- a/block/export/vhost-user-blk-server.c
81
+++ b/block/export/vhost-user-blk-server.c
82
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = {
83
static void blk_aio_attached(AioContext *ctx, void *opaque)
84
{
85
VuBlockDev *vub_dev = opaque;
86
- aio_context_acquire(ctx);
87
- vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
88
- aio_context_release(ctx);
89
+ vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
90
}
29
}
91
30
92
static void blk_aio_detach(void *opaque)
31
+static int coroutine_fn GRAPH_RDLOCK
93
{
32
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
94
VuBlockDev *vub_dev = opaque;
33
+ unsigned int *nr_zones,
95
- AioContext *ctx = vub_dev->vu_server.ctx;
34
+ BlockZoneDescriptor *zones)
96
- aio_context_acquire(ctx);
35
+{
97
- vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
36
+ return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
98
- aio_context_release(ctx);
99
+ vhost_user_server_detach_aio_context(&vub_dev->vu_server);
100
}
101
102
static void
103
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/util/vhost-user-server.c
106
+++ b/util/vhost-user-server.c
107
@@ -XXX,XX +XXX,XX @@
108
*/
109
#include "qemu/osdep.h"
110
#include "qemu/main-loop.h"
111
+#include "block/aio-wait.h"
112
#include "vhost-user-server.h"
113
114
+/*
115
+ * Theory of operation:
116
+ *
117
+ * VuServer is started and stopped by vhost_user_server_start() and
118
+ * vhost_user_server_stop() from the main loop thread. Starting the server
119
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
120
+ * Only one connection is allowed at a time.
121
+ *
122
+ * The connection is handled by the vu_client_trip() coroutine in the
123
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
124
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
125
+ * protocol messages over the UNIX domain socket.
126
+ *
127
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
128
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
129
+ *
130
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
131
+ * the socket connection. Shutting down the socket connection causes
132
+ * vu_message_read() to fail since no more data can be received from the socket.
133
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
134
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
135
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
136
+ * processing.
137
+ *
138
+ * When vu_client_trip() has finished cleaning up it schedules a BH in the main
139
+ * loop thread to accept the next client connection.
140
+ *
141
+ * When libvhost-user detects an error it calls panic_cb() and sets the
142
+ * dev->broken flag. Both vu_client_trip() and kick fd processing stop when
143
+ * the dev->broken flag is set.
144
+ *
145
+ * It is possible to switch AioContexts using
146
+ * vhost_user_server_detach_aio_context() and
147
+ * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old
148
+ * AioContext and resume monitoring in the new AioContext. The vu_client_trip()
149
+ * coroutine remains in a yielded state during the switch. This is made
150
+ * possible by QIOChannel's support for spurious coroutine re-entry in
151
+ * qio_channel_yield(). The coroutine will restart I/O when re-entered from the
152
+ * new AioContext.
153
+ */
154
+
155
static void vmsg_close_fds(VhostUserMsg *vmsg)
156
{
157
int i;
158
@@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg)
159
}
160
}
161
162
-static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
163
- gpointer opaque);
164
-
165
-static void close_client(VuServer *server)
166
-{
167
- /*
168
- * Before closing the client
169
- *
170
- * 1. Let vu_client_trip stop processing new vhost-user msg
171
- *
172
- * 2. remove kick_handler
173
- *
174
- * 3. wait for the kick handler to be finished
175
- *
176
- * 4. wait for the current vhost-user msg to be finished processing
177
- */
178
-
179
- QIOChannelSocket *sioc = server->sioc;
180
- /* When this is set vu_client_trip will stop new processing vhost-user message */
181
- server->sioc = NULL;
182
-
183
- while (server->processing_msg) {
184
- if (server->ioc->read_coroutine) {
185
- server->ioc->read_coroutine = NULL;
186
- qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
187
- NULL, server->ioc);
188
- server->processing_msg = false;
189
- }
190
- }
191
-
192
- vu_deinit(&server->vu_dev);
193
-
194
- /* vu_deinit() should have called remove_watch() */
195
- assert(QTAILQ_EMPTY(&server->vu_fd_watches));
196
-
197
- object_unref(OBJECT(sioc));
198
- object_unref(OBJECT(server->ioc));
199
-}
200
-
201
static void panic_cb(VuDev *vu_dev, const char *buf)
202
{
203
- VuServer *server = container_of(vu_dev, VuServer, vu_dev);
204
-
205
- /* avoid while loop in close_client */
206
- server->processing_msg = false;
207
-
208
- if (buf) {
209
- error_report("vu_panic: %s", buf);
210
- }
211
-
212
- if (server->sioc) {
213
- close_client(server);
214
- }
215
-
216
- /*
217
- * Set the callback function for network listener so another
218
- * vhost-user client can connect to this server
219
- */
220
- qio_net_listener_set_client_func(server->listener,
221
- vu_accept,
222
- server,
223
- NULL);
224
+ error_report("vu_panic: %s", buf);
225
}
226
227
static bool coroutine_fn
228
@@ -XXX,XX +XXX,XX @@ fail:
229
return false;
230
}
231
232
-
233
-static void vu_client_start(VuServer *server);
234
static coroutine_fn void vu_client_trip(void *opaque)
235
{
236
VuServer *server = opaque;
237
+ VuDev *vu_dev = &server->vu_dev;
238
239
- while (!server->aio_context_changed && server->sioc) {
240
- server->processing_msg = true;
241
- vu_dispatch(&server->vu_dev);
242
- server->processing_msg = false;
243
+ while (!vu_dev->broken && vu_dispatch(vu_dev)) {
244
+ /* Keep running */
245
}
246
247
- if (server->aio_context_changed && server->sioc) {
248
- server->aio_context_changed = false;
249
- vu_client_start(server);
250
- }
251
-}
252
+ vu_deinit(vu_dev);
253
+
254
+ /* vu_deinit() should have called remove_watch() */
255
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
256
+
257
+ object_unref(OBJECT(server->sioc));
258
+ server->sioc = NULL;
259
260
-static void vu_client_start(VuServer *server)
261
-{
262
- server->co_trip = qemu_coroutine_create(vu_client_trip, server);
263
- aio_co_enter(server->ctx, server->co_trip);
264
+ object_unref(OBJECT(server->ioc));
265
+ server->ioc = NULL;
266
+
267
+ server->co_trip = NULL;
268
+ if (server->restart_listener_bh) {
269
+ qemu_bh_schedule(server->restart_listener_bh);
270
+ }
271
+ aio_wait_kick();
272
}
273
274
/*
275
@@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server)
276
static void kick_handler(void *opaque)
277
{
278
VuFdWatch *vu_fd_watch = opaque;
279
- vu_fd_watch->processing = true;
280
- vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
281
- vu_fd_watch->processing = false;
282
+ VuDev *vu_dev = vu_fd_watch->vu_dev;
283
+
284
+ vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt);
285
+
286
+ /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */
287
+ if (vu_dev->broken) {
288
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
289
+
290
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
291
+ }
292
}
293
294
-
295
static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
296
{
297
298
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
299
qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
300
server->ioc = QIO_CHANNEL(sioc);
301
object_ref(OBJECT(server->ioc));
302
- qio_channel_attach_aio_context(server->ioc, server->ctx);
303
+
304
+ /* TODO vu_message_write() spins if non-blocking! */
305
qio_channel_set_blocking(server->ioc, false, NULL);
306
- vu_client_start(server);
307
+
308
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
309
+
310
+ aio_context_acquire(server->ctx);
311
+ vhost_user_server_attach_aio_context(server, server->ctx);
312
+ aio_context_release(server->ctx);
313
}
314
315
-
316
void vhost_user_server_stop(VuServer *server)
317
{
318
+ aio_context_acquire(server->ctx);
319
+
320
+ qemu_bh_delete(server->restart_listener_bh);
321
+ server->restart_listener_bh = NULL;
322
+
323
if (server->sioc) {
324
- close_client(server);
325
+ VuFdWatch *vu_fd_watch;
326
+
327
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
328
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
329
+ NULL, NULL, NULL, vu_fd_watch);
330
+ }
331
+
332
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
333
+
334
+ AIO_WAIT_WHILE(server->ctx, server->co_trip);
335
}
336
337
+ aio_context_release(server->ctx);
338
+
339
if (server->listener) {
340
qio_net_listener_disconnect(server->listener);
341
object_unref(OBJECT(server->listener));
342
}
343
+}
37
+}
344
+
38
+
345
+/*
39
+static int coroutine_fn GRAPH_RDLOCK
346
+ * Allow the next client to connect to the server. Called from a BH in the main
40
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
347
+ * loop.
41
+ int64_t offset, int64_t len)
348
+ */
349
+static void restart_listener_bh(void *opaque)
350
+{
42
+{
351
+ VuServer *server = opaque;
43
+ return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
352
353
+ qio_net_listener_set_client_func(server->listener, vu_accept, server,
354
+ NULL);
355
}
356
357
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
358
+/* Called with ctx acquired */
359
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
360
{
361
- VuFdWatch *vu_fd_watch, *next;
362
- void *opaque = NULL;
363
- IOHandler *io_read = NULL;
364
- bool attach;
365
+ VuFdWatch *vu_fd_watch;
366
367
- server->ctx = ctx ? ctx : qemu_get_aio_context();
368
+ server->ctx = ctx;
369
370
if (!server->sioc) {
371
- /* not yet serving any client*/
372
return;
373
}
374
375
- if (ctx) {
376
- qio_channel_attach_aio_context(server->ioc, ctx);
377
- server->aio_context_changed = true;
378
- io_read = kick_handler;
379
- attach = true;
380
- } else {
381
+ qio_channel_attach_aio_context(server->ioc, ctx);
382
+
383
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
384
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
385
+ NULL, vu_fd_watch);
386
+ }
387
+
388
+ aio_co_schedule(ctx, server->co_trip);
389
+}
44
+}
390
+
45
+
391
+/* Called with server->ctx acquired */
46
static int64_t coroutine_fn GRAPH_RDLOCK
392
+void vhost_user_server_detach_aio_context(VuServer *server)
47
raw_co_getlength(BlockDriverState *bs)
393
+{
394
+ if (server->sioc) {
395
+ VuFdWatch *vu_fd_watch;
396
+
397
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
398
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
399
+ NULL, NULL, NULL, vu_fd_watch);
400
+ }
401
+
402
qio_channel_detach_aio_context(server->ioc);
403
- /* server->ioc->ctx keeps the old AioConext */
404
- ctx = server->ioc->ctx;
405
- attach = false;
406
}
407
408
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
409
- if (vu_fd_watch->cb) {
410
- opaque = attach ? vu_fd_watch : NULL;
411
- aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
412
- io_read, NULL, NULL,
413
- opaque);
414
- }
415
- }
416
+ server->ctx = NULL;
417
}
418
419
-
420
bool vhost_user_server_start(VuServer *server,
421
SocketAddress *socket_addr,
422
AioContext *ctx,
423
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
424
const VuDevIface *vu_iface,
425
Error **errp)
426
{
48
{
427
+ QEMUBH *bh;
49
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
428
QIONetListener *listener = qio_net_listener_new();
50
.bdrv_co_pwritev = &raw_co_pwritev,
429
if (qio_net_listener_open_sync(listener, socket_addr, 1,
51
.bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
430
errp) < 0) {
52
.bdrv_co_pdiscard = &raw_co_pdiscard,
431
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
53
+ .bdrv_co_zone_report = &raw_co_zone_report,
432
return false;
54
+ .bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
433
}
55
.bdrv_co_block_status = &raw_co_block_status,
434
56
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
435
+ bh = qemu_bh_new(restart_listener_bh, server);
57
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
436
+
437
/* zero out unspecified fields */
438
*server = (VuServer) {
439
.listener = listener,
440
+ .restart_listener_bh = bh,
441
.vu_iface = vu_iface,
442
.max_queues = max_queues,
443
.ctx = ctx,
444
--
2.26.2
--
2.40.1
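To sketch how such a graph is put together on the command line, a zoned host
device can be opened through the raw driver roughly as follows (node names and
the /dev/nullb0 device are illustrative; cache.direct=on reflects the
file-posix requirement for zoned devices introduced later in this series):

  $ qemu-system-x86_64 \
        -blockdev node-name=disk0-file,driver=host_device,filename=/dev/nullb0,cache.direct=on \
        -blockdev node-name=disk0,driver=raw,file=disk0-file \
        -device virtio-blk-pci,drive=disk0

With this layout the zone report and zone management requests issued against
the raw node are simply forwarded to bs->file, as implemented above.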
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Move the constants from hw/core/qdev-properties.c to
3
Putting zoned/non-zoned BlockDrivers on top of each other is not
4
util/block-helpers.h so that knowledge of the min/max values is
4
allowed.
5
5
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Hannes Reinecke <hare@suse.de>
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
12
Message-id: 20230508045533.175575-6-faithilikerun@gmail.com
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
9
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
10
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
15
<philmd@linaro.org> and clarify that the check is about zoned
11
Message-id: 20200918080912.321299-5-coiby.xu@gmail.com
16
BlockDrivers.
17
--Stefan]
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
19
---
14
util/block-helpers.h | 19 +++++++++++++
20
include/block/block_int-common.h | 5 +++++
15
hw/core/qdev-properties-system.c | 31 ++++-----------------
21
block.c | 19 +++++++++++++++++++
16
util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++
22
block/file-posix.c | 12 ++++++++++++
17
util/meson.build | 1 +
23
block/raw-format.c | 1 +
18
4 files changed, 71 insertions(+), 26 deletions(-)
24
4 files changed, 37 insertions(+)
19
create mode 100644 util/block-helpers.h
20
create mode 100644 util/block-helpers.c
21
25
22
diff --git a/util/block-helpers.h b/util/block-helpers.h
26
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
23
new file mode 100644
27
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX
28
--- a/include/block/block_int-common.h
25
--- /dev/null
29
+++ b/include/block/block_int-common.h
26
+++ b/util/block-helpers.h
30
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
27
@@ -XXX,XX +XXX,XX @@
31
*/
28
+#ifndef BLOCK_HELPERS_H
32
bool is_format;
29
+#define BLOCK_HELPERS_H
33
34
+ /*
35
+ * Set to true if the BlockDriver supports zoned children.
36
+ */
37
+ bool supports_zoned_children;
30
+
38
+
31
+#include "qemu/units.h"
39
/*
32
+
40
* Drivers not implementing bdrv_parse_filename nor bdrv_open should have
33
+/* lower limit is sector size */
41
* this field set to true, except ones that are defined only by their
34
+#define MIN_BLOCK_SIZE INT64_C(512)
42
diff --git a/block.c b/block.c
35
+#define MIN_BLOCK_SIZE_STR "512 B"
36
+/*
37
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
38
+ * matches qcow2 cluster size limit
39
+ */
40
+#define MAX_BLOCK_SIZE (2 * MiB)
41
+#define MAX_BLOCK_SIZE_STR "2 MiB"
42
+
43
+void check_block_size(const char *id, const char *name, int64_t value,
44
+ Error **errp);
45
+
46
+#endif /* BLOCK_HELPERS_H */
47
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
48
index XXXXXXX..XXXXXXX 100644
43
index XXXXXXX..XXXXXXX 100644
49
--- a/hw/core/qdev-properties-system.c
44
--- a/block.c
50
+++ b/hw/core/qdev-properties-system.c
45
+++ b/block.c
51
@@ -XXX,XX +XXX,XX @@
46
@@ -XXX,XX +XXX,XX @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
52
#include "sysemu/blockdev.h"
53
#include "net/net.h"
54
#include "hw/pci/pci.h"
55
+#include "util/block-helpers.h"
56
57
static bool check_prop_still_unset(DeviceState *dev, const char *name,
58
const void *old_val, const char *new_val,
59
@@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = {
60
61
/* --- blocksize --- */
62
63
-/* lower limit is sector size */
64
-#define MIN_BLOCK_SIZE 512
65
-#define MIN_BLOCK_SIZE_STR "512 B"
66
-/*
67
- * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
68
- * matches qcow2 cluster size limit
69
- */
70
-#define MAX_BLOCK_SIZE (2 * MiB)
71
-#define MAX_BLOCK_SIZE_STR "2 MiB"
72
-
73
static void set_blocksize(Object *obj, Visitor *v, const char *name,
74
void *opaque, Error **errp)
75
{
76
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
77
Property *prop = opaque;
78
uint32_t *ptr = qdev_get_prop_ptr(dev, prop);
79
uint64_t value;
80
+ Error *local_err = NULL;
81
82
if (dev->realized) {
83
qdev_prop_set_after_realize(dev, name, errp);
84
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
85
if (!visit_type_size(v, name, &value, errp)) {
86
return;
47
return;
87
}
48
}
88
- /* value of 0 means "unset" */
49
89
- if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
50
+ /*
90
- error_setg(errp,
51
+ * Non-zoned block drivers do not follow zoned storage constraints
91
- "Property %s.%s doesn't take value %" PRIu64
52
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
92
- " (minimum: " MIN_BLOCK_SIZE_STR
53
+ * drivers in a graph.
93
- ", maximum: " MAX_BLOCK_SIZE_STR ")",
54
+ */
94
- dev->id ? : "", name, value);
55
+ if (!parent_bs->drv->supports_zoned_children &&
95
+ check_block_size(dev->id ? : "", name, value, &local_err);
56
+ child_bs->bl.zoned == BLK_Z_HM) {
96
+ if (local_err) {
57
+ /*
97
+ error_propagate(errp, local_err);
58
+ * The host-aware model allows zoned storage constraints and random
98
return;
59
+ * write. Allow mixing host-aware and non-zoned drivers. Using
99
}
60
+ * host-aware device as a regular device.
100
-
61
+ */
101
- /* We rely on power-of-2 blocksizes for bitmasks */
62
+ error_setg(errp, "Cannot add a %s child to a %s parent",
102
- if ((value & (value - 1)) != 0) {
63
+ child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
103
- error_setg(errp,
64
+ parent_bs->drv->supports_zoned_children ?
104
- "Property %s.%s doesn't take value '%" PRId64 "', "
65
+ "support zoned children" : "not support zoned children");
105
- "it's not a power of 2", dev->id ?: "", name, (int64_t)value);
106
- return;
107
- }
108
-
109
*ptr = value;
110
}
111
112
diff --git a/util/block-helpers.c b/util/block-helpers.c
113
new file mode 100644
114
index XXXXXXX..XXXXXXX
115
--- /dev/null
116
+++ b/util/block-helpers.c
117
@@ -XXX,XX +XXX,XX @@
118
+/*
119
+ * Block utility functions
120
+ *
121
+ * Copyright IBM, Corp. 2011
122
+ * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com>
123
+ *
124
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
125
+ * See the COPYING file in the top-level directory.
126
+ */
127
+
128
+#include "qemu/osdep.h"
129
+#include "qapi/error.h"
130
+#include "qapi/qmp/qerror.h"
131
+#include "block-helpers.h"
132
+
133
+/**
134
+ * check_block_size:
135
+ * @id: The unique ID of the object
136
+ * @name: The name of the property being validated
137
+ * @value: The block size in bytes
138
+ * @errp: A pointer to an area to store an error
139
+ *
140
+ * This function checks that the block size meets the following conditions:
141
+ * 1. At least MIN_BLOCK_SIZE
142
+ * 2. No larger than MAX_BLOCK_SIZE
143
+ * 3. A power of 2
144
+ */
145
+void check_block_size(const char *id, const char *name, int64_t value,
146
+ Error **errp)
147
+{
148
+ /* value of 0 means "unset" */
149
+ if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
150
+ error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
151
+ id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE);
152
+ return;
66
+ return;
153
+ }
67
+ }
154
+
68
+
155
+ /* We rely on power-of-2 blocksizes for bitmasks */
69
if (!QLIST_EMPTY(&child_bs->parents)) {
156
+ if ((value & (value - 1)) != 0) {
70
error_setg(errp, "The node %s already has a parent",
157
+ error_setg(errp,
71
child_bs->node_name);
158
+ "Property %s.%s doesn't take value '%" PRId64
72
diff --git a/block/file-posix.c b/block/file-posix.c
159
+ "', it's not a power of 2",
73
index XXXXXXX..XXXXXXX 100644
160
+ id, name, value);
74
--- a/block/file-posix.c
161
+ return;
75
+++ b/block/file-posix.c
76
@@ -XXX,XX +XXX,XX @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
77
goto fail;
78
}
79
}
80
+#ifdef CONFIG_BLKZONED
81
+ /*
82
+ * The kernel page cache does not reliably work for writes to SWR zones
83
+ * of zoned block device because it can not guarantee the order of writes.
84
+ */
85
+ if ((bs->bl.zoned != BLK_Z_NONE) &&
86
+ (!(s->open_flags & O_DIRECT))) {
87
+ error_setg(errp, "The driver supports zoned devices, and it requires "
88
+ "cache.direct=on, which was not specified.");
89
+ return -EINVAL; /* No host kernel page cache */
162
+ }
90
+ }
163
+}
91
+#endif
164
diff --git a/util/meson.build b/util/meson.build
92
93
if (S_ISBLK(st.st_mode)) {
94
#ifdef __linux__
95
diff --git a/block/raw-format.c b/block/raw-format.c
165
index XXXXXXX..XXXXXXX 100644
96
index XXXXXXX..XXXXXXX 100644
166
--- a/util/meson.build
97
--- a/block/raw-format.c
167
+++ b/util/meson.build
98
+++ b/block/raw-format.c
168
@@ -XXX,XX +XXX,XX @@ if have_block
99
@@ -XXX,XX +XXX,XX @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
169
util_ss.add(files('nvdimm-utils.c'))
100
BlockDriver bdrv_raw = {
170
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
101
.format_name = "raw",
171
util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
102
.instance_size = sizeof(BDRVRawState),
172
+ util_ss.add(files('block-helpers.c'))
103
+ .supports_zoned_children = true,
173
util_ss.add(files('qemu-coroutine-sleep.c'))
104
.bdrv_probe = &raw_probe,
174
util_ss.add(files('qemu-co-shared-resource.c'))
105
.bdrv_reopen_prepare = &raw_reopen_prepare,
175
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
106
.bdrv_reopen_commit = &raw_reopen_commit,
176
--
2.26.2
--
2.40.1
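To illustrate what the new check permits and refuses, a rough sketch with a
host-managed zoned /dev/nullb0 (device name assumed):

  # Allowed: raw sets supports_zoned_children, so it may sit on a zoned device.
  $ qemu-io --image-opts -n \
        driver=raw,file.driver=host_device,file.filename=/dev/nullb0 -c "zrp 0 1"

  # Expected to be rejected: qcow2 does not support zoned children.
  $ qemu-io --image-opts -n \
        driver=qcow2,file.driver=host_device,file.filename=/dev/nullb0 -c info

Host-aware devices (BLK_Z_HA) are not affected by the check and can still be
used as regular devices.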
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Sharing QEMU devices via vhost-user protocol.
3
The new block layer APIs for zoned block devices can be tested with:
$ tests/qemu-iotests/check zoned
The test runs each zone operation on a newly created null_blk device
and checks that the expected zone information is reported.
4
7
5
Only one vhost-user client can connect to the server at a time.
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
6
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Suggested-by: Kevin Wolf <kwolf@redhat.com>
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
12
Message-id: 20230508045533.175575-7-faithilikerun@gmail.com
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 20230324090605.28361-7-faithilikerun@gmail.com
11
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
14
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
12
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
15
<philmd@linaro.org>.
13
[Fixed size_t %lu -> %zu format string compiler error.
14
--Stefan]
16
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
18
---
17
util/vhost-user-server.h | 65 ++++++
19
tests/qemu-iotests/tests/zoned | 89 ++++++++++++++++++++++++++++++
18
util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++
20
tests/qemu-iotests/tests/zoned.out | 53 ++++++++++++++++++
19
util/meson.build | 1 +
21
2 files changed, 142 insertions(+)
20
3 files changed, 494 insertions(+)
22
create mode 100755 tests/qemu-iotests/tests/zoned
21
create mode 100644 util/vhost-user-server.h
23
create mode 100644 tests/qemu-iotests/tests/zoned.out
22
create mode 100644 util/vhost-user-server.c
23
24
24
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
25
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
26
new file mode 100755
27
index XXXXXXX..XXXXXXX
28
--- /dev/null
29
+++ b/tests/qemu-iotests/tests/zoned
30
@@ -XXX,XX +XXX,XX @@
31
+#!/usr/bin/env bash
32
+#
33
+# Test zone management operations.
34
+#
35
+
36
+seq="$(basename $0)"
37
+echo "QA output created by $seq"
38
+status=1 # failure is the default!
39
+
40
+_cleanup()
41
+{
42
+ _cleanup_test_img
43
+ sudo -n rmmod null_blk
44
+}
45
+trap "_cleanup; exit \$status" 0 1 2 3 15
46
+
47
+# get standard environment, filters and checks
48
+. ../common.rc
49
+. ../common.filter
50
+. ../common.qemu
51
+
52
+# This test only runs on Linux hosts with raw image files.
53
+_supported_fmt raw
54
+_supported_proto file
55
+_supported_os Linux
56
+
57
+sudo -n true || \
58
+ _notrun 'Password-less sudo required'
59
+
60
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
61
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
62
+
63
+echo "Testing a null_blk device:"
64
+echo "case 1: if the operations work"
65
+sudo -n modprobe null_blk nr_devices=1 zoned=1
66
+sudo -n chmod 0666 /dev/nullb0
67
+
68
+echo "(1) report the first zone:"
69
+$QEMU_IO $IMG -c "zrp 0 1"
70
+echo
71
+echo "report the first 10 zones"
72
+$QEMU_IO $IMG -c "zrp 0 10"
73
+echo
74
+echo "report the last zone:"
75
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
76
+echo
77
+echo
78
+echo "(2) opening the first zone"
79
+$QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288
80
+echo "report after:"
81
+$QEMU_IO $IMG -c "zrp 0 1"
82
+echo
83
+echo "opening the second zone"
84
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
85
+echo "report after:"
86
+$QEMU_IO $IMG -c "zrp 268435456 1"
87
+echo
88
+echo "opening the last zone"
89
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
90
+echo "report after:"
91
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
92
+echo
93
+echo
94
+echo "(3) closing the first zone"
95
+$QEMU_IO $IMG -c "zc 0 268435456"
96
+echo "report after:"
97
+$QEMU_IO $IMG -c "zrp 0 1"
98
+echo
99
+echo "closing the last zone"
100
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
101
+echo "report after:"
102
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
103
+echo
104
+echo
105
+echo "(4) finishing the second zone"
106
+$QEMU_IO $IMG -c "zf 268435456 268435456"
107
+echo "After finishing a zone:"
108
+$QEMU_IO $IMG -c "zrp 268435456 1"
109
+echo
110
+echo
111
+echo "(5) resetting the second zone"
112
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
113
+echo "After resetting a zone:"
114
+$QEMU_IO $IMG -c "zrp 268435456 1"
115
+
116
+# success, all done
117
+echo "*** done"
118
+rm -f $seq.full
119
+status=0
120
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
25
new file mode 100644
121
new file mode 100644
26
index XXXXXXX..XXXXXXX
122
index XXXXXXX..XXXXXXX
27
--- /dev/null
123
--- /dev/null
28
+++ b/util/vhost-user-server.h
124
+++ b/tests/qemu-iotests/tests/zoned.out
29
@@ -XXX,XX +XXX,XX @@
125
@@ -XXX,XX +XXX,XX @@
30
+/*
126
+QA output created by zoned
31
+ * Sharing QEMU devices via vhost-user protocol
127
+Testing a null_blk device:
32
+ *
128
+case 1: if the operations work
33
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
129
+(1) report the first zone:
34
+ * Copyright (c) 2020 Red Hat, Inc.
130
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
35
+ *
36
+ * This work is licensed under the terms of the GNU GPL, version 2 or
37
+ * later. See the COPYING file in the top-level directory.
38
+ */
39
+
131
+
40
+#ifndef VHOST_USER_SERVER_H
132
+report the first 10 zones
41
+#define VHOST_USER_SERVER_H
133
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
134
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
135
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
136
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
137
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
138
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
139
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
140
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
141
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
142
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
42
+
143
+
43
+#include "contrib/libvhost-user/libvhost-user.h"
144
+report the last zone:
44
+#include "io/channel-socket.h"
145
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
45
+#include "io/channel-file.h"
46
+#include "io/net-listener.h"
47
+#include "qemu/error-report.h"
48
+#include "qapi/error.h"
49
+#include "standard-headers/linux/virtio_blk.h"
50
+
51
+typedef struct VuFdWatch {
52
+ VuDev *vu_dev;
53
+ int fd; /*kick fd*/
54
+ void *pvt;
55
+ vu_watch_cb cb;
56
+ bool processing;
57
+ QTAILQ_ENTRY(VuFdWatch) next;
58
+} VuFdWatch;
59
+
60
+typedef struct VuServer VuServer;
61
+typedef void DevicePanicNotifierFn(VuServer *server);
62
+
63
+struct VuServer {
64
+ QIONetListener *listener;
65
+ AioContext *ctx;
66
+ DevicePanicNotifierFn *device_panic_notifier;
67
+ int max_queues;
68
+ const VuDevIface *vu_iface;
69
+ VuDev vu_dev;
70
+ QIOChannel *ioc; /* The I/O channel with the client */
71
+ QIOChannelSocket *sioc; /* The underlying data channel with the client */
72
+ /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
73
+ QIOChannel *ioc_slave;
74
+ QIOChannelSocket *sioc_slave;
75
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
76
+ QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
77
+ /* restart coroutine co_trip if AIOContext is changed */
78
+ bool aio_context_changed;
79
+ bool processing_msg;
80
+};
81
+
82
+bool vhost_user_server_start(VuServer *server,
83
+ SocketAddress *unix_socket,
84
+ AioContext *ctx,
85
+ uint16_t max_queues,
86
+ DevicePanicNotifierFn *device_panic_notifier,
87
+ const VuDevIface *vu_iface,
88
+ Error **errp);
89
+
90
+void vhost_user_server_stop(VuServer *server);
91
+
92
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
93
+
94
+#endif /* VHOST_USER_SERVER_H */
95
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
96
new file mode 100644
97
index XXXXXXX..XXXXXXX
98
--- /dev/null
99
+++ b/util/vhost-user-server.c
100
@@ -XXX,XX +XXX,XX @@
101
+/*
102
+ * Sharing QEMU devices via vhost-user protocol
103
+ *
104
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
105
+ * Copyright (c) 2020 Red Hat, Inc.
106
+ *
107
+ * This work is licensed under the terms of the GNU GPL, version 2 or
108
+ * later. See the COPYING file in the top-level directory.
109
+ */
110
+#include "qemu/osdep.h"
111
+#include "qemu/main-loop.h"
112
+#include "vhost-user-server.h"
113
+
114
+static void vmsg_close_fds(VhostUserMsg *vmsg)
115
+{
116
+ int i;
117
+ for (i = 0; i < vmsg->fd_num; i++) {
118
+ close(vmsg->fds[i]);
119
+ }
120
+}
121
+
122
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
123
+{
124
+ int i;
125
+ for (i = 0; i < vmsg->fd_num; i++) {
126
+ qemu_set_nonblock(vmsg->fds[i]);
127
+ }
128
+}
129
+
130
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
131
+ gpointer opaque);
132
+
133
+static void close_client(VuServer *server)
134
+{
135
+ /*
136
+ * Before closing the client
137
+ *
138
+ * 1. Let vu_client_trip stop processing new vhost-user msg
139
+ *
140
+ * 2. remove kick_handler
141
+ *
142
+ * 3. wait for the kick handler to be finished
143
+ *
144
+ * 4. wait for the current vhost-user msg to be finished processing
145
+ */
146
+
147
+ QIOChannelSocket *sioc = server->sioc;
148
+ /* When this is set vu_client_trip will stop new processing vhost-user message */
149
+ server->sioc = NULL;
150
+
151
+ VuFdWatch *vu_fd_watch, *next;
152
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
153
+ aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
154
+ NULL, NULL, NULL);
155
+ }
156
+
157
+ while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
158
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
159
+ if (!vu_fd_watch->processing) {
160
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
161
+ g_free(vu_fd_watch);
162
+ }
163
+ }
164
+ }
165
+
166
+ while (server->processing_msg) {
167
+ if (server->ioc->read_coroutine) {
168
+ server->ioc->read_coroutine = NULL;
169
+ qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
170
+ NULL, server->ioc);
171
+ server->processing_msg = false;
172
+ }
173
+ }
174
+
175
+ vu_deinit(&server->vu_dev);
176
+ object_unref(OBJECT(sioc));
177
+ object_unref(OBJECT(server->ioc));
178
+}
179
+
180
+static void panic_cb(VuDev *vu_dev, const char *buf)
181
+{
182
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
183
+
184
+ /* avoid while loop in close_client */
185
+ server->processing_msg = false;
186
+
187
+ if (buf) {
188
+ error_report("vu_panic: %s", buf);
189
+ }
190
+
191
+ if (server->sioc) {
192
+ close_client(server);
193
+ }
194
+
195
+ if (server->device_panic_notifier) {
196
+ server->device_panic_notifier(server);
197
+ }
198
+
199
+ /*
200
+ * Set the callback function for network listener so another
201
+ * vhost-user client can connect to this server
202
+ */
203
+ qio_net_listener_set_client_func(server->listener,
204
+ vu_accept,
205
+ server,
206
+ NULL);
207
+}
208
+
209
+static bool coroutine_fn
210
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
211
+{
212
+ struct iovec iov = {
213
+ .iov_base = (char *)vmsg,
214
+ .iov_len = VHOST_USER_HDR_SIZE,
215
+ };
216
+ int rc, read_bytes = 0;
217
+ Error *local_err = NULL;
218
+ /*
219
+ * Store fds/nfds returned from qio_channel_readv_full into
220
+ * temporary variables.
221
+ *
222
+ * VhostUserMsg is a packed structure, gcc will complain about passing
223
+ * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
224
+ * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
225
+ * thus two temporary variables nfds and fds are used here.
226
+ */
227
+ size_t nfds = 0, nfds_t = 0;
228
+ const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
229
+ int *fds_t = NULL;
230
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
231
+ QIOChannel *ioc = server->ioc;
232
+
233
+ if (!ioc) {
234
+ error_report_err(local_err);
235
+ goto fail;
236
+ }
237
+
238
+ assert(qemu_in_coroutine());
239
+ do {
240
+ /*
241
+ * qio_channel_readv_full may have short reads, keeping calling it
242
+ * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
243
+ */
244
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
245
+ if (rc < 0) {
246
+ if (rc == QIO_CHANNEL_ERR_BLOCK) {
247
+ qio_channel_yield(ioc, G_IO_IN);
248
+ continue;
249
+ } else {
250
+ error_report_err(local_err);
251
+ return false;
252
+ }
253
+ }
254
+ read_bytes += rc;
255
+ if (nfds_t > 0) {
256
+ if (nfds + nfds_t > max_fds) {
257
+ error_report("A maximum of %zu fds are allowed, "
258
+ "however got %zu fds now",
259
+ max_fds, nfds + nfds_t);
260
+ goto fail;
261
+ }
262
+ memcpy(vmsg->fds + nfds, fds_t,
263
+ nfds_t *sizeof(vmsg->fds[0]));
264
+ nfds += nfds_t;
265
+ g_free(fds_t);
266
+ }
267
+ if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
268
+ break;
269
+ }
270
+ iov.iov_base = (char *)vmsg + read_bytes;
271
+ iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
272
+ } while (true);
273
+
274
+ vmsg->fd_num = nfds;
275
+ /* qio_channel_readv_full will make socket fds blocking, unblock them */
276
+ vmsg_unblock_fds(vmsg);
277
+ if (vmsg->size > sizeof(vmsg->payload)) {
278
+ error_report("Error: too big message request: %d, "
279
+ "size: vmsg->size: %u, "
280
+ "while sizeof(vmsg->payload) = %zu",
281
+ vmsg->request, vmsg->size, sizeof(vmsg->payload));
282
+ goto fail;
283
+ }
284
+
285
+ struct iovec iov_payload = {
286
+ .iov_base = (char *)&vmsg->payload,
287
+ .iov_len = vmsg->size,
288
+ };
289
+ if (vmsg->size) {
290
+ rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
291
+ if (rc == -1) {
292
+ error_report_err(local_err);
293
+ goto fail;
294
+ }
295
+ }
296
+
297
+ return true;
298
+
299
+fail:
300
+ vmsg_close_fds(vmsg);
301
+
302
+ return false;
303
+}
304
+
146
+
305
+
147
+
306
+static void vu_client_start(VuServer *server);
148
+(2) opening the first zone
307
+static coroutine_fn void vu_client_trip(void *opaque)
149
+report after:
308
+{
150
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
309
+ VuServer *server = opaque;
310
+
151
+
311
+ while (!server->aio_context_changed && server->sioc) {
152
+opening the second zone
312
+ server->processing_msg = true;
153
+report after:
313
+ vu_dispatch(&server->vu_dev);
154
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
314
+ server->processing_msg = false;
315
+ }
316
+
155
+
317
+ if (server->aio_context_changed && server->sioc) {
156
+opening the last zone
318
+ server->aio_context_changed = false;
157
+report after:
319
+ vu_client_start(server);
158
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
320
+ }
321
+}
322
+
323
+static void vu_client_start(VuServer *server)
324
+{
325
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
326
+ aio_co_enter(server->ctx, server->co_trip);
327
+}
328
+
329
+/*
330
+ * a wrapper for vu_kick_cb
331
+ *
332
+ * since aio_dispatch can only pass one user data pointer to the
333
+ * callback function, pack VuDev and pvt into a struct. Then unpack it
334
+ * and pass them to vu_kick_cb
335
+ */
336
+static void kick_handler(void *opaque)
337
+{
338
+ VuFdWatch *vu_fd_watch = opaque;
339
+ vu_fd_watch->processing = true;
340
+ vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
341
+ vu_fd_watch->processing = false;
342
+}
343
+
159
+
344
+
160
+
345
+static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
161
+(3) closing the first zone
346
+{
162
+report after:
163
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
347
+
164
+
348
+ VuFdWatch *vu_fd_watch, *next;
165
+closing the last zone
349
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
166
+report after:
350
+ if (vu_fd_watch->fd == fd) {
167
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
351
+ return vu_fd_watch;
352
+ }
353
+ }
354
+ return NULL;
355
+}
356
+
357
+static void
358
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
359
+ vu_watch_cb cb, void *pvt)
360
+{
361
+
362
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
363
+ g_assert(vu_dev);
364
+ g_assert(fd >= 0);
365
+ g_assert(cb);
366
+
367
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
368
+
369
+ if (!vu_fd_watch) {
370
+ VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1);
371
+
372
+ QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next);
373
+
374
+ vu_fd_watch->fd = fd;
375
+ vu_fd_watch->cb = cb;
376
+ qemu_set_nonblock(fd);
377
+ aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
378
+ NULL, NULL, vu_fd_watch);
379
+ vu_fd_watch->vu_dev = vu_dev;
380
+ vu_fd_watch->pvt = pvt;
381
+ }
382
+}
383
+
168
+
384
+
169
+
385
+static void remove_watch(VuDev *vu_dev, int fd)
170
+(4) finishing the second zone
386
+{
171
+After finishing a zone:
387
+ VuServer *server;
172
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
388
+ g_assert(vu_dev);
389
+ g_assert(fd >= 0);
390
+
391
+ server = container_of(vu_dev, VuServer, vu_dev);
392
+
393
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
394
+
395
+ if (!vu_fd_watch) {
396
+ return;
397
+ }
398
+ aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL);
399
+
400
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
401
+ g_free(vu_fd_watch);
402
+}
403
+
173
+
404
+
174
+
405
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
175
+(5) resetting the second zone
406
+ gpointer opaque)
176
+After resetting a zone:
407
+{
177
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
408
+ VuServer *server = opaque;
178
+*** done
409
+
410
+ if (server->sioc) {
411
+ warn_report("Only one vhost-user client is allowed to "
412
+ "connect the server one time");
413
+ return;
414
+ }
415
+
416
+ if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb,
417
+ vu_message_read, set_watch, remove_watch, server->vu_iface)) {
418
+ error_report("Failed to initialize libvhost-user");
419
+ return;
420
+ }
421
+
422
+ /*
423
+ * Unset the callback function for network listener to make another
424
+ * vhost-user client keeping waiting until this client disconnects
425
+ */
426
+ qio_net_listener_set_client_func(server->listener,
427
+ NULL,
428
+ NULL,
429
+ NULL);
430
+ server->sioc = sioc;
431
+ /*
432
+ * Increase the object reference, so sioc will not freed by
433
+ * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
434
+ */
435
+ object_ref(OBJECT(server->sioc));
436
+ qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
437
+ server->ioc = QIO_CHANNEL(sioc);
438
+ object_ref(OBJECT(server->ioc));
439
+ qio_channel_attach_aio_context(server->ioc, server->ctx);
440
+ qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
441
+ vu_client_start(server);
442
+}
443
+
444
+
445
+void vhost_user_server_stop(VuServer *server)
446
+{
447
+ if (server->sioc) {
448
+ close_client(server);
449
+ }
450
+
451
+ if (server->listener) {
452
+ qio_net_listener_disconnect(server->listener);
453
+ object_unref(OBJECT(server->listener));
454
+ }
455
+
456
+}
457
+
458
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
459
+{
460
+ VuFdWatch *vu_fd_watch, *next;
461
+ void *opaque = NULL;
462
+ IOHandler *io_read = NULL;
463
+ bool attach;
464
+
465
+ server->ctx = ctx ? ctx : qemu_get_aio_context();
466
+
467
+ if (!server->sioc) {
468
+ /* not yet serving any client*/
469
+ return;
470
+ }
471
+
472
+ if (ctx) {
473
+ qio_channel_attach_aio_context(server->ioc, ctx);
474
+ server->aio_context_changed = true;
475
+ io_read = kick_handler;
476
+ attach = true;
477
+ } else {
478
+ qio_channel_detach_aio_context(server->ioc);
479
+ /* server->ioc->ctx keeps the old AioConext */
480
+ ctx = server->ioc->ctx;
481
+ attach = false;
482
+ }
483
+
484
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
485
+ if (vu_fd_watch->cb) {
486
+ opaque = attach ? vu_fd_watch : NULL;
487
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
488
+ io_read, NULL, NULL,
489
+ opaque);
490
+ }
491
+ }
492
+}
493
+
494
+
495
+bool vhost_user_server_start(VuServer *server,
496
+ SocketAddress *socket_addr,
497
+ AioContext *ctx,
498
+ uint16_t max_queues,
499
+ DevicePanicNotifierFn *device_panic_notifier,
500
+ const VuDevIface *vu_iface,
501
+ Error **errp)
502
+{
503
+ QIONetListener *listener = qio_net_listener_new();
504
+ if (qio_net_listener_open_sync(listener, socket_addr, 1,
505
+ errp) < 0) {
506
+ object_unref(OBJECT(listener));
507
+ return false;
508
+ }
509
+
510
+ /* zero out unspecified fileds */
511
+ *server = (VuServer) {
512
+ .listener = listener,
513
+ .vu_iface = vu_iface,
514
+ .max_queues = max_queues,
515
+ .ctx = ctx,
516
+ .device_panic_notifier = device_panic_notifier,
517
+ };
518
+
519
+ qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
520
+
521
+ qio_net_listener_set_client_func(server->listener,
522
+ vu_accept,
523
+ server,
524
+ NULL);
525
+
526
+ QTAILQ_INIT(&server->vu_fd_watches);
527
+ return true;
528
+}
529
diff --git a/util/meson.build b/util/meson.build
530
index XXXXXXX..XXXXXXX 100644
531
--- a/util/meson.build
532
+++ b/util/meson.build
533
@@ -XXX,XX +XXX,XX @@ if have_block
534
util_ss.add(files('main-loop.c'))
535
util_ss.add(files('nvdimm-utils.c'))
536
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
537
+ util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
538
util_ss.add(files('qemu-coroutine-sleep.c'))
539
util_ss.add(files('qemu-co-shared-resource.c'))
540
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
541
--
2.26.2
--
2.40.1
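The new test case can be run as described in the commit message; password-less
sudo is needed so the test can load and unload the null_blk module, and the
test skips itself unless the format is raw, the protocol is file and the host
is Linux:

  $ sudo -n true
  $ tests/qemu-iotests/check zoned

The reference output lives in tests/qemu-iotests/tests/zoned.out, shown above.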
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
These cases are fixed by previous patches around block_status and
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
is_allocated.
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
5
5
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Acked-by: Kevin Wolf <kwolf@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
Message-id: 20230508045533.175575-8-faithilikerun@gmail.com
9
Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com
9
Message-id: 20230324090605.28361-8-faithilikerun@gmail.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
11
---
12
tests/qemu-iotests/274 | 20 +++++++++++
12
block/file-posix.c | 3 +++
13
tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++
13
block/trace-events | 2 ++
14
2 files changed, 88 insertions(+)
14
2 files changed, 5 insertions(+)
15
15
16
diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
16
diff --git a/block/file-posix.c b/block/file-posix.c
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/274
19
+++ b/tests/qemu-iotests/274
20
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \
21
iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
22
iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
23
24
+ iotests.log('=== Testing qemu-img commit (top -> base) ===')
25
+
26
+ create_chain()
27
+ iotests.qemu_img_log('commit', '-b', base, top)
28
+ iotests.img_info_log(base)
29
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
30
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
31
+
32
+ iotests.log('=== Testing QMP active commit (top -> base) ===')
33
+
34
+ create_chain()
35
+ with create_vm() as vm:
36
+ vm.launch()
37
+ vm.qmp_log('block-commit', device='top', base_node='base',
38
+ job_id='job0', auto_dismiss=False)
39
+ vm.run_job('job0', wait=5)
40
+
41
+ iotests.img_info_log(mid)
42
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
43
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
44
45
iotests.log('== Resize tests ==')
46
47
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
48
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
49
--- a/tests/qemu-iotests/274.out
18
--- a/block/file-posix.c
50
+++ b/tests/qemu-iotests/274.out
19
+++ b/block/file-posix.c
51
@@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0
20
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
52
read 1048576/1048576 bytes at offset 1048576
21
},
53
1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
22
};
54
23
55
+=== Testing qemu-img commit (top -> base) ===
24
+ trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
56
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
25
return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
57
+
26
}
58
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
27
#endif
59
+
28
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
60
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
29
},
61
+
30
};
62
+wrote 2097152/2097152 bytes at offset 0
31
63
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
32
+ trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
64
+
33
+ len >> BDRV_SECTOR_BITS);
65
+Image committed.
34
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
66
+
35
if (ret != 0) {
67
+image: TEST_IMG
36
error_report("ioctl %s failed %d", op_name, ret);
68
+file format: IMGFMT
37
diff --git a/block/trace-events b/block/trace-events
69
+virtual size: 2 MiB (2097152 bytes)
38
index XXXXXXX..XXXXXXX 100644
70
+cluster_size: 65536
39
--- a/block/trace-events
71
+Format specific information:
40
+++ b/block/trace-events
72
+ compat: 1.1
41
@@ -XXX,XX +XXX,XX @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
73
+ compression type: zlib
42
file_setup_cdrom(const char *partition) "Using %s as optical disc"
74
+ lazy refcounts: false
43
file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
75
+ refcount bits: 16
44
file_flush_fdatasync_failed(int err) "errno %d"
76
+ corrupt: false
45
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
77
+ extended l2: false
46
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
78
+
47
79
+read 1048576/1048576 bytes at offset 0
48
# ssh.c
80
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
49
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
81
+
82
+read 1048576/1048576 bytes at offset 1048576
83
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
84
+
85
+=== Testing QMP active commit (top -> base) ===
86
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
87
+
88
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
89
+
90
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
91
+
92
+wrote 2097152/2097152 bytes at offset 0
93
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
94
+
95
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
96
+{"return": {}}
97
+{"execute": "job-complete", "arguments": {"id": "job0"}}
98
+{"return": {}}
99
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
100
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
101
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
102
+{"return": {}}
103
+image: TEST_IMG
104
+file format: IMGFMT
105
+virtual size: 1 MiB (1048576 bytes)
106
+cluster_size: 65536
107
+backing file: TEST_DIR/PID-base
108
+backing file format: IMGFMT
109
+Format specific information:
110
+ compat: 1.1
111
+ compression type: zlib
112
+ lazy refcounts: false
113
+ refcount bits: 16
114
+ corrupt: false
115
+ extended l2: false
116
+
117
+read 1048576/1048576 bytes at offset 0
118
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
119
+
120
+read 1048576/1048576 bytes at offset 1048576
121
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
122
+
123
== Resize tests ==
124
=== preallocation=off ===
125
Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16
126
--
50
--
127
2.26.2
51
2.40.1
128
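
The zbd_zone_report and zbd_zone_mgmt trace points added above can be
watched while driving the new zone commands from qemu-io. A possible
invocation, assuming qemu-io's usual -T/--trace option and a zoned
null_blk test device at /dev/nullb0 (neither is part of this patch):

    $ qemu-io -T "zbd_*" --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp 0 16"

The "zrp 0 16" command reports the first 16 zones and should emit a
zbd_zone_report trace line for the request.
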
1
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
1
From: Sam Li <faithilikerun@gmail.com>
2
They are not used by other programs and are not otherwise needed in
3
libblock.
4
2
5
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
3
Add the documentation about the zoned device support to virtio-blk
6
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
4
emulation.
7
(libblockdev), a stub function is required.
8
5
9
Make qemu-nbd.c use signal handling utility functions instead of
6
Signed-off-by: Sam Li <faithilikerun@gmail.com>
10
duplicating the code. This helps because os-posix.c is in libblockdev
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
8
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
12
Once we use the signal handling utility functions we also end up
9
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
13
providing the necessary symbol.
10
Acked-by: Kevin Wolf <kwolf@redhat.com>
14
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
12
Message-id: 20230508045533.175575-9-faithilikerun@gmail.com
17
Reviewed-by: Eric Blake <eblake@redhat.com>
13
Message-id: 20230324090605.28361-9-faithilikerun@gmail.com
18
Message-id: 20200929125516.186715-4-stefanha@redhat.com
14
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
19
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
15
any toctree" error and fix pre-formatted code syntax.
20
--Stefan]
16
--Stefan]
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
22
---
18
---
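
A quick way to exercise the signal-handling conversion described above,
assuming an image file and a free /dev/nbd0 (both placeholders; the nbd
kernel module must be loaded):

    $ qemu-nbd -v -c /dev/nbd0 test.img &
    $ kill -TERM %1    # should exit cleanly via qemu_system_killed()

With os_setup_signal_handling() in place, SIGTERM, SIGINT and SIGHUP all
go through the shared qemu_system_killed() path instead of the removed
ad-hoc sigaction() calls.
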
23
qemu-nbd.c | 21 ++++++++-------------
19
docs/devel/index-api.rst | 1 +
24
stubs/blk-exp-close-all.c | 7 +++++++
20
docs/devel/zoned-storage.rst | 43 ++++++++++++++++++++++++++
25
block/export/meson.build | 4 ++--
21
docs/system/qemu-block-drivers.rst.inc | 6 ++++
26
meson.build | 4 ++--
22
3 files changed, 50 insertions(+)
27
nbd/meson.build | 2 ++
23
create mode 100644 docs/devel/zoned-storage.rst
28
stubs/meson.build | 1 +
29
6 files changed, 22 insertions(+), 17 deletions(-)
30
create mode 100644 stubs/blk-exp-close-all.c
31
24
32
diff --git a/qemu-nbd.c b/qemu-nbd.c
25
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
33
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
34
--- a/qemu-nbd.c
27
--- a/docs/devel/index-api.rst
35
+++ b/qemu-nbd.c
28
+++ b/docs/devel/index-api.rst
36
@@ -XXX,XX +XXX,XX @@
29
@@ -XXX,XX +XXX,XX @@ generated from in-code annotations to function prototypes.
37
#include "qapi/error.h"
30
memory
38
#include "qemu/cutils.h"
31
modules
39
#include "sysemu/block-backend.h"
32
ui
40
+#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */
33
+ zoned-storage
41
#include "block/block_int.h"
34
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
42
#include "block/nbd.h"
43
#include "qemu/main-loop.h"
44
@@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n"
45
}
46
47
#ifdef CONFIG_POSIX
48
-static void termsig_handler(int signum)
49
+/*
50
+ * The client thread uses SIGTERM to interrupt the server. A signal
51
+ * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
52
+ */
53
+void qemu_system_killed(int signum, pid_t pid)
54
{
55
qatomic_cmpxchg(&state, RUNNING, TERMINATE);
56
qemu_notify_event();
57
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
58
BlockExportOptions *export_opts;
59
60
#ifdef CONFIG_POSIX
61
- /*
62
- * Exit gracefully on various signals, which includes SIGTERM used
63
- * by 'qemu-nbd -v -c'.
64
- */
65
- struct sigaction sa_sigterm;
66
- memset(&sa_sigterm, 0, sizeof(sa_sigterm));
67
- sa_sigterm.sa_handler = termsig_handler;
68
- sigaction(SIGTERM, &sa_sigterm, NULL);
69
- sigaction(SIGINT, &sa_sigterm, NULL);
70
- sigaction(SIGHUP, &sa_sigterm, NULL);
71
-
72
- signal(SIGPIPE, SIG_IGN);
73
+ os_setup_early_signal_handling();
74
+ os_setup_signal_handling();
75
#endif
76
77
socket_init();
78
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
79
new file mode 100644
35
new file mode 100644
80
index XXXXXXX..XXXXXXX
36
index XXXXXXX..XXXXXXX
81
--- /dev/null
37
--- /dev/null
82
+++ b/stubs/blk-exp-close-all.c
38
+++ b/docs/devel/zoned-storage.rst
83
@@ -XXX,XX +XXX,XX @@
39
@@ -XXX,XX +XXX,XX @@
84
+#include "qemu/osdep.h"
40
+=============
85
+#include "block/export.h"
41
+zoned-storage
42
+=============
86
+
43
+
87
+/* Only used in programs that support block exports (libblockdev.fa) */
44
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
88
+void blk_exp_close_all(void)
45
+that are larger than the LBA size. They can only allow sequential writes, which
89
+{
46
+can reduce write amplification in SSDs, and potentially lead to higher
90
+}
47
+throughput and increased capacity. More details about ZBDs can be found at:
91
diff --git a/block/export/meson.build b/block/export/meson.build
48
+
49
+https://zonedstorage.io/docs/introduction/zoned-storage
50
+
51
+1. Block layer APIs for zoned storage
52
+-------------------------------------
53
+QEMU block layer supports three zoned storage models:
54
+- BLK_Z_HM: The host-managed zoned model only allows sequential writes access
55
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
56
+manage the zones of a device.
57
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
58
+zones, making it backward compatible with regular block devices.
59
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
60
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
61
+supported.
62
+
63
+The block device information resides inside BlockDriverState. QEMU uses
64
+BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
65
+block layer while processing I/O requests. A BlockBackend has a root pointer to
66
+a BlockDriverState graph(for example, raw format on top of file-posix). The
67
+zoned storage information can be propagated from the leaf BlockDriverState all
68
+the way up to the BlockBackend. If the zoned storage model in file-posix is
69
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
70
+
71
+The block layer APIs support commands needed for zoned storage devices,
72
+including report zones, four zone operations, and zone append.
73
+
74
+2. Emulating zoned storage controllers
75
+--------------------------------------
76
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
77
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
78
+APIs for zoned storage emulation or testing.
79
+
80
+For example, to test zone_report on a null_blk device, use qemu-io::
81
+
82
+ $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
83
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
92
index XXXXXXX..XXXXXXX 100644
84
index XXXXXXX..XXXXXXX 100644
93
--- a/block/export/meson.build
85
--- a/docs/system/qemu-block-drivers.rst.inc
94
+++ b/block/export/meson.build
86
+++ b/docs/system/qemu-block-drivers.rst.inc
95
@@ -XXX,XX +XXX,XX @@
87
@@ -XXX,XX +XXX,XX @@ Hard disks
96
-block_ss.add(files('export.c'))
88
you may corrupt your host data (use the ``-snapshot`` command
97
-block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
89
line option or modify the device permissions accordingly).
98
+blockdev_ss.add(files('export.c'))
90
99
+blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
91
+Zoned block devices
100
diff --git a/meson.build b/meson.build
92
+ Zoned block devices can be passed through to the guest if the emulated storage
101
index XXXXXXX..XXXXXXX 100644
93
+ controller supports zoned storage. Use ``--blockdev host_device,
102
--- a/meson.build
94
+ node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
103
+++ b/meson.build
95
+ ``/dev/nullb0`` as ``drive0``.
104
@@ -XXX,XX +XXX,XX @@ subdir('dump')
96
+
105
97
Windows
106
block_ss.add(files(
98
^^^^^^^
107
'block.c',
99
108
- 'blockdev-nbd.c',
109
'blockjob.c',
110
'job.c',
111
'qemu-io-cmds.c',
112
@@ -XXX,XX +XXX,XX @@ subdir('block')
113
114
blockdev_ss.add(files(
115
'blockdev.c',
116
+ 'blockdev-nbd.c',
117
'iothread.c',
118
'job-qmp.c',
119
))
120
@@ -XXX,XX +XXX,XX @@ if have_tools
121
qemu_io = executable('qemu-io', files('qemu-io.c'),
122
dependencies: [block, qemuutil], install: true)
123
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
124
- dependencies: [block, qemuutil], install: true)
125
+ dependencies: [blockdev, qemuutil], install: true)
126
127
subdir('storage-daemon')
128
subdir('contrib/rdmacm-mux')
129
diff --git a/nbd/meson.build b/nbd/meson.build
130
index XXXXXXX..XXXXXXX 100644
131
--- a/nbd/meson.build
132
+++ b/nbd/meson.build
133
@@ -XXX,XX +XXX,XX @@
134
block_ss.add(files(
135
'client.c',
136
'common.c',
137
+))
138
+blockdev_ss.add(files(
139
'server.c',
140
))
141
diff --git a/stubs/meson.build b/stubs/meson.build
142
index XXXXXXX..XXXXXXX 100644
143
--- a/stubs/meson.build
144
+++ b/stubs/meson.build
145
@@ -XXX,XX +XXX,XX @@
146
stub_ss.add(files('arch_type.c'))
147
stub_ss.add(files('bdrv-next-monitor-owned.c'))
148
stub_ss.add(files('blk-commit-all.c'))
149
+stub_ss.add(files('blk-exp-close-all.c'))
150
stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
151
stub_ss.add(files('change-state-handler.c'))
152
stub_ss.add(files('cmos.c'))
153
--
100
--
154
2.26.2
101
2.40.1
155
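
The qemu-block-drivers snippet above only shows the --blockdev half of a
zoned passthrough setup. A fuller, hypothetical command line (machine,
accelerator and other guest options omitted; the node name and the
/dev/nullb0 path are placeholders) could look like:

    $ qemu-system-x86_64 \
          --blockdev host_device,node-name=drive0,filename=/dev/nullb0,cache.direct=on \
          -device virtio-blk-pci,drive=drive0

This relies on the virtio-blk zoned emulation from the rest of this
series; as the documentation notes, a zoned device can only be passed
through if the emulated storage controller supports zoned storage.
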
1
Use the new QAPI block exports API instead of defining our own QOM
1
From: Sam Li <faithilikerun@gmail.com>
2
objects.
3
2
4
This is a large change because the lifecycle of VuBlockDev needs to
3
Since Linux doesn't have a user API to issue zone append operations to
5
follow BlockExportDriver. QOM properties are replaced by QAPI options
4
zoned devices from user space, the file-posix driver is modified to add
6
objects.
5
zone append emulation using regular writes. To do this, the file-posix
6
driver tracks the wp location of all zones of the device. It uses an
7
array of uint64_t. The most significant bit of each wp location indicates
8
whether the zone is a conventional zone.
7
9
8
VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
10
A zone's wp can change when the following operations are issued:
9
Several fields can be dropped since BlockExport already has equivalents.
11
- zone reset: change the wp to the start offset of that zone
12
- zone finish: change to the end location of that zone
13
- write to a zone
14
- zone append
10
15
11
The file names and meson build integration will be adjusted in a future
16
Signed-off-by: Sam Li <faithilikerun@gmail.com>
12
patch. libvhost-user should probably be built as a static library that
17
Message-id: 20230508051510.177850-2-faithilikerun@gmail.com
13
is linked into QEMU instead of as a .c file that results in duplicate
18
[Fix errno propagation from handle_aiocb_zone_mgmt()
14
compilation.
15
16
The new command-line syntax is:
17
18
$ qemu-storage-daemon \
19
--blockdev file,node-name=drive0,filename=test.img \
20
--export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock
21
22
Note that unix-socket is optional because we may wish to accept chardevs
23
too in the future.
24
25
Markus noted that supported address families are not explicit in the
26
QAPI schema. It is unlikely that support for more address families will
27
be added since file descriptor passing is required and few address
28
families support it. If a new address family needs to be added, then the
29
QAPI 'features' syntax can be used to advertise them.
30
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
Acked-by: Markus Armbruster <armbru@redhat.com>
33
Message-id: 20200924151549.913737-12-stefanha@redhat.com
34
[Skip test on big-endian host architectures because this device doesn't
35
support them yet (as already mentioned in a code comment).
36
--Stefan]
19
--Stefan]
37
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
38
---
21
---
39
qapi/block-export.json | 21 +-
22
include/block/block-common.h | 14 +++
40
block/export/vhost-user-blk-server.h | 23 +-
23
include/block/block_int-common.h | 5 +
41
block/export/export.c | 6 +
24
block/file-posix.c | 178 ++++++++++++++++++++++++++++++-
42
block/export/vhost-user-blk-server.c | 452 +++++++--------------------
25
3 files changed, 193 insertions(+), 4 deletions(-)
43
util/vhost-user-server.c | 10 +-
44
block/export/meson.build | 1 +
45
block/meson.build | 1 -
46
7 files changed, 156 insertions(+), 358 deletions(-)
47
26
48
diff --git a/qapi/block-export.json b/qapi/block-export.json
27
diff --git a/include/block/block-common.h b/include/block/block-common.h
49
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
50
--- a/qapi/block-export.json
29
--- a/include/block/block-common.h
51
+++ b/qapi/block-export.json
30
+++ b/include/block/block-common.h
52
@@ -XXX,XX +XXX,XX @@
31
@@ -XXX,XX +XXX,XX @@ typedef struct BlockZoneDescriptor {
53
'data': { '*name': 'str', '*description': 'str',
32
BlockZoneState state;
54
'*bitmap': 'str' } }
33
} BlockZoneDescriptor;
55
34
56
+##
35
+/*
57
+# @BlockExportOptionsVhostUserBlk:
36
+ * Track write pointers of a zone in bytes.
58
+#
37
+ */
59
+# A vhost-user-blk block export.
38
+typedef struct BlockZoneWps {
60
+#
39
+ CoMutex colock;
61
+# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
40
+ uint64_t wp[];
62
+# SocketAddress types are supported. Passed fds must be UNIX domain
41
+} BlockZoneWps;
63
+# sockets.
42
+
64
+# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
43
typedef struct BlockDriverInfo {
65
+#
44
/* in bytes, 0 if irrelevant */
66
+# Since: 5.2
45
int cluster_size;
67
+##
46
@@ -XXX,XX +XXX,XX @@ typedef enum {
68
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
47
#define BDRV_SECTOR_BITS 9
69
+ 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
48
#define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
70
+
49
71
##
50
+/*
72
# @NbdServerAddOptions:
51
+ * Get the first most significant bit of wp. If it is zero, then
73
#
52
+ * the zone type is SWR.
74
@@ -XXX,XX +XXX,XX @@
53
+ */
75
# An enumeration of block export types
54
+#define BDRV_ZT_IS_CONV(wp) (wp & (1ULL << 63))
76
#
55
+
77
# @nbd: NBD export
56
#define BDRV_REQUEST_MAX_SECTORS MIN_CONST(SIZE_MAX >> BDRV_SECTOR_BITS, \
78
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
57
INT_MAX >> BDRV_SECTOR_BITS)
79
#
58
#define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
80
# Since: 4.2
59
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
81
##
82
{ 'enum': 'BlockExportType',
83
- 'data': [ 'nbd' ] }
84
+ 'data': [ 'nbd', 'vhost-user-blk' ] }
85
86
##
87
# @BlockExportOptions:
88
@@ -XXX,XX +XXX,XX @@
89
'*writethrough': 'bool' },
90
'discriminator': 'type',
91
'data': {
92
- 'nbd': 'BlockExportOptionsNbd'
93
+ 'nbd': 'BlockExportOptionsNbd',
94
+ 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
95
} }
96
97
##
98
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
99
index XXXXXXX..XXXXXXX 100644
60
index XXXXXXX..XXXXXXX 100644
100
--- a/block/export/vhost-user-blk-server.h
61
--- a/include/block/block_int-common.h
101
+++ b/block/export/vhost-user-blk-server.h
62
+++ b/include/block/block_int-common.h
102
@@ -XXX,XX +XXX,XX @@
63
@@ -XXX,XX +XXX,XX @@ typedef struct BlockLimits {
103
64
104
#ifndef VHOST_USER_BLK_SERVER_H
65
/* maximum number of active zones */
105
#define VHOST_USER_BLK_SERVER_H
66
uint32_t max_active_zones;
106
-#include "util/vhost-user-server.h"
67
+
107
68
+ uint32_t write_granularity;
108
-typedef struct VuBlockDev VuBlockDev;
69
} BlockLimits;
109
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
70
110
-#define VHOST_USER_BLK_SERVER(obj) \
71
typedef struct BdrvOpBlocker BdrvOpBlocker;
111
- OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
72
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
112
+#include "block/export.h"
73
CoMutex bsc_modify_lock;
113
74
/* Always non-NULL, but must only be dereferenced under an RCU read guard */
114
-/* vhost user block device */
75
BdrvBlockStatusCache *block_status_cache;
115
-struct VuBlockDev {
76
+
116
- Object parent_obj;
77
+ /* array of write pointers' location of each zone in the zoned device. */
117
- char *node_name;
78
+ BlockZoneWps *wps;
118
- SocketAddress *addr;
79
};
119
- AioContext *ctx;
80
120
- VuServer vu_server;
81
struct BlockBackendRootState {
121
- bool running;
82
diff --git a/block/file-posix.c b/block/file-posix.c
122
- uint32_t blk_size;
123
- BlockBackend *backend;
124
- QIOChannelSocket *sioc;
125
- QTAILQ_ENTRY(VuBlockDev) next;
126
- struct virtio_blk_config blkcfg;
127
- bool writable;
128
-};
129
+/* For block/export/export.c */
130
+extern const BlockExportDriver blk_exp_vhost_user_blk;
131
132
#endif /* VHOST_USER_BLK_SERVER_H */
133
diff --git a/block/export/export.c b/block/export/export.c
134
index XXXXXXX..XXXXXXX 100644
83
index XXXXXXX..XXXXXXX 100644
135
--- a/block/export/export.c
84
--- a/block/file-posix.c
136
+++ b/block/export/export.c
85
+++ b/block/file-posix.c
137
@@ -XXX,XX +XXX,XX @@
86
@@ -XXX,XX +XXX,XX @@ static int hdev_get_max_segments(int fd, struct stat *st)
138
#include "sysemu/block-backend.h"
87
}
139
#include "block/export.h"
88
140
#include "block/nbd.h"
89
#if defined(CONFIG_BLKZONED)
141
+#if CONFIG_LINUX
90
+/*
142
+#include "block/export/vhost-user-blk-server.h"
91
+ * If the reset_all flag is true, then the wps of zone whose state is
92
+ * not readonly or offline should be all reset to the start sector.
93
+ * Else, take the real wp of the device.
94
+ */
95
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
96
+ unsigned int nrz, bool reset_all)
97
+{
98
+ struct blk_zone *blkz;
99
+ size_t rep_size;
100
+ uint64_t sector = offset >> BDRV_SECTOR_BITS;
101
+ BlockZoneWps *wps = bs->wps;
102
+ unsigned int j = offset / bs->bl.zone_size;
103
+ unsigned int n = 0, i = 0;
104
+ int ret;
105
+ rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
106
+ g_autofree struct blk_zone_report *rep = NULL;
107
+
108
+ rep = g_malloc(rep_size);
109
+ blkz = (struct blk_zone *)(rep + 1);
110
+ while (n < nrz) {
111
+ memset(rep, 0, rep_size);
112
+ rep->sector = sector;
113
+ rep->nr_zones = nrz - n;
114
+
115
+ do {
116
+ ret = ioctl(fd, BLKREPORTZONE, rep);
117
+ } while (ret != 0 && errno == EINTR);
118
+ if (ret != 0) {
119
+ error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
120
+ fd, offset, errno);
121
+ return -errno;
122
+ }
123
+
124
+ if (!rep->nr_zones) {
125
+ break;
126
+ }
127
+
128
+ for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
129
+ /*
130
+ * The wp tracking cares only about sequential writes required and
131
+ * sequential write preferred zones so that the wp can advance to
132
+ * the right location.
133
+ * Use the most significant bit of the wp location to indicate the
134
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
135
+ */
136
+ if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
137
+ wps->wp[j] |= 1ULL << 63;
138
+ } else {
139
+ switch(blkz[i].cond) {
140
+ case BLK_ZONE_COND_FULL:
141
+ case BLK_ZONE_COND_READONLY:
142
+ /* Zone not writable */
143
+ wps->wp[j] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
144
+ break;
145
+ case BLK_ZONE_COND_OFFLINE:
146
+ /* Zone not writable nor readable */
147
+ wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
148
+ break;
149
+ default:
150
+ if (reset_all) {
151
+ wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
152
+ } else {
153
+ wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
154
+ }
155
+ break;
156
+ }
157
+ }
158
+ }
159
+ sector = blkz[i - 1].start + blkz[i - 1].len;
160
+ }
161
+
162
+ return 0;
163
+}
164
+
165
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
166
+ unsigned int nrz)
167
+{
168
+ if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
169
+ error_report("update zone wp failed");
170
+ }
171
+}
172
+
173
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
174
Error **errp)
175
{
176
+ BDRVRawState *s = bs->opaque;
177
BlockZoneModel zoned;
178
int ret;
179
180
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
181
if (ret > 0) {
182
bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
183
}
184
+
185
+ ret = get_sysfs_long_val(st, "physical_block_size");
186
+ if (ret >= 0) {
187
+ bs->bl.write_granularity = ret;
188
+ }
189
+
190
+ /* The refresh_limits() function can be called multiple times. */
191
+ g_free(bs->wps);
192
+ bs->wps = g_malloc(sizeof(BlockZoneWps) +
193
+ sizeof(int64_t) * bs->bl.nr_zones);
194
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
195
+ if (ret < 0) {
196
+ error_setg_errno(errp, -ret, "report wps failed");
197
+ bs->wps = NULL;
198
+ return;
199
+ }
200
+ qemu_co_mutex_init(&bs->wps->colock);
201
}
202
#else /* !defined(CONFIG_BLKZONED) */
203
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
204
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_zone_mgmt(void *opaque)
205
ret = ioctl(fd, aiocb->zone_mgmt.op, &range);
206
} while (ret != 0 && errno == EINTR);
207
208
- return ret;
209
+ return ret < 0 ? -errno : ret;
210
}
211
#endif
212
213
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
214
{
215
BDRVRawState *s = bs->opaque;
216
RawPosixAIOData acb;
217
+ int ret;
218
219
if (fd_open(bs) < 0)
220
return -EIO;
221
+#if defined(CONFIG_BLKZONED)
222
+ if (type & QEMU_AIO_WRITE && bs->wps) {
223
+ qemu_co_mutex_lock(&bs->wps->colock);
224
+ }
143
+#endif
225
+#endif
144
#include "qapi/error.h"
226
145
#include "qapi/qapi-commands-block-export.h"
227
/*
146
#include "qapi/qapi-events-block-export.h"
228
* When using O_DIRECT, the request must be aligned to be able to use
147
@@ -XXX,XX +XXX,XX @@
229
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
148
230
#ifdef CONFIG_LINUX_IO_URING
149
static const BlockExportDriver *blk_exp_drivers[] = {
231
} else if (s->use_linux_io_uring) {
150
&blk_exp_nbd,
232
assert(qiov->size == bytes);
151
+#if CONFIG_LINUX
233
- return luring_co_submit(bs, s->fd, offset, qiov, type);
152
+ &blk_exp_vhost_user_blk,
234
+ ret = luring_co_submit(bs, s->fd, offset, qiov, type);
235
+ goto out;
236
#endif
237
#ifdef CONFIG_LINUX_AIO
238
} else if (s->use_linux_aio) {
239
assert(qiov->size == bytes);
240
- return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
241
+ ret = laio_co_submit(s->fd, offset, qiov, type,
242
+ s->aio_max_batch);
243
+ goto out;
244
#endif
245
}
246
247
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
248
};
249
250
assert(qiov->size == bytes);
251
- return raw_thread_pool_submit(handle_aiocb_rw, &acb);
252
+ ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
253
+ goto out; /* Avoid the compiler err of unused label */
254
+
255
+out:
256
+#if defined(CONFIG_BLKZONED)
257
+{
258
+ BlockZoneWps *wps = bs->wps;
259
+ if (ret == 0) {
260
+ if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
261
+ uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
262
+ if (!BDRV_ZT_IS_CONV(*wp)) {
263
+ /* Advance the wp if needed */
264
+ if (offset + bytes > *wp) {
265
+ *wp = offset + bytes;
266
+ }
267
+ }
268
+ }
269
+ } else {
270
+ if (type & QEMU_AIO_WRITE) {
271
+ update_zones_wp(bs, s->fd, 0, 1);
272
+ }
273
+ }
274
+
275
+ if (type & QEMU_AIO_WRITE && wps) {
276
+ qemu_co_mutex_unlock(&wps->colock);
277
+ }
278
+}
153
+#endif
279
+#endif
154
};
280
+ return ret;
155
156
/* Only accessed from the main thread */
157
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
158
index XXXXXXX..XXXXXXX 100644
159
--- a/block/export/vhost-user-blk-server.c
160
+++ b/block/export/vhost-user-blk-server.c
161
@@ -XXX,XX +XXX,XX @@
162
*/
163
#include "qemu/osdep.h"
164
#include "block/block.h"
165
+#include "contrib/libvhost-user/libvhost-user.h"
166
+#include "standard-headers/linux/virtio_blk.h"
167
+#include "util/vhost-user-server.h"
168
#include "vhost-user-blk-server.h"
169
#include "qapi/error.h"
170
#include "qom/object_interfaces.h"
171
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
172
unsigned char status;
173
};
174
175
-typedef struct VuBlockReq {
176
+typedef struct VuBlkReq {
177
VuVirtqElement elem;
178
int64_t sector_num;
179
size_t size;
180
@@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq {
181
struct virtio_blk_outhdr out;
182
VuServer *server;
183
struct VuVirtq *vq;
184
-} VuBlockReq;
185
+} VuBlkReq;
186
187
-static void vu_block_req_complete(VuBlockReq *req)
188
+/* vhost user block device */
189
+typedef struct {
190
+ BlockExport export;
191
+ VuServer vu_server;
192
+ uint32_t blk_size;
193
+ QIOChannelSocket *sioc;
194
+ struct virtio_blk_config blkcfg;
195
+ bool writable;
196
+} VuBlkExport;
197
+
198
+static void vu_blk_req_complete(VuBlkReq *req)
199
{
200
VuDev *vu_dev = &req->server->vu_dev;
201
202
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
203
free(req);
204
}
281
}
205
282
206
-static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
283
static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
207
-{
284
@@ -XXX,XX +XXX,XX @@ static void raw_close(BlockDriverState *bs)
208
- return container_of(server, VuBlockDev, vu_server);
285
BDRVRawState *s = bs->opaque;
209
-}
286
210
-
287
if (s->fd >= 0) {
211
static int coroutine_fn
288
+#if defined(CONFIG_BLKZONED)
212
-vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
289
+ g_free(bs->wps);
213
- uint32_t iovcnt, uint32_t type)
290
+#endif
214
+vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov,
291
qemu_close(s->fd);
215
+ uint32_t iovcnt, uint32_t type)
292
s->fd = -1;
216
{
293
}
217
struct virtio_blk_discard_write_zeroes desc;
294
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
218
ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
295
const char *op_name;
219
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
296
unsigned long zo;
297
int ret;
298
+ BlockZoneWps *wps = bs->wps;
299
int64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
300
301
zone_size = bs->bl.zone_size;
302
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
220
return -EINVAL;
303
return -EINVAL;
221
}
304
}
222
305
223
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
306
+ uint32_t i = offset / bs->bl.zone_size;
224
uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
307
+ uint32_t nrz = len / bs->bl.zone_size;
225
le32_to_cpu(desc.num_sectors) << 9 };
308
+ uint64_t *wp = &wps->wp[i];
226
if (type == VIRTIO_BLK_T_DISCARD) {
309
+ if (BDRV_ZT_IS_CONV(*wp) && len != capacity) {
227
- if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
310
+ error_report("zone mgmt operations are not allowed for conventional zones");
228
+ if (blk_co_pdiscard(blk, range[0], range[1]) == 0) {
311
+ return -EIO;
229
return 0;
312
+ }
230
}
313
+
231
} else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
314
switch (op) {
232
- if (blk_co_pwrite_zeroes(vdev_blk->backend,
315
case BLK_ZO_OPEN:
233
- range[0], range[1], 0) == 0) {
316
op_name = "BLKOPENZONE";
234
+ if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) {
317
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
235
return 0;
318
len >> BDRV_SECTOR_BITS);
236
}
319
ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
237
}
320
if (ret != 0) {
238
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
321
+ update_zones_wp(bs, s->fd, offset, i);
239
return -EINVAL;
322
error_report("ioctl %s failed %d", op_name, ret);
240
}
323
+ return ret;
241
324
+ }
242
-static int coroutine_fn vu_block_flush(VuBlockReq *req)
325
+
243
+static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
326
+ if (zo == BLKRESETZONE && len == capacity) {
244
{
327
+ ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 1);
245
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
328
+ if (ret < 0) {
246
- BlockBackend *backend = vdev_blk->backend;
329
+ error_report("reporting single wp failed");
247
- return blk_co_flush(backend);
330
+ return ret;
248
-}
331
+ }
249
-
332
+ } else if (zo == BLKRESETZONE) {
250
-static void coroutine_fn vu_block_virtio_process_req(void *opaque)
333
+ for (unsigned int j = 0; j < nrz; ++j) {
251
-{
334
+ wp[j] = offset + j * zone_size;
252
- VuBlockReq *req = opaque;
335
+ }
253
+ VuBlkReq *req = opaque;
336
+ } else if (zo == BLKFINISHZONE) {
254
VuServer *server = req->server;
337
+ for (unsigned int j = 0; j < nrz; ++j) {
255
VuVirtqElement *elem = &req->elem;
338
+ /* The zoned device allows the last zone to be smaller than the
256
uint32_t type;
339
+ * zone size. */
257
340
+ wp[j] = MIN(offset + (j + 1) * zone_size, offset + len);
258
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
341
+ }
259
- BlockBackend *backend = vdev_blk->backend;
342
}
260
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
343
261
+ BlockBackend *blk = vexp->export.blk;
344
return ret;
262
263
struct iovec *in_iov = elem->in_sg;
264
struct iovec *out_iov = elem->out_sg;
265
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
266
bool is_write = type & VIRTIO_BLK_T_OUT;
267
req->sector_num = le64_to_cpu(req->out.sector);
268
269
- int64_t offset = req->sector_num * vdev_blk->blk_size;
270
+ if (is_write && !vexp->writable) {
271
+ req->in->status = VIRTIO_BLK_S_IOERR;
272
+ break;
273
+ }
274
+
275
+ int64_t offset = req->sector_num * vexp->blk_size;
276
QEMUIOVector qiov;
277
if (is_write) {
278
qemu_iovec_init_external(&qiov, out_iov, out_num);
279
- ret = blk_co_pwritev(backend, offset, qiov.size,
280
- &qiov, 0);
281
+ ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
282
} else {
283
qemu_iovec_init_external(&qiov, in_iov, in_num);
284
- ret = blk_co_preadv(backend, offset, qiov.size,
285
- &qiov, 0);
286
+ ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
287
}
288
if (ret >= 0) {
289
req->in->status = VIRTIO_BLK_S_OK;
290
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
291
break;
292
}
293
case VIRTIO_BLK_T_FLUSH:
294
- if (vu_block_flush(req) == 0) {
295
+ if (blk_co_flush(blk) == 0) {
296
req->in->status = VIRTIO_BLK_S_OK;
297
} else {
298
req->in->status = VIRTIO_BLK_S_IOERR;
299
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
300
case VIRTIO_BLK_T_DISCARD:
301
case VIRTIO_BLK_T_WRITE_ZEROES: {
302
int rc;
303
- rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
304
- out_num, type);
305
+
306
+ if (!vexp->writable) {
307
+ req->in->status = VIRTIO_BLK_S_IOERR;
308
+ break;
309
+ }
310
+
311
+ rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type);
312
if (rc == 0) {
313
req->in->status = VIRTIO_BLK_S_OK;
314
} else {
315
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
316
break;
317
}
318
319
- vu_block_req_complete(req);
320
+ vu_blk_req_complete(req);
321
return;
322
323
err:
324
- free(elem);
325
+ free(req);
326
}
327
328
-static void vu_block_process_vq(VuDev *vu_dev, int idx)
329
+static void vu_blk_process_vq(VuDev *vu_dev, int idx)
330
{
331
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
332
VuVirtq *vq = vu_get_queue(vu_dev, idx);
333
334
while (1) {
335
- VuBlockReq *req;
336
+ VuBlkReq *req;
337
338
- req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
339
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
340
if (!req) {
341
break;
342
}
343
@@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx)
344
req->vq = vq;
345
346
Coroutine *co =
347
- qemu_coroutine_create(vu_block_virtio_process_req, req);
348
+ qemu_coroutine_create(vu_blk_virtio_process_req, req);
349
qemu_coroutine_enter(co);
350
}
351
}
352
353
-static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
354
+static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started)
355
{
356
VuVirtq *vq;
357
358
assert(vu_dev);
359
360
vq = vu_get_queue(vu_dev, idx);
361
- vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
362
+ vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL);
363
}
364
365
-static uint64_t vu_block_get_features(VuDev *dev)
366
+static uint64_t vu_blk_get_features(VuDev *dev)
367
{
368
uint64_t features;
369
VuServer *server = container_of(dev, VuServer, vu_dev);
370
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
371
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
372
features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
373
1ull << VIRTIO_BLK_F_SEG_MAX |
374
1ull << VIRTIO_BLK_F_TOPOLOGY |
375
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev)
376
1ull << VIRTIO_RING_F_EVENT_IDX |
377
1ull << VHOST_USER_F_PROTOCOL_FEATURES;
378
379
- if (!vdev_blk->writable) {
380
+ if (!vexp->writable) {
381
features |= 1ull << VIRTIO_BLK_F_RO;
382
}
383
384
return features;
385
}
386
387
-static uint64_t vu_block_get_protocol_features(VuDev *dev)
388
+static uint64_t vu_blk_get_protocol_features(VuDev *dev)
389
{
390
return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
391
1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
392
}
393
394
static int
395
-vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
396
+vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
397
{
398
+ /* TODO blkcfg must be little-endian for VIRTIO 1.0 */
399
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
400
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
401
- memcpy(config, &vdev_blk->blkcfg, len);
402
-
403
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
404
+ memcpy(config, &vexp->blkcfg, len);
405
return 0;
406
}
407
408
static int
409
-vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
410
+vu_blk_set_config(VuDev *vu_dev, const uint8_t *data,
411
uint32_t offset, uint32_t size, uint32_t flags)
412
{
413
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
414
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
415
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
416
uint8_t wce;
417
418
/* don't support live migration */
419
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
420
}
421
422
wce = *data;
423
- vdev_blk->blkcfg.wce = wce;
424
- blk_set_enable_write_cache(vdev_blk->backend, wce);
425
+ vexp->blkcfg.wce = wce;
426
+ blk_set_enable_write_cache(vexp->export.blk, wce);
427
return 0;
428
}
429
430
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
431
* of vu_process_message.
432
*
433
*/
434
-static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
435
+static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
436
{
437
if (vmsg->request == VHOST_USER_NONE) {
438
dev->panic(dev, "disconnect");
439
@@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
440
return false;
441
}
442
443
-static const VuDevIface vu_block_iface = {
444
- .get_features = vu_block_get_features,
445
- .queue_set_started = vu_block_queue_set_started,
446
- .get_protocol_features = vu_block_get_protocol_features,
447
- .get_config = vu_block_get_config,
448
- .set_config = vu_block_set_config,
449
- .process_msg = vu_block_process_msg,
450
+static const VuDevIface vu_blk_iface = {
451
+ .get_features = vu_blk_get_features,
452
+ .queue_set_started = vu_blk_queue_set_started,
453
+ .get_protocol_features = vu_blk_get_protocol_features,
454
+ .get_config = vu_blk_get_config,
455
+ .set_config = vu_blk_set_config,
456
+ .process_msg = vu_blk_process_msg,
457
};
458
459
static void blk_aio_attached(AioContext *ctx, void *opaque)
460
{
461
- VuBlockDev *vub_dev = opaque;
462
- vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
463
+ VuBlkExport *vexp = opaque;
464
+ vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
465
}
466
467
static void blk_aio_detach(void *opaque)
468
{
469
- VuBlockDev *vub_dev = opaque;
470
- vhost_user_server_detach_aio_context(&vub_dev->vu_server);
471
+ VuBlkExport *vexp = opaque;
472
+ vhost_user_server_detach_aio_context(&vexp->vu_server);
473
}
474
475
static void
476
-vu_block_initialize_config(BlockDriverState *bs,
477
+vu_blk_initialize_config(BlockDriverState *bs,
478
struct virtio_blk_config *config, uint32_t blk_size)
479
{
480
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
481
@@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs,
482
config->max_write_zeroes_seg = 1;
483
}
484
485
-static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
486
+static void vu_blk_exp_request_shutdown(BlockExport *exp)
487
{
488
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
489
490
- BlockBackend *blk;
491
- Error *local_error = NULL;
492
- const char *node_name = vu_block_device->node_name;
493
- bool writable = vu_block_device->writable;
494
- uint64_t perm = BLK_PERM_CONSISTENT_READ;
495
- int ret;
496
-
497
- AioContext *ctx;
498
-
499
- BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
500
-
501
- if (!bs) {
502
- error_propagate(errp, local_error);
503
- return NULL;
504
- }
505
-
506
- if (bdrv_is_read_only(bs)) {
507
- writable = false;
508
- }
509
-
510
- if (writable) {
511
- perm |= BLK_PERM_WRITE;
512
- }
513
-
514
- ctx = bdrv_get_aio_context(bs);
515
- aio_context_acquire(ctx);
516
- bdrv_invalidate_cache(bs, NULL);
517
- aio_context_release(ctx);
518
-
519
- /*
520
- * Don't allow resize while the vhost user server is running,
521
- * otherwise we don't care what happens with the node.
522
- */
523
- blk = blk_new(bdrv_get_aio_context(bs), perm,
524
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
525
- BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
526
- ret = blk_insert_bs(blk, bs, errp);
527
-
528
- if (ret < 0) {
529
- goto fail;
530
- }
531
-
532
- blk_set_enable_write_cache(blk, false);
533
-
534
- blk_set_allow_aio_context_change(blk, true);
535
-
536
- vu_block_device->blkcfg.wce = 0;
537
- vu_block_device->backend = blk;
538
- if (!vu_block_device->blk_size) {
539
- vu_block_device->blk_size = BDRV_SECTOR_SIZE;
540
- }
541
- vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
542
- blk_set_guest_block_size(blk, vu_block_device->blk_size);
543
- vu_block_initialize_config(bs, &vu_block_device->blkcfg,
544
- vu_block_device->blk_size);
545
- return vu_block_device;
546
-
547
-fail:
548
- blk_unref(blk);
549
- return NULL;
550
-}
551
-
552
-static void vu_block_deinit(VuBlockDev *vu_block_device)
553
-{
554
- if (vu_block_device->backend) {
555
- blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
556
- blk_aio_detach, vu_block_device);
557
- }
558
-
559
- blk_unref(vu_block_device->backend);
560
-}
561
-
562
-static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
563
-{
564
- vhost_user_server_stop(&vu_block_device->vu_server);
565
- vu_block_deinit(vu_block_device);
566
-}
567
-
568
-static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
569
- Error **errp)
570
-{
571
- AioContext *ctx;
572
- SocketAddress *addr = vu_block_device->addr;
573
-
574
- if (!vu_block_init(vu_block_device, errp)) {
575
- return;
576
- }
577
-
578
- ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
579
-
580
- if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
581
- VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
582
- errp)) {
583
- goto error;
584
- }
585
-
586
- blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
587
- blk_aio_detach, vu_block_device);
588
- vu_block_device->running = true;
589
- return;
590
-
591
- error:
592
- vu_block_deinit(vu_block_device);
593
-}
594
-
595
-static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
596
-{
597
- if (vus->running) {
598
- error_setg(errp, "The property can't be modified "
599
- "while the server is running");
600
- return false;
601
- }
602
- return true;
603
-}
604
-
605
-static void vu_set_node_name(Object *obj, const char *value, Error **errp)
606
-{
607
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
608
-
609
- if (!vu_prop_modifiable(vus, errp)) {
610
- return;
611
- }
612
-
613
- if (vus->node_name) {
614
- g_free(vus->node_name);
615
- }
616
-
617
- vus->node_name = g_strdup(value);
618
-}
619
-
620
-static char *vu_get_node_name(Object *obj, Error **errp)
621
-{
622
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
623
- return g_strdup(vus->node_name);
624
-}
625
-
626
-static void free_socket_addr(SocketAddress *addr)
627
-{
628
- g_free(addr->u.q_unix.path);
629
- g_free(addr);
630
-}
631
-
632
-static void vu_set_unix_socket(Object *obj, const char *value,
633
- Error **errp)
634
-{
635
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
636
-
637
- if (!vu_prop_modifiable(vus, errp)) {
638
- return;
639
- }
640
-
641
- if (vus->addr) {
642
- free_socket_addr(vus->addr);
643
- }
644
-
645
- SocketAddress *addr = g_new0(SocketAddress, 1);
646
- addr->type = SOCKET_ADDRESS_TYPE_UNIX;
647
- addr->u.q_unix.path = g_strdup(value);
648
- vus->addr = addr;
649
+ vhost_user_server_stop(&vexp->vu_server);
650
}
651
652
-static char *vu_get_unix_socket(Object *obj, Error **errp)
653
+static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
654
+ Error **errp)
655
{
656
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
657
- return g_strdup(vus->addr->u.q_unix.path);
658
-}
659
-
660
-static bool vu_get_block_writable(Object *obj, Error **errp)
661
-{
662
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
663
- return vus->writable;
664
-}
665
-
666
-static void vu_set_block_writable(Object *obj, bool value, Error **errp)
667
-{
668
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
669
-
670
- if (!vu_prop_modifiable(vus, errp)) {
671
- return;
672
- }
673
-
674
- vus->writable = value;
675
-}
676
-
677
-static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
678
- void *opaque, Error **errp)
679
-{
680
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
681
- uint32_t value = vus->blk_size;
682
-
683
- visit_type_uint32(v, name, &value, errp);
684
-}
685
-
686
-static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
687
- void *opaque, Error **errp)
688
-{
689
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
690
-
691
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
692
+ BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
693
Error *local_err = NULL;
694
- uint32_t value;
695
+ uint64_t logical_block_size;
696
697
- if (!vu_prop_modifiable(vus, errp)) {
698
- return;
699
- }
700
+ vexp->writable = opts->writable;
701
+ vexp->blkcfg.wce = 0;
702
703
- visit_type_uint32(v, name, &value, &local_err);
704
- if (local_err) {
705
- goto out;
706
+ if (vu_opts->has_logical_block_size) {
707
+ logical_block_size = vu_opts->logical_block_size;
708
+ } else {
709
+ logical_block_size = BDRV_SECTOR_SIZE;
710
}
711
-
712
- check_block_size(object_get_typename(obj), name, value, &local_err);
713
+ check_block_size(exp->id, "logical-block-size", logical_block_size,
714
+ &local_err);
715
if (local_err) {
716
- goto out;
717
+ error_propagate(errp, local_err);
718
+ return -EINVAL;
719
+ }
720
+ vexp->blk_size = logical_block_size;
721
+ blk_set_guest_block_size(exp->blk, logical_block_size);
722
+ vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
723
+ logical_block_size);
724
+
725
+ blk_set_allow_aio_context_change(exp->blk, true);
726
+ blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
727
+ vexp);
728
+
729
+ if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
730
+ VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
731
+ errp)) {
732
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
733
+ blk_aio_detach, vexp);
734
+ return -EADDRNOTAVAIL;
735
}
736
737
- vus->blk_size = value;
738
-
739
-out:
740
- error_propagate(errp, local_err);
741
-}
742
-
743
-static void vhost_user_blk_server_instance_finalize(Object *obj)
744
-{
745
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
746
-
747
- vhost_user_blk_server_stop(vub);
748
-
749
- /*
750
- * Unlike object_property_add_str, object_class_property_add_str
751
- * doesn't have a release method. Thus manual memory freeing is
752
- * needed.
753
- */
754
- free_socket_addr(vub->addr);
755
- g_free(vub->node_name);
756
-}
757
-
758
-static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
759
-{
760
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
761
-
762
- vhost_user_blk_server_start(vub, errp);
763
+ return 0;
764
}
765
766
-static void vhost_user_blk_server_class_init(ObjectClass *klass,
767
- void *class_data)
768
+static void vu_blk_exp_delete(BlockExport *exp)
769
{
770
- UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
771
- ucc->complete = vhost_user_blk_server_complete;
772
-
773
- object_class_property_add_bool(klass, "writable",
774
- vu_get_block_writable,
775
- vu_set_block_writable);
776
-
777
- object_class_property_add_str(klass, "node-name",
778
- vu_get_node_name,
779
- vu_set_node_name);
780
-
781
- object_class_property_add_str(klass, "unix-socket",
782
- vu_get_unix_socket,
783
- vu_set_unix_socket);
784
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
785
786
- object_class_property_add(klass, "logical-block-size", "uint32",
787
- vu_get_blk_size, vu_set_blk_size,
788
- NULL, NULL);
789
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
790
+ vexp);
791
}
792
793
-static const TypeInfo vhost_user_blk_server_info = {
794
- .name = TYPE_VHOST_USER_BLK_SERVER,
795
- .parent = TYPE_OBJECT,
796
- .instance_size = sizeof(VuBlockDev),
797
- .instance_finalize = vhost_user_blk_server_instance_finalize,
798
- .class_init = vhost_user_blk_server_class_init,
799
- .interfaces = (InterfaceInfo[]) {
800
- {TYPE_USER_CREATABLE},
801
- {}
802
- },
803
+const BlockExportDriver blk_exp_vhost_user_blk = {
804
+ .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK,
805
+ .instance_size = sizeof(VuBlkExport),
806
+ .create = vu_blk_exp_create,
807
+ .delete = vu_blk_exp_delete,
808
+ .request_shutdown = vu_blk_exp_request_shutdown,
809
};
810
-
811
-static void vhost_user_blk_server_register_types(void)
812
-{
813
- type_register_static(&vhost_user_blk_server_info);
814
-}
815
-
816
-type_init(vhost_user_blk_server_register_types)
817
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
818
index XXXXXXX..XXXXXXX 100644
819
--- a/util/vhost-user-server.c
820
+++ b/util/vhost-user-server.c
821
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
822
Error **errp)
823
{
824
QEMUBH *bh;
825
- QIONetListener *listener = qio_net_listener_new();
826
+ QIONetListener *listener;
827
+
828
+ if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX &&
829
+ socket_addr->type != SOCKET_ADDRESS_TYPE_FD) {
830
+ error_setg(errp, "Only socket address types 'unix' and 'fd' are supported");
831
+ return false;
832
+ }
833
+
834
+ listener = qio_net_listener_new();
835
if (qio_net_listener_open_sync(listener, socket_addr, 1,
836
errp) < 0) {
837
object_unref(OBJECT(listener));
838
diff --git a/block/export/meson.build b/block/export/meson.build
839
index XXXXXXX..XXXXXXX 100644
840
--- a/block/export/meson.build
841
+++ b/block/export/meson.build
842
@@ -1 +1,2 @@
843
block_ss.add(files('export.c'))
844
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
845
diff --git a/block/meson.build b/block/meson.build
846
index XXXXXXX..XXXXXXX 100644
847
--- a/block/meson.build
848
+++ b/block/meson.build
849
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
850
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
851
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
852
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
853
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
854
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
855
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
856
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
857
--
345
--
858
2.26.2
346
2.40.1
859
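
As a side note on the write-pointer encoding used in the file-posix patch
above: each entry of BlockZoneWps::wp[] stores a byte offset, with bit 63
doubling as a "conventional zone" flag (see BDRV_ZT_IS_CONV). The
following standalone sketch is an illustration only, not QEMU code; the
WP_IS_CONV/WP_OFFSET helpers are made up for the example:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Same idea as BDRV_ZT_IS_CONV in the patch: bit 63 marks a conventional zone. */
    #define WP_IS_CONV(wp)  ((wp) & (1ULL << 63))
    #define WP_OFFSET(wp)   ((wp) & ~(1ULL << 63))  /* low bits hold the byte offset */

    int main(void)
    {
        uint64_t wp[2];

        /* Sequential-write-required zone: store the raw write pointer. */
        wp[0] = 128ULL * 1024 * 1024;

        /* Conventional zone: set the flag so writes never advance this entry. */
        wp[1] = (256ULL * 1024 * 1024) | (1ULL << 63);

        for (int i = 0; i < 2; i++) {
            printf("zone %d: %s, wp at byte %" PRIu64 "\n", i,
                   WP_IS_CONV(wp[i]) ? "conventional" : "sequential",
                   WP_OFFSET(wp[i]));
        }
        return 0;
    }
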
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
bdrv_is_allocated_above wrongly handles short backing files: it reports
3
A zone append command is a write operation that specifies the first
4
after-EOF space as UNALLOCATED which is wrong, as on read the data is
4
logical block of a zone as the write position. When writing to a zoned
5
generated at the level of the short backing file (if all overlays have
5
block device using zone append, the byte offset of the call may point at
6
unallocated areas at that place).
6
any position within the zone to which the data is being appended. Upon
7
completion the device will respond with the position where the data has
8
been written in the zone.
7
9
8
Reusing bdrv_common_block_status_above fixes the issue and unifies code
10
Signed-off-by: Sam Li <faithilikerun@gmail.com>
9
path.
11
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
10
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
13
Message-id: 20230508051510.177850-3-faithilikerun@gmail.com
12
Reviewed-by: Eric Blake <eblake@redhat.com>
13
Reviewed-by: Alberto Garcia <berto@igalia.com>
14
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
15
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
15
---
19
block/io.c | 43 +++++--------------------------------------
16
include/block/block-io.h | 4 ++
20
1 file changed, 5 insertions(+), 38 deletions(-)
17
include/block/block_int-common.h | 3 ++
18
include/block/raw-aio.h | 4 +-
19
include/sysemu/block-backend-io.h | 9 +++++
20
block/block-backend.c | 61 +++++++++++++++++++++++++++++++
21
block/file-posix.c | 58 +++++++++++++++++++++++++----
22
block/io.c | 27 ++++++++++++++
23
block/io_uring.c | 4 ++
24
block/linux-aio.c | 3 ++
25
block/raw-format.c | 8 ++++
26
10 files changed, 173 insertions(+), 8 deletions(-)
21
27
28
diff --git a/include/block/block-io.h b/include/block/block-io.h
29
index XXXXXXX..XXXXXXX 100644
30
--- a/include/block/block-io.h
31
+++ b/include/block/block-io.h
32
@@ -XXX,XX +XXX,XX @@ int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_report(BlockDriverState *bs,
33
int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_mgmt(BlockDriverState *bs,
34
BlockZoneOp op,
35
int64_t offset, int64_t len);
36
+int coroutine_fn GRAPH_RDLOCK bdrv_co_zone_append(BlockDriverState *bs,
37
+ int64_t *offset,
38
+ QEMUIOVector *qiov,
39
+ BdrvRequestFlags flags);
40
41
bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
42
int bdrv_block_status(BlockDriverState *bs, int64_t offset,
43
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/include/block/block_int-common.h
46
+++ b/include/block/block_int-common.h
47
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
48
BlockZoneDescriptor *zones);
49
int coroutine_fn (*bdrv_co_zone_mgmt)(BlockDriverState *bs, BlockZoneOp op,
50
int64_t offset, int64_t len);
51
+ int coroutine_fn (*bdrv_co_zone_append)(BlockDriverState *bs,
52
+ int64_t *offset, QEMUIOVector *qiov,
53
+ BdrvRequestFlags flags);
54
55
/* removable device specific */
56
bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_is_inserted)(
57
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
58
index XXXXXXX..XXXXXXX 100644
59
--- a/include/block/raw-aio.h
60
+++ b/include/block/raw-aio.h
61
@@ -XXX,XX +XXX,XX @@
62
#define QEMU_AIO_TRUNCATE 0x0080
63
#define QEMU_AIO_ZONE_REPORT 0x0100
64
#define QEMU_AIO_ZONE_MGMT 0x0200
65
+#define QEMU_AIO_ZONE_APPEND 0x0400
66
#define QEMU_AIO_TYPE_MASK \
67
(QEMU_AIO_READ | \
68
QEMU_AIO_WRITE | \
69
@@ -XXX,XX +XXX,XX @@
70
QEMU_AIO_COPY_RANGE | \
71
QEMU_AIO_TRUNCATE | \
72
QEMU_AIO_ZONE_REPORT | \
73
- QEMU_AIO_ZONE_MGMT)
74
+ QEMU_AIO_ZONE_MGMT | \
75
+ QEMU_AIO_ZONE_APPEND)
76
77
/* AIO flags */
78
#define QEMU_AIO_MISALIGNED 0x1000
79
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
80
index XXXXXXX..XXXXXXX 100644
81
--- a/include/sysemu/block-backend-io.h
82
+++ b/include/sysemu/block-backend-io.h
83
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
84
BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
85
int64_t offset, int64_t len,
86
BlockCompletionFunc *cb, void *opaque);
87
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
88
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
89
+ BlockCompletionFunc *cb, void *opaque);
90
BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes,
91
BlockCompletionFunc *cb, void *opaque);
92
void blk_aio_cancel_async(BlockAIOCB *acb);
93
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
94
int64_t offset, int64_t len);
95
int co_wrapper_mixed blk_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
96
int64_t offset, int64_t len);
97
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
98
+ QEMUIOVector *qiov,
99
+ BdrvRequestFlags flags);
100
+int co_wrapper_mixed blk_zone_append(BlockBackend *blk, int64_t *offset,
101
+ QEMUIOVector *qiov,
102
+ BdrvRequestFlags flags);
103
104
int co_wrapper_mixed blk_pdiscard(BlockBackend *blk, int64_t offset,
105
int64_t bytes);
106
diff --git a/block/block-backend.c b/block/block-backend.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/block/block-backend.c
109
+++ b/block/block-backend.c
110
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
111
return &acb->common;
112
}
113
114
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
115
+{
116
+ BlkAioEmAIOCB *acb = opaque;
117
+ BlkRwCo *rwco = &acb->rwco;
118
+
119
+ rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
120
+ rwco->iobuf, rwco->flags);
121
+ blk_aio_complete(acb);
122
+}
123
+
124
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
125
+ QEMUIOVector *qiov, BdrvRequestFlags flags,
126
+ BlockCompletionFunc *cb, void *opaque) {
127
+ BlkAioEmAIOCB *acb;
128
+ Coroutine *co;
129
+ IO_CODE();
130
+
131
+ blk_inc_in_flight(blk);
132
+ acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
133
+ acb->rwco = (BlkRwCo) {
134
+ .blk = blk,
135
+ .ret = NOT_DONE,
136
+ .flags = flags,
137
+ .iobuf = qiov,
138
+ };
139
+ acb->bytes = (int64_t)(uintptr_t)offset;
140
+ acb->has_returned = false;
141
+
142
+ co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
143
+ aio_co_enter(blk_get_aio_context(blk), co);
144
+ acb->has_returned = true;
145
+ if (acb->rwco.ret != NOT_DONE) {
146
+ replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
147
+ blk_aio_complete_bh, acb);
148
+ }
149
+
150
+ return &acb->common;
151
+}
152
+
153
/*
154
* Send a zone_report command.
155
* offset is a byte offset from the start of the device. No alignment
156
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
157
return ret;
158
}
159
160
+/*
161
+ * Send a zone_append command.
162
+ */
163
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
164
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
165
+{
166
+ int ret;
167
+ IO_CODE();
168
+
169
+ blk_inc_in_flight(blk);
170
+ blk_wait_while_drained(blk);
171
+ GRAPH_RDLOCK_GUARD();
172
+ if (!blk_is_available(blk)) {
173
+ blk_dec_in_flight(blk);
174
+ return -ENOMEDIUM;
175
+ }
176
+
177
+ ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
178
+ blk_dec_in_flight(blk);
179
+ return ret;
180
+}
181
+
182
void blk_drain(BlockBackend *blk)
183
{
184
BlockDriverState *bs = blk_bs(blk);
185
diff --git a/block/file-posix.c b/block/file-posix.c
186
index XXXXXXX..XXXXXXX 100644
187
--- a/block/file-posix.c
188
+++ b/block/file-posix.c
189
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVRawState {
190
bool has_write_zeroes:1;
191
bool use_linux_aio:1;
192
bool use_linux_io_uring:1;
193
+ int64_t *offset; /* offset of zone append operation */
194
int page_cache_inconsistent; /* errno from fdatasync failure */
195
bool has_fallocate;
196
bool needs_alignment;
197
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
198
ssize_t len;
199
200
len = RETRY_ON_EINTR(
201
- (aiocb->aio_type & QEMU_AIO_WRITE) ?
202
+ (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
203
qemu_pwritev(aiocb->aio_fildes,
204
aiocb->io.iov,
205
aiocb->io.niov,
206
@@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
207
ssize_t len;
208
209
while (offset < aiocb->aio_nbytes) {
210
- if (aiocb->aio_type & QEMU_AIO_WRITE) {
211
+ if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
212
len = pwrite(aiocb->aio_fildes,
213
(const char *)buf + offset,
214
aiocb->aio_nbytes - offset,
215
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_rw(void *opaque)
216
}
217
218
nbytes = handle_aiocb_rw_linear(aiocb, buf);
219
- if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
220
+ if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
221
char *p = buf;
222
size_t count = aiocb->aio_nbytes, copy;
223
int i;
224
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
225
if (fd_open(bs) < 0)
226
return -EIO;
227
#if defined(CONFIG_BLKZONED)
228
- if (type & QEMU_AIO_WRITE && bs->wps) {
229
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
230
qemu_co_mutex_lock(&bs->wps->colock);
231
+ if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
232
+ int index = offset / bs->bl.zone_size;
233
+ offset = bs->wps->wp[index];
234
+ }
235
}
236
#endif
237
238
@@ -XXX,XX +XXX,XX @@ out:
239
{
240
BlockZoneWps *wps = bs->wps;
241
if (ret == 0) {
242
- if (type & QEMU_AIO_WRITE && wps && bs->bl.zone_size) {
243
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
244
+ && wps && bs->bl.zone_size) {
245
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
246
if (!BDRV_ZT_IS_CONV(*wp)) {
247
+ if (type & QEMU_AIO_ZONE_APPEND) {
248
+ *s->offset = *wp;
249
+ }
250
/* Advance the wp if needed */
251
if (offset + bytes > *wp) {
252
*wp = offset + bytes;
253
@@ -XXX,XX +XXX,XX @@ out:
254
}
255
}
256
} else {
257
- if (type & QEMU_AIO_WRITE) {
258
+ if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
259
update_zones_wp(bs, s->fd, 0, 1);
260
}
261
}
262
263
- if (type & QEMU_AIO_WRITE && wps) {
264
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
265
qemu_co_mutex_unlock(&wps->colock);
266
}
267
}
268
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
269
}
270
#endif
271
272
+#if defined(CONFIG_BLKZONED)
273
+static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
274
+ int64_t *offset,
275
+ QEMUIOVector *qiov,
276
+ BdrvRequestFlags flags) {
277
+ assert(flags == 0);
278
+ int64_t zone_size_mask = bs->bl.zone_size - 1;
279
+ int64_t iov_len = 0;
280
+ int64_t len = 0;
281
+ BDRVRawState *s = bs->opaque;
282
+ s->offset = offset;
283
+
284
+ if (*offset & zone_size_mask) {
285
+ error_report("sector offset %" PRId64 " is not aligned to zone size "
286
+ "%" PRId32 "", *offset / 512, bs->bl.zone_size / 512);
287
+ return -EINVAL;
288
+ }
289
+
290
+ int64_t wg = bs->bl.write_granularity;
291
+ int64_t wg_mask = wg - 1;
292
+ for (int i = 0; i < qiov->niov; i++) {
293
+ iov_len = qiov->iov[i].iov_len;
294
+ if (iov_len & wg_mask) {
295
+ error_report("len of IOVector[%d] %" PRId64 " is not aligned to "
296
+ "block size %" PRId64 "", i, iov_len, wg);
297
+ return -EINVAL;
298
+ }
299
+ len += iov_len;
300
+ }
301
+
302
+ return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
303
+}
304
+#endif
305
+
306
static coroutine_fn int
307
raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
308
bool blkdev)
309
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
310
/* zone management operations */
311
.bdrv_co_zone_report = raw_co_zone_report,
312
.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
313
+ .bdrv_co_zone_append = raw_co_zone_append,
314
#endif
315
};
316
22
diff --git a/block/io.c b/block/io.c
317
diff --git a/block/io.c b/block/io.c
23
index XXXXXXX..XXXXXXX 100644
318
index XXXXXXX..XXXXXXX 100644
24
--- a/block/io.c
319
--- a/block/io.c
25
+++ b/block/io.c
320
+++ b/block/io.c
26
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
321
@@ -XXX,XX +XXX,XX @@ out:
27
* at 'offset + *pnum' may return the same allocation status (in other
322
return co.ret;
28
* words, the result is not necessarily the maximum possible range);
323
}
29
* but 'pnum' will only be 0 when end of file is reached.
324
30
- *
325
+int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
31
*/
326
+ QEMUIOVector *qiov,
32
int bdrv_is_allocated_above(BlockDriverState *top,
327
+ BdrvRequestFlags flags)
33
BlockDriverState *base,
328
+{
34
bool include_base, int64_t offset,
329
+ int ret;
35
int64_t bytes, int64_t *pnum)
330
+ BlockDriver *drv = bs->drv;
36
{
331
+ CoroutineIOCompletion co = {
37
- BlockDriverState *intermediate;
332
+ .coroutine = qemu_coroutine_self(),
38
- int ret;
333
+ };
39
- int64_t n = bytes;
334
+ IO_CODE();
40
-
335
+
41
- assert(base || !include_base);
336
+ ret = bdrv_check_qiov_request(*offset, qiov->size, qiov, 0, NULL);
42
-
43
- intermediate = top;
44
- while (include_base || intermediate != base) {
45
- int64_t pnum_inter;
46
- int64_t size_inter;
47
-
48
- assert(intermediate);
49
- ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
50
- if (ret < 0) {
51
- return ret;
52
- }
53
- if (ret) {
54
- *pnum = pnum_inter;
55
- return 1;
56
- }
57
-
58
- size_inter = bdrv_getlength(intermediate);
59
- if (size_inter < 0) {
60
- return size_inter;
61
- }
62
- if (n > pnum_inter &&
63
- (intermediate == top || offset + pnum_inter < size_inter)) {
64
- n = pnum_inter;
65
- }
66
-
67
- if (intermediate == base) {
68
- break;
69
- }
70
-
71
- intermediate = bdrv_filter_or_cow_bs(intermediate);
72
+ int ret = bdrv_common_block_status_above(top, base, include_base, false,
73
+ offset, bytes, pnum, NULL, NULL);
74
+ if (ret < 0) {
337
+ if (ret < 0) {
75
+ return ret;
338
+ return ret;
76
}
339
+ }
77
340
+
78
- *pnum = n;
341
+ bdrv_inc_in_flight(bs);
79
- return 0;
342
+ if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
80
+ return !!(ret & BDRV_BLOCK_ALLOCATED);
343
+ co.ret = -ENOTSUP;
81
}
344
+ goto out;
82
345
+ }
83
int coroutine_fn
346
+ co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
347
+out:
348
+ bdrv_dec_in_flight(bs);
349
+ return co.ret;
350
+}
351
+
352
void *qemu_blockalign(BlockDriverState *bs, size_t size)
353
{
354
IO_CODE();
355
diff --git a/block/io_uring.c b/block/io_uring.c
356
index XXXXXXX..XXXXXXX 100644
357
--- a/block/io_uring.c
358
+++ b/block/io_uring.c
359
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
360
io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
361
luringcb->qiov->niov, offset);
362
break;
363
+ case QEMU_AIO_ZONE_APPEND:
364
+ io_uring_prep_writev(sqes, fd, luringcb->qiov->iov,
365
+ luringcb->qiov->niov, offset);
366
+ break;
367
case QEMU_AIO_READ:
368
io_uring_prep_readv(sqes, fd, luringcb->qiov->iov,
369
luringcb->qiov->niov, offset);
370
diff --git a/block/linux-aio.c b/block/linux-aio.c
371
index XXXXXXX..XXXXXXX 100644
372
--- a/block/linux-aio.c
373
+++ b/block/linux-aio.c
374
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
375
case QEMU_AIO_WRITE:
376
io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
377
break;
378
+ case QEMU_AIO_ZONE_APPEND:
379
+ io_prep_pwritev(iocbs, fd, qiov->iov, qiov->niov, offset);
380
+ break;
381
case QEMU_AIO_READ:
382
io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
383
break;
384
diff --git a/block/raw-format.c b/block/raw-format.c
385
index XXXXXXX..XXXXXXX 100644
386
--- a/block/raw-format.c
387
+++ b/block/raw-format.c
388
@@ -XXX,XX +XXX,XX @@ raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
389
return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
390
}
391
392
+static int coroutine_fn GRAPH_RDLOCK
393
+raw_co_zone_append(BlockDriverState *bs, int64_t *offset, QEMUIOVector *qiov,
394
+ BdrvRequestFlags flags)
395
+{
396
+ return bdrv_co_zone_append(bs->file->bs, offset, qiov, flags);
397
+}
398
+
399
static int64_t coroutine_fn GRAPH_RDLOCK
400
raw_co_getlength(BlockDriverState *bs)
401
{
402
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_raw = {
403
.bdrv_co_pdiscard = &raw_co_pdiscard,
404
.bdrv_co_zone_report = &raw_co_zone_report,
405
.bdrv_co_zone_mgmt = &raw_co_zone_mgmt,
406
+ .bdrv_co_zone_append = &raw_co_zone_append,
407
.bdrv_co_block_status = &raw_co_block_status,
408
.bdrv_co_copy_range_from = &raw_co_copy_range_from,
409
.bdrv_co_copy_range_to = &raw_co_copy_range_to,
84
--
410
--
85
2.26.2
411
2.40.1
86
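For readers new to the zone append API introduced above, here is a minimal
coroutine-context caller sketch. It is illustrative only and not part of this
series; the helper name and error handling are made up, but the
blk_co_zone_append() signature matches the one added in the patch above.

    /* Hypothetical caller sketch: append one buffer to a zone and learn
     * where it landed. Must run in coroutine context. */
    static int coroutine_fn append_one_buffer(BlockBackend *blk,
                                              int64_t zone_start_offset,
                                              void *buf, size_t len)
    {
        QEMUIOVector qiov;
        int64_t offset = zone_start_offset;
        int ret;

        qemu_iovec_init(&qiov, 1);
        qemu_iovec_add(&qiov, buf, len);

        /* On success, blk_co_zone_append() rewrites *offset with the byte
         * offset at which the data was actually written (the zone's write
         * pointer before the append). */
        ret = blk_co_zone_append(blk, &offset, &qiov, 0);

        qemu_iovec_destroy(&qiov);
        return ret;
    }

Non-coroutine callers would use the blk_zone_append() co_wrapper_mixed or
blk_aio_zone_append() instead, as the qemu-io command added in the next patch
does.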
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
By making use of libvhost-user, a block device can be shared with
3
This patch tests zone append writes by reporting the zone write pointer after
4
the connected vhost-user client. Only one client can connect to the
4
the completion of the call. The "zap -p" option prints the sector
5
server at a time.
5
offset value after completion, which should be the start sector
6
where the append write begins.
6
7
7
Since vhost-user-server needs a block drive to be created first, delay
8
Signed-off-by: Sam Li <faithilikerun@gmail.com>
8
the creation of this object.
9
10
Suggested-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
13
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
10
Message-id: 20230508051510.177850-4-faithilikerun@gmail.com
15
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
16
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
17
following compiler warning:
18
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
19
and fix "Invalid size %ld ..." ssize_t format string arguments for
20
32-bit hosts.
21
--Stefan]
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
---
12
---
24
block/export/vhost-user-blk-server.h | 36 ++
13
qemu-io-cmds.c | 75 ++++++++++++++++++++++++++++++
25
block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++
14
tests/qemu-iotests/tests/zoned | 16 +++++++
26
softmmu/vl.c | 4 +
15
tests/qemu-iotests/tests/zoned.out | 16 +++++++
27
block/meson.build | 1 +
16
3 files changed, 107 insertions(+)
28
4 files changed, 702 insertions(+)
29
create mode 100644 block/export/vhost-user-blk-server.h
30
create mode 100644 block/export/vhost-user-blk-server.c
31
17
32
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
18
diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
33
new file mode 100644
19
index XXXXXXX..XXXXXXX 100644
34
index XXXXXXX..XXXXXXX
20
--- a/qemu-io-cmds.c
35
--- /dev/null
21
+++ b/qemu-io-cmds.c
36
+++ b/block/export/vhost-user-blk-server.h
22
@@ -XXX,XX +XXX,XX @@ static const cmdinfo_t zone_reset_cmd = {
37
@@ -XXX,XX +XXX,XX @@
23
.oneline = "reset a zone write pointer in zone block device",
38
+/*
24
};
39
+ * Sharing QEMU block devices via vhost-user protocal
25
40
+ *
26
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
41
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
27
+ int64_t *offset, int flags, int *total)
42
+ * Copyright (c) 2020 Red Hat, Inc.
28
+{
43
+ *
29
+ int async_ret = NOT_DONE;
44
+ * This work is licensed under the terms of the GNU GPL, version 2 or
45
+ * later. See the COPYING file in the top-level directory.
46
+ */
47
+
30
+
48
+#ifndef VHOST_USER_BLK_SERVER_H
31
+ blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
49
+#define VHOST_USER_BLK_SERVER_H
32
+ while (async_ret == NOT_DONE) {
50
+#include "util/vhost-user-server.h"
33
+ main_loop_wait(false);
51
+
52
+typedef struct VuBlockDev VuBlockDev;
53
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
54
+#define VHOST_USER_BLK_SERVER(obj) \
55
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
56
+
57
+/* vhost user block device */
58
+struct VuBlockDev {
59
+ Object parent_obj;
60
+ char *node_name;
61
+ SocketAddress *addr;
62
+ AioContext *ctx;
63
+ VuServer vu_server;
64
+ bool running;
65
+ uint32_t blk_size;
66
+ BlockBackend *backend;
67
+ QIOChannelSocket *sioc;
68
+ QTAILQ_ENTRY(VuBlockDev) next;
69
+ struct virtio_blk_config blkcfg;
70
+ bool writable;
71
+};
72
+
73
+#endif /* VHOST_USER_BLK_SERVER_H */
74
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
75
new file mode 100644
76
index XXXXXXX..XXXXXXX
77
--- /dev/null
78
+++ b/block/export/vhost-user-blk-server.c
79
@@ -XXX,XX +XXX,XX @@
80
+/*
81
+ * Sharing QEMU block devices via vhost-user protocal
82
+ *
83
+ * Parts of the code based on nbd/server.c.
84
+ *
85
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
86
+ * Copyright (c) 2020 Red Hat, Inc.
87
+ *
88
+ * This work is licensed under the terms of the GNU GPL, version 2 or
89
+ * later. See the COPYING file in the top-level directory.
90
+ */
91
+#include "qemu/osdep.h"
92
+#include "block/block.h"
93
+#include "vhost-user-blk-server.h"
94
+#include "qapi/error.h"
95
+#include "qom/object_interfaces.h"
96
+#include "sysemu/block-backend.h"
97
+#include "util/block-helpers.h"
98
+
99
+enum {
100
+ VHOST_USER_BLK_MAX_QUEUES = 1,
101
+};
102
+struct virtio_blk_inhdr {
103
+ unsigned char status;
104
+};
105
+
106
+typedef struct VuBlockReq {
107
+ VuVirtqElement *elem;
108
+ int64_t sector_num;
109
+ size_t size;
110
+ struct virtio_blk_inhdr *in;
111
+ struct virtio_blk_outhdr out;
112
+ VuServer *server;
113
+ struct VuVirtq *vq;
114
+} VuBlockReq;
115
+
116
+static void vu_block_req_complete(VuBlockReq *req)
117
+{
118
+ VuDev *vu_dev = &req->server->vu_dev;
119
+
120
+ /* IO size with 1 extra status byte */
121
+ vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
122
+ vu_queue_notify(vu_dev, req->vq);
123
+
124
+ if (req->elem) {
125
+ free(req->elem);
126
+ }
34
+ }
127
+
35
+
128
+ g_free(req);
36
+ *total = qiov->size;
37
+ return async_ret < 0 ? async_ret : 1;
129
+}
38
+}
130
+
39
+
131
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
40
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
132
+{
41
+{
133
+ return container_of(server, VuBlockDev, vu_server);
42
+ int ret;
134
+}
43
+ bool pflag = false;
44
+ int flags = 0;
45
+ int total = 0;
46
+ int64_t offset;
47
+ char *buf;
48
+ int c, nr_iov;
49
+ int pattern = 0xcd;
50
+ QEMUIOVector qiov;
135
+
51
+
136
+static int coroutine_fn
52
+ if (optind > argc - 3) {
137
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
138
+ uint32_t iovcnt, uint32_t type)
139
+{
140
+ struct virtio_blk_discard_write_zeroes desc;
141
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
142
+ if (unlikely(size != sizeof(desc))) {
143
+ error_report("Invalid size %zd, expect %zu", size, sizeof(desc));
144
+ return -EINVAL;
53
+ return -EINVAL;
145
+ }
54
+ }
146
+
55
+
147
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
56
+ if ((c = getopt(argc, argv, "p")) != -1) {
148
+ uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
57
+ pflag = true;
149
+ le32_to_cpu(desc.num_sectors) << 9 };
150
+ if (type == VIRTIO_BLK_T_DISCARD) {
151
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
152
+ return 0;
153
+ }
154
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
155
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
156
+ range[0], range[1], 0) == 0) {
157
+ return 0;
158
+ }
159
+ }
58
+ }
160
+
59
+
161
+ return -EINVAL;
60
+ offset = cvtnum(argv[optind]);
162
+}
61
+ if (offset < 0) {
163
+
62
+ print_cvtnum_err(offset, argv[optind]);
164
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
63
+ return offset;
165
+{
166
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
167
+ BlockBackend *backend = vdev_blk->backend;
168
+ blk_co_flush(backend);
169
+}
170
+
171
+struct req_data {
172
+ VuServer *server;
173
+ VuVirtq *vq;
174
+ VuVirtqElement *elem;
175
+};
176
+
177
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
178
+{
179
+ struct req_data *data = opaque;
180
+ VuServer *server = data->server;
181
+ VuVirtq *vq = data->vq;
182
+ VuVirtqElement *elem = data->elem;
183
+ uint32_t type;
184
+ VuBlockReq *req;
185
+
186
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
187
+ BlockBackend *backend = vdev_blk->backend;
188
+
189
+ struct iovec *in_iov = elem->in_sg;
190
+ struct iovec *out_iov = elem->out_sg;
191
+ unsigned in_num = elem->in_num;
192
+ unsigned out_num = elem->out_num;
193
+ /* refer to hw/block/virtio_blk.c */
194
+ if (elem->out_num < 1 || elem->in_num < 1) {
195
+ error_report("virtio-blk request missing headers");
196
+ free(elem);
197
+ return;
198
+ }
64
+ }
199
+
65
+ optind++;
200
+ req = g_new0(VuBlockReq, 1);
66
+ nr_iov = argc - optind;
201
+ req->server = server;
67
+ buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
202
+ req->vq = vq;
68
+ flags & BDRV_REQ_REGISTERED_BUF);
203
+ req->elem = elem;
69
+ if (buf == NULL) {
204
+
205
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
206
+ sizeof(req->out)) != sizeof(req->out))) {
207
+ error_report("virtio-blk request outhdr too short");
208
+ goto err;
209
+ }
210
+
211
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
212
+
213
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
214
+ error_report("virtio-blk request inhdr too short");
215
+ goto err;
216
+ }
217
+
218
+ /* We always touch the last byte, so just see how big in_iov is. */
219
+ req->in = (void *)in_iov[in_num - 1].iov_base
220
+ + in_iov[in_num - 1].iov_len
221
+ - sizeof(struct virtio_blk_inhdr);
222
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
223
+
224
+ type = le32_to_cpu(req->out.type);
225
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
226
+ case VIRTIO_BLK_T_IN:
227
+ case VIRTIO_BLK_T_OUT: {
228
+ ssize_t ret = 0;
229
+ bool is_write = type & VIRTIO_BLK_T_OUT;
230
+ req->sector_num = le64_to_cpu(req->out.sector);
231
+
232
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
233
+ QEMUIOVector qiov;
234
+ if (is_write) {
235
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
236
+ ret = blk_co_pwritev(backend, offset, qiov.size,
237
+ &qiov, 0);
238
+ } else {
239
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
240
+ ret = blk_co_preadv(backend, offset, qiov.size,
241
+ &qiov, 0);
242
+ }
243
+ if (ret >= 0) {
244
+ req->in->status = VIRTIO_BLK_S_OK;
245
+ } else {
246
+ req->in->status = VIRTIO_BLK_S_IOERR;
247
+ }
248
+ break;
249
+ }
250
+ case VIRTIO_BLK_T_FLUSH:
251
+ vu_block_flush(req);
252
+ req->in->status = VIRTIO_BLK_S_OK;
253
+ break;
254
+ case VIRTIO_BLK_T_GET_ID: {
255
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
256
+ VIRTIO_BLK_ID_BYTES);
257
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
258
+ req->in->status = VIRTIO_BLK_S_OK;
259
+ req->size = elem->in_sg[0].iov_len;
260
+ break;
261
+ }
262
+ case VIRTIO_BLK_T_DISCARD:
263
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
264
+ int rc;
265
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
266
+ out_num, type);
267
+ if (rc == 0) {
268
+ req->in->status = VIRTIO_BLK_S_OK;
269
+ } else {
270
+ req->in->status = VIRTIO_BLK_S_IOERR;
271
+ }
272
+ break;
273
+ }
274
+ default:
275
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
276
+ break;
277
+ }
278
+
279
+ vu_block_req_complete(req);
280
+ return;
281
+
282
+err:
283
+ free(elem);
284
+ g_free(req);
285
+ return;
286
+}
287
+
288
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
289
+{
290
+ VuServer *server;
291
+ VuVirtq *vq;
292
+ struct req_data *req_data;
293
+
294
+ server = container_of(vu_dev, VuServer, vu_dev);
295
+ assert(server);
296
+
297
+ vq = vu_get_queue(vu_dev, idx);
298
+ assert(vq);
299
+ VuVirtqElement *elem;
300
+ while (1) {
301
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
302
+ sizeof(VuBlockReq));
303
+ if (elem) {
304
+ req_data = g_new0(struct req_data, 1);
305
+ req_data->server = server;
306
+ req_data->vq = vq;
307
+ req_data->elem = elem;
308
+ Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
309
+ req_data);
310
+ aio_co_enter(server->ioc->ctx, co);
311
+ } else {
312
+ break;
313
+ }
314
+ }
315
+}
316
+
317
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
318
+{
319
+ VuVirtq *vq;
320
+
321
+ assert(vu_dev);
322
+
323
+ vq = vu_get_queue(vu_dev, idx);
324
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
325
+}
326
+
327
+static uint64_t vu_block_get_features(VuDev *dev)
328
+{
329
+ uint64_t features;
330
+ VuServer *server = container_of(dev, VuServer, vu_dev);
331
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
332
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
333
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
334
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
335
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
336
+ 1ull << VIRTIO_BLK_F_FLUSH |
337
+ 1ull << VIRTIO_BLK_F_DISCARD |
338
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
339
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
340
+ 1ull << VIRTIO_F_VERSION_1 |
341
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
342
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
343
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
344
+
345
+ if (!vdev_blk->writable) {
346
+ features |= 1ull << VIRTIO_BLK_F_RO;
347
+ }
348
+
349
+ return features;
350
+}
351
+
352
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
353
+{
354
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
355
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
356
+}
357
+
358
+static int
359
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
360
+{
361
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
362
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
363
+ memcpy(config, &vdev_blk->blkcfg, len);
364
+
365
+ return 0;
366
+}
367
+
368
+static int
369
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
370
+ uint32_t offset, uint32_t size, uint32_t flags)
371
+{
372
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
373
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
374
+ uint8_t wce;
375
+
376
+ /* don't support live migration */
377
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
378
+ return -EINVAL;
70
+ return -EINVAL;
379
+ }
71
+ }
380
+
72
+ ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
381
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
382
+ size != 1) {
383
+ return -EINVAL;
384
+ }
385
+
386
+ wce = *data;
387
+ vdev_blk->blkcfg.wce = wce;
388
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
389
+ return 0;
390
+}
391
+
392
+/*
393
+ * When the client disconnects, it sends a VHOST_USER_NONE request
394
+ * and vu_process_message will simple call exit which cause the VM
395
+ * to exit abruptly.
396
+ * To avoid this issue, process VHOST_USER_NONE request ahead
397
+ * of vu_process_message.
398
+ *
399
+ */
400
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
401
+{
402
+ if (vmsg->request == VHOST_USER_NONE) {
403
+ dev->panic(dev, "disconnect");
404
+ return true;
405
+ }
406
+ return false;
407
+}
408
+
409
+static const VuDevIface vu_block_iface = {
410
+ .get_features = vu_block_get_features,
411
+ .queue_set_started = vu_block_queue_set_started,
412
+ .get_protocol_features = vu_block_get_protocol_features,
413
+ .get_config = vu_block_get_config,
414
+ .set_config = vu_block_set_config,
415
+ .process_msg = vu_block_process_msg,
416
+};
417
+
418
+static void blk_aio_attached(AioContext *ctx, void *opaque)
419
+{
420
+ VuBlockDev *vub_dev = opaque;
421
+ aio_context_acquire(ctx);
422
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
423
+ aio_context_release(ctx);
424
+}
425
+
426
+static void blk_aio_detach(void *opaque)
427
+{
428
+ VuBlockDev *vub_dev = opaque;
429
+ AioContext *ctx = vub_dev->vu_server.ctx;
430
+ aio_context_acquire(ctx);
431
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
432
+ aio_context_release(ctx);
433
+}
434
+
435
+static void
436
+vu_block_initialize_config(BlockDriverState *bs,
437
+ struct virtio_blk_config *config, uint32_t blk_size)
438
+{
439
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
440
+ config->blk_size = blk_size;
441
+ config->size_max = 0;
442
+ config->seg_max = 128 - 2;
443
+ config->min_io_size = 1;
444
+ config->opt_io_size = 1;
445
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
446
+ config->max_discard_sectors = 32768;
447
+ config->max_discard_seg = 1;
448
+ config->discard_sector_alignment = config->blk_size >> 9;
449
+ config->max_write_zeroes_sectors = 32768;
450
+ config->max_write_zeroes_seg = 1;
451
+}
452
+
453
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
454
+{
455
+
456
+ BlockBackend *blk;
457
+ Error *local_error = NULL;
458
+ const char *node_name = vu_block_device->node_name;
459
+ bool writable = vu_block_device->writable;
460
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
461
+ int ret;
462
+
463
+ AioContext *ctx;
464
+
465
+ BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
466
+
467
+ if (!bs) {
468
+ error_propagate(errp, local_error);
469
+ return NULL;
470
+ }
471
+
472
+ if (bdrv_is_read_only(bs)) {
473
+ writable = false;
474
+ }
475
+
476
+ if (writable) {
477
+ perm |= BLK_PERM_WRITE;
478
+ }
479
+
480
+ ctx = bdrv_get_aio_context(bs);
481
+ aio_context_acquire(ctx);
482
+ bdrv_invalidate_cache(bs, NULL);
483
+ aio_context_release(ctx);
484
+
485
+ /*
486
+ * Don't allow resize while the vhost user server is running,
487
+ * otherwise we don't care what happens with the node.
488
+ */
489
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
490
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
491
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
492
+ ret = blk_insert_bs(blk, bs, errp);
493
+
494
+ if (ret < 0) {
73
+ if (ret < 0) {
495
+ goto fail;
74
+ printf("zone append failed: %s\n", strerror(-ret));
496
+ }
497
+
498
+ blk_set_enable_write_cache(blk, false);
499
+
500
+ blk_set_allow_aio_context_change(blk, true);
501
+
502
+ vu_block_device->blkcfg.wce = 0;
503
+ vu_block_device->backend = blk;
504
+ if (!vu_block_device->blk_size) {
505
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
506
+ }
507
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
508
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
509
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
510
+ vu_block_device->blk_size);
511
+ return vu_block_device;
512
+
513
+fail:
514
+ blk_unref(blk);
515
+ return NULL;
516
+}
517
+
518
+static void vu_block_deinit(VuBlockDev *vu_block_device)
519
+{
520
+ if (vu_block_device->backend) {
521
+ blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
522
+ blk_aio_detach, vu_block_device);
523
+ }
524
+
525
+ blk_unref(vu_block_device->backend);
526
+}
527
+
528
+static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
529
+{
530
+ vhost_user_server_stop(&vu_block_device->vu_server);
531
+ vu_block_deinit(vu_block_device);
532
+}
533
+
534
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
535
+ Error **errp)
536
+{
537
+ AioContext *ctx;
538
+ SocketAddress *addr = vu_block_device->addr;
539
+
540
+ if (!vu_block_init(vu_block_device, errp)) {
541
+ return;
542
+ }
543
+
544
+ ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
545
+
546
+ if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
547
+ VHOST_USER_BLK_MAX_QUEUES,
548
+ NULL, &vu_block_iface,
549
+ errp)) {
550
+ goto error;
551
+ }
552
+
553
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
554
+ blk_aio_detach, vu_block_device);
555
+ vu_block_device->running = true;
556
+ return;
557
+
558
+ error:
559
+ vu_block_deinit(vu_block_device);
560
+}
561
+
562
+static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
563
+{
564
+ if (vus->running) {
565
+ error_setg(errp, "The property can't be modified "
566
+ "while the server is running");
567
+ return false;
568
+ }
569
+ return true;
570
+}
571
+
572
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
573
+{
574
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
575
+
576
+ if (!vu_prop_modifiable(vus, errp)) {
577
+ return;
578
+ }
579
+
580
+ if (vus->node_name) {
581
+ g_free(vus->node_name);
582
+ }
583
+
584
+ vus->node_name = g_strdup(value);
585
+}
586
+
587
+static char *vu_get_node_name(Object *obj, Error **errp)
588
+{
589
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
590
+ return g_strdup(vus->node_name);
591
+}
592
+
593
+static void free_socket_addr(SocketAddress *addr)
594
+{
595
+ g_free(addr->u.q_unix.path);
596
+ g_free(addr);
597
+}
598
+
599
+static void vu_set_unix_socket(Object *obj, const char *value,
600
+ Error **errp)
601
+{
602
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
603
+
604
+ if (!vu_prop_modifiable(vus, errp)) {
605
+ return;
606
+ }
607
+
608
+ if (vus->addr) {
609
+ free_socket_addr(vus->addr);
610
+ }
611
+
612
+ SocketAddress *addr = g_new0(SocketAddress, 1);
613
+ addr->type = SOCKET_ADDRESS_TYPE_UNIX;
614
+ addr->u.q_unix.path = g_strdup(value);
615
+ vus->addr = addr;
616
+}
617
+
618
+static char *vu_get_unix_socket(Object *obj, Error **errp)
619
+{
620
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
621
+ return g_strdup(vus->addr->u.q_unix.path);
622
+}
623
+
624
+static bool vu_get_block_writable(Object *obj, Error **errp)
625
+{
626
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
627
+ return vus->writable;
628
+}
629
+
630
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
631
+{
632
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
633
+
634
+ if (!vu_prop_modifiable(vus, errp)) {
635
+ return;
636
+ }
637
+
638
+ vus->writable = value;
639
+}
640
+
641
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
642
+ void *opaque, Error **errp)
643
+{
644
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
645
+ uint32_t value = vus->blk_size;
646
+
647
+ visit_type_uint32(v, name, &value, errp);
648
+}
649
+
650
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
651
+ void *opaque, Error **errp)
652
+{
653
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
654
+
655
+ Error *local_err = NULL;
656
+ uint32_t value;
657
+
658
+ if (!vu_prop_modifiable(vus, errp)) {
659
+ return;
660
+ }
661
+
662
+ visit_type_uint32(v, name, &value, &local_err);
663
+ if (local_err) {
664
+ goto out;
75
+ goto out;
665
+ }
76
+ }
666
+
77
+
667
+ check_block_size(object_get_typename(obj), name, value, &local_err);
78
+ if (pflag) {
668
+ if (local_err) {
79
+ printf("After zap done, the append sector is 0x%" PRIx64 "\n",
669
+ goto out;
80
+ tosector(offset));
670
+ }
81
+ }
671
+
82
+
672
+ vus->blk_size = value;
673
+
674
+out:
83
+out:
675
+ error_propagate(errp, local_err);
84
+ qemu_io_free(blk, buf, qiov.size,
85
+ flags & BDRV_REQ_REGISTERED_BUF);
86
+ qemu_iovec_destroy(&qiov);
87
+ return ret;
676
+}
88
+}
677
+
89
+
678
+static void vhost_user_blk_server_instance_finalize(Object *obj)
90
+static const cmdinfo_t zone_append_cmd = {
679
+{
91
+ .name = "zone_append",
680
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
92
+ .altname = "zap",
681
+
93
+ .cfunc = zone_append_f,
682
+ vhost_user_blk_server_stop(vub);
94
+ .argmin = 3,
683
+
95
+ .argmax = 4,
684
+ /*
96
+ .args = "offset len [len..]",
685
+ * Unlike object_property_add_str, object_class_property_add_str
97
+ .oneline = "append write a number of bytes at a specified offset",
686
+ * doesn't have a release method. Thus manual memory freeing is
687
+ * needed.
688
+ */
689
+ free_socket_addr(vub->addr);
690
+ g_free(vub->node_name);
691
+}
692
+
693
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
694
+{
695
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
696
+
697
+ vhost_user_blk_server_start(vub, errp);
698
+}
699
+
700
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
701
+ void *class_data)
702
+{
703
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
704
+ ucc->complete = vhost_user_blk_server_complete;
705
+
706
+ object_class_property_add_bool(klass, "writable",
707
+ vu_get_block_writable,
708
+ vu_set_block_writable);
709
+
710
+ object_class_property_add_str(klass, "node-name",
711
+ vu_get_node_name,
712
+ vu_set_node_name);
713
+
714
+ object_class_property_add_str(klass, "unix-socket",
715
+ vu_get_unix_socket,
716
+ vu_set_unix_socket);
717
+
718
+ object_class_property_add(klass, "logical-block-size", "uint32",
719
+ vu_get_blk_size, vu_set_blk_size,
720
+ NULL, NULL);
721
+}
722
+
723
+static const TypeInfo vhost_user_blk_server_info = {
724
+ .name = TYPE_VHOST_USER_BLK_SERVER,
725
+ .parent = TYPE_OBJECT,
726
+ .instance_size = sizeof(VuBlockDev),
727
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
728
+ .class_init = vhost_user_blk_server_class_init,
729
+ .interfaces = (InterfaceInfo[]) {
730
+ {TYPE_USER_CREATABLE},
731
+ {}
732
+ },
733
+};
98
+};
734
+
99
+
735
+static void vhost_user_blk_server_register_types(void)
100
static int truncate_f(BlockBackend *blk, int argc, char **argv);
736
+{
101
static const cmdinfo_t truncate_cmd = {
737
+ type_register_static(&vhost_user_blk_server_info);
102
.name = "truncate",
738
+}
103
@@ -XXX,XX +XXX,XX @@ static void __attribute((constructor)) init_qemuio_commands(void)
104
qemuio_add_command(&zone_close_cmd);
105
qemuio_add_command(&zone_finish_cmd);
106
qemuio_add_command(&zone_reset_cmd);
107
+ qemuio_add_command(&zone_append_cmd);
108
qemuio_add_command(&truncate_cmd);
109
qemuio_add_command(&length_cmd);
110
qemuio_add_command(&info_cmd);
111
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
112
index XXXXXXX..XXXXXXX 100755
113
--- a/tests/qemu-iotests/tests/zoned
114
+++ b/tests/qemu-iotests/tests/zoned
115
@@ -XXX,XX +XXX,XX @@ echo "(5) resetting the second zone"
116
$QEMU_IO $IMG -c "zrs 268435456 268435456"
117
echo "After resetting a zone:"
118
$QEMU_IO $IMG -c "zrp 268435456 1"
119
+echo
120
+echo
121
+echo "(6) append write" # the physical block size of the device is 4096
122
+$QEMU_IO $IMG -c "zrp 0 1"
123
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
124
+echo "After appending the first zone firstly:"
125
+$QEMU_IO $IMG -c "zrp 0 1"
126
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
127
+echo "After appending the first zone secondly:"
128
+$QEMU_IO $IMG -c "zrp 0 1"
129
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
130
+echo "After appending the second zone firstly:"
131
+$QEMU_IO $IMG -c "zrp 268435456 1"
132
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
133
+echo "After appending the second zone secondly:"
134
+$QEMU_IO $IMG -c "zrp 268435456 1"
135
136
# success, all done
137
echo "*** done"
138
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
139
index XXXXXXX..XXXXXXX 100644
140
--- a/tests/qemu-iotests/tests/zoned.out
141
+++ b/tests/qemu-iotests/tests/zoned.out
142
@@ -XXX,XX +XXX,XX @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:14, [type: 2]
143
(5) resetting the second zone
144
After resetting a zone:
145
start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
739
+
146
+
740
+type_init(vhost_user_blk_server_register_types)
147
+
741
diff --git a/softmmu/vl.c b/softmmu/vl.c
148
+(6) append write
742
index XXXXXXX..XXXXXXX 100644
149
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
743
--- a/softmmu/vl.c
150
+After zap done, the append sector is 0x0
744
+++ b/softmmu/vl.c
151
+After appending the first zone firstly:
745
@@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts)
152
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
746
}
153
+After zap done, the append sector is 0x18
747
#endif
154
+After appending the first zone secondly:
748
155
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
749
+ /* Reason: vhost-user-blk-server property "node-name" */
156
+After zap done, the append sector is 0x80000
750
+ if (g_str_equal(type, "vhost-user-blk-server")) {
157
+After appending the second zone firstly:
751
+ return false;
158
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80018, zcond:2, [type: 2]
752
+ }
159
+After zap done, the append sector is 0x80018
753
/*
160
+After appending the second zone secondly:
754
* Reason: filter-* property "netdev" etc.
161
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80030, zcond:2, [type: 2]
755
*/
162
*** done
756
diff --git a/block/meson.build b/block/meson.build
757
index XXXXXXX..XXXXXXX 100644
758
--- a/block/meson.build
759
+++ b/block/meson.build
760
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
761
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
762
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
763
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
764
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
765
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
766
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
767
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
768
--
163
--
769
2.26.2
164
2.40.1
770
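The qemu-io "zap" command above drives the asynchronous entry point. As a
rough sketch (illustrative, with hypothetical names; only
blk_aio_zone_append() and the BlockCompletionFunc callback convention come
from this series), an async caller looks like:

    /* Hypothetical async caller sketch. blk_aio_zone_append() completes via
     * a BlockCompletionFunc; the updated offset is read back from the
     * variable passed by pointer at submission time. */
    typedef struct {
        int64_t offset;    /* in: target zone start; out: where data landed */
        bool done;
        int ret;
    } AppendCtx;

    static void append_done(void *opaque, int ret)
    {
        AppendCtx *ctx = opaque;

        ctx->ret = ret;
        ctx->done = true;
    }

    static void submit_append(BlockBackend *blk, QEMUIOVector *qiov,
                              AppendCtx *ctx)
    {
        blk_aio_zone_append(blk, &ctx->offset, qiov, 0, append_done, ctx);
        /* The caller then runs the event loop (e.g. main_loop_wait()) until
         * ctx->done becomes true, as do_aio_zone_append() does above. */
    }

For the expected test output above: each "zap -p 0 0x1000 0x2000" call writes
0x1000 + 0x2000 = 0x3000 bytes, i.e. 24 (0x18) 512-byte sectors, which is why
the reported write pointer advances from 0x0 to 0x18 and then to 0x30, and why
the second append reports 0x18 as its start sector.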
1
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
Message-id: 20200924151549.913737-3-stefanha@redhat.com
2
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20230508051510.177850-5-faithilikerun@gmail.com
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
---
8
---
5
util/vhost-user-server.c | 2 +-
9
block/file-posix.c | 3 +++
6
1 file changed, 1 insertion(+), 1 deletion(-)
10
block/trace-events | 2 ++
11
2 files changed, 5 insertions(+)
7
12
8
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
13
diff --git a/block/file-posix.c b/block/file-posix.c
9
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
10
--- a/util/vhost-user-server.c
15
--- a/block/file-posix.c
11
+++ b/util/vhost-user-server.c
16
+++ b/block/file-posix.c
12
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
17
@@ -XXX,XX +XXX,XX @@ out:
13
return false;
18
if (!BDRV_ZT_IS_CONV(*wp)) {
19
if (type & QEMU_AIO_ZONE_APPEND) {
20
*s->offset = *wp;
21
+ trace_zbd_zone_append_complete(bs, *s->offset
22
+ >> BDRV_SECTOR_BITS);
23
}
24
/* Advance the wp if needed */
25
if (offset + bytes > *wp) {
26
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
27
len += iov_len;
14
}
28
}
15
29
16
- /* zero out unspecified fileds */
30
+ trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
17
+ /* zero out unspecified fields */
31
return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
18
*server = (VuServer) {
32
}
19
.listener = listener,
33
#endif
20
.vu_iface = vu_iface,
34
diff --git a/block/trace-events b/block/trace-events
35
index XXXXXXX..XXXXXXX 100644
36
--- a/block/trace-events
37
+++ b/block/trace-events
38
@@ -XXX,XX +XXX,XX @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
39
file_flush_fdatasync_failed(int err) "errno %d"
40
zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
41
zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
42
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
43
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
44
45
# ssh.c
46
sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
21
--
47
--
22
2.26.2
48
2.40.1
23
1
Only one struct is needed per request. Drop req_data and the separate
1
From: Sam Li <faithilikerun@gmail.com>
2
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
3
once.
4
2
5
This fixes the req_data memory leak in vu_block_virtio_process_req().
3
This patch extends virtio-blk emulation to handle zoned device commands
4
by calling the new block layer APIs to perform zoned device I/O on
5
behalf of the guest. It supports Report Zone, four zone operations (open,
6
close, finish, reset), and Append Zone.
6
7
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
8
Message-id: 20200924151549.913737-6-stefanha@redhat.com
9
support zoned block devices. Regular block devices(conventional zones)
10
will not be set.
11
12
The guest os can use blktests, fio to test those commands on zoned devices.
13
Furthermore, using zonefs to test zone append write is also supported.
14
15
Signed-off-by: Sam Li <faithilikerun@gmail.com>
16
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
18
---
11
block/export/vhost-user-blk-server.c | 68 +++++++++-------------------
19
hw/block/virtio-blk-common.c | 2 +
12
1 file changed, 21 insertions(+), 47 deletions(-)
20
hw/block/virtio-blk.c | 389 +++++++++++++++++++++++++++++++++++
21
hw/virtio/virtio-qmp.c | 2 +
22
3 files changed, 393 insertions(+)
13
23
14
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
24
diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
15
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX 100644
16
--- a/block/export/vhost-user-blk-server.c
26
--- a/hw/block/virtio-blk-common.c
17
+++ b/block/export/vhost-user-blk-server.c
27
+++ b/hw/block/virtio-blk-common.c
18
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
28
@@ -XXX,XX +XXX,XX @@ static const VirtIOFeature feature_sizes[] = {
29
.end = endof(struct virtio_blk_config, discard_sector_alignment)},
30
{.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
31
.end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
32
+ {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
33
+ .end = endof(struct virtio_blk_config, zoned)},
34
{}
19
};
35
};
20
36
21
typedef struct VuBlockReq {
37
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
22
- VuVirtqElement *elem;
38
index XXXXXXX..XXXXXXX 100644
23
+ VuVirtqElement elem;
39
--- a/hw/block/virtio-blk.c
24
int64_t sector_num;
40
+++ b/hw/block/virtio-blk.c
25
size_t size;
41
@@ -XXX,XX +XXX,XX @@
26
struct virtio_blk_inhdr *in;
42
#include "qemu/module.h"
27
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
43
#include "qemu/error-report.h"
28
VuDev *vu_dev = &req->server->vu_dev;
44
#include "qemu/main-loop.h"
29
45
+#include "block/block_int.h"
30
/* IO size with 1 extra status byte */
46
#include "trace.h"
31
- vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
47
#include "hw/block/block.h"
32
+ vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
48
#include "hw/qdev-properties.h"
33
vu_queue_notify(vu_dev, req->vq);
49
@@ -XXX,XX +XXX,XX @@ err:
34
50
return err_status;
35
- if (req->elem) {
36
- free(req->elem);
37
- }
38
-
39
- g_free(req);
40
+ free(req);
41
}
51
}
42
52
43
static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
53
+typedef struct ZoneCmdData {
44
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
54
+ VirtIOBlockReq *req;
45
blk_co_flush(backend);
55
+ struct iovec *in_iov;
56
+ unsigned in_num;
57
+ union {
58
+ struct {
59
+ unsigned int nr_zones;
60
+ BlockZoneDescriptor *zones;
61
+ } zone_report_data;
62
+ struct {
63
+ int64_t offset;
64
+ } zone_append_data;
65
+ };
66
+} ZoneCmdData;
67
+
68
+/*
69
+ * check zoned_request: error checking before issuing requests. If all checks
70
+ * passed, return true.
71
+ * append: true if only zone append requests issued.
72
+ */
73
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
74
+ bool append, uint8_t *status) {
75
+ BlockDriverState *bs = blk_bs(s->blk);
76
+ int index;
77
+
78
+ if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
79
+ *status = VIRTIO_BLK_S_UNSUPP;
80
+ return false;
81
+ }
82
+
83
+ if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
84
+ || offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
85
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
86
+ return false;
87
+ }
88
+
89
+ if (append) {
90
+ if (bs->bl.write_granularity) {
91
+ if ((offset % bs->bl.write_granularity) != 0) {
92
+ *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
93
+ return false;
94
+ }
95
+ }
96
+
97
+ index = offset / bs->bl.zone_size;
98
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
99
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
100
+ return false;
101
+ }
102
+
103
+ if (len / 512 > bs->bl.max_append_sectors) {
104
+ if (bs->bl.max_append_sectors == 0) {
105
+ *status = VIRTIO_BLK_S_UNSUPP;
106
+ } else {
107
+ *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
108
+ }
109
+ return false;
110
+ }
111
+ }
112
+ return true;
113
+}
114
+
115
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
116
+{
117
+ ZoneCmdData *data = opaque;
118
+ VirtIOBlockReq *req = data->req;
119
+ VirtIOBlock *s = req->dev;
120
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
121
+ struct iovec *in_iov = data->in_iov;
122
+ unsigned in_num = data->in_num;
123
+ int64_t zrp_size, n, j = 0;
124
+ int64_t nz = data->zone_report_data.nr_zones;
125
+ int8_t err_status = VIRTIO_BLK_S_OK;
126
+
127
+ if (ret) {
128
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
129
+ goto out;
130
+ }
131
+
132
+ struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
133
+ .nr_zones = cpu_to_le64(nz),
134
+ };
135
+ zrp_size = sizeof(struct virtio_blk_zone_report)
136
+ + sizeof(struct virtio_blk_zone_descriptor) * nz;
137
+ n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
138
+ if (n != sizeof(zrp_hdr)) {
139
+ virtio_error(vdev, "Driver provided input buffer that is too small!");
140
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
141
+ goto out;
142
+ }
143
+
144
+ for (size_t i = sizeof(zrp_hdr); i < zrp_size;
145
+ i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
146
+ struct virtio_blk_zone_descriptor desc =
147
+ (struct virtio_blk_zone_descriptor) {
148
+ .z_start = cpu_to_le64(data->zone_report_data.zones[j].start
149
+ >> BDRV_SECTOR_BITS),
150
+ .z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
151
+ >> BDRV_SECTOR_BITS),
152
+ .z_wp = cpu_to_le64(data->zone_report_data.zones[j].wp
153
+ >> BDRV_SECTOR_BITS),
154
+ };
155
+
156
+ switch (data->zone_report_data.zones[j].type) {
157
+ case BLK_ZT_CONV:
158
+ desc.z_type = VIRTIO_BLK_ZT_CONV;
159
+ break;
160
+ case BLK_ZT_SWR:
161
+ desc.z_type = VIRTIO_BLK_ZT_SWR;
162
+ break;
163
+ case BLK_ZT_SWP:
164
+ desc.z_type = VIRTIO_BLK_ZT_SWP;
165
+ break;
166
+ default:
167
+ g_assert_not_reached();
168
+ }
169
+
170
+ switch (data->zone_report_data.zones[j].state) {
171
+ case BLK_ZS_RDONLY:
172
+ desc.z_state = VIRTIO_BLK_ZS_RDONLY;
173
+ break;
174
+ case BLK_ZS_OFFLINE:
175
+ desc.z_state = VIRTIO_BLK_ZS_OFFLINE;
176
+ break;
177
+ case BLK_ZS_EMPTY:
178
+ desc.z_state = VIRTIO_BLK_ZS_EMPTY;
179
+ break;
180
+ case BLK_ZS_CLOSED:
181
+ desc.z_state = VIRTIO_BLK_ZS_CLOSED;
182
+ break;
183
+ case BLK_ZS_FULL:
184
+ desc.z_state = VIRTIO_BLK_ZS_FULL;
185
+ break;
186
+ case BLK_ZS_EOPEN:
187
+ desc.z_state = VIRTIO_BLK_ZS_EOPEN;
188
+ break;
189
+ case BLK_ZS_IOPEN:
190
+ desc.z_state = VIRTIO_BLK_ZS_IOPEN;
191
+ break;
192
+ case BLK_ZS_NOT_WP:
193
+ desc.z_state = VIRTIO_BLK_ZS_NOT_WP;
194
+ break;
195
+ default:
196
+ g_assert_not_reached();
197
+ }
198
+
199
+ /* TODO: it takes O(n^2) time complexity. Optimizations required. */
200
+ n = iov_from_buf(in_iov, in_num, i, &desc, sizeof(desc));
201
+ if (n != sizeof(desc)) {
202
+ virtio_error(vdev, "Driver provided input buffer "
203
+ "for descriptors that is too small!");
204
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
205
+ }
206
+ }
207
+
208
+out:
209
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
210
+ virtio_blk_req_complete(req, err_status);
211
+ virtio_blk_free_request(req);
212
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
213
+ g_free(data->zone_report_data.zones);
214
+ g_free(data);
215
+}
216
+
217
+static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
218
+ struct iovec *in_iov,
219
+ unsigned in_num)
220
+{
221
+ VirtIOBlock *s = req->dev;
222
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
223
+ unsigned int nr_zones;
224
+ ZoneCmdData *data;
225
+ int64_t zone_size, offset;
226
+ uint8_t err_status;
227
+
228
+ if (req->in_len < sizeof(struct virtio_blk_inhdr) +
229
+ sizeof(struct virtio_blk_zone_report) +
230
+ sizeof(struct virtio_blk_zone_descriptor)) {
231
+ virtio_error(vdev, "in buffer too small for zone report");
232
+ return;
233
+ }
234
+
235
+ /* start byte offset of the zone report */
236
+ offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
237
+ if (!check_zoned_request(s, offset, 0, false, &err_status)) {
238
+ goto out;
239
+ }
240
+ nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
241
+ sizeof(struct virtio_blk_zone_report)) /
242
+ sizeof(struct virtio_blk_zone_descriptor);
243
+
244
+ zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
245
+ data = g_malloc(sizeof(ZoneCmdData));
246
+ data->req = req;
247
+ data->in_iov = in_iov;
248
+ data->in_num = in_num;
249
+ data->zone_report_data.nr_zones = nr_zones;
250
+ data->zone_report_data.zones = g_malloc(zone_size),
251
+
252
+ blk_aio_zone_report(s->blk, offset, &data->zone_report_data.nr_zones,
253
+ data->zone_report_data.zones,
254
+ virtio_blk_zone_report_complete, data);
255
+ return;
256
+out:
257
+ virtio_blk_req_complete(req, err_status);
258
+ virtio_blk_free_request(req);
259
+}
260
+
261
+static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
262
+{
263
+ VirtIOBlockReq *req = opaque;
264
+ VirtIOBlock *s = req->dev;
265
+ int8_t err_status = VIRTIO_BLK_S_OK;
266
+
267
+ if (ret) {
268
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
269
+ }
270
+
271
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
272
+ virtio_blk_req_complete(req, err_status);
273
+ virtio_blk_free_request(req);
274
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
275
+}
276
+
277
+static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
278
+{
279
+ VirtIOBlock *s = req->dev;
280
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
281
+ BlockDriverState *bs = blk_bs(s->blk);
282
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
283
+ uint64_t len;
284
+ uint64_t capacity = bs->total_sectors << BDRV_SECTOR_BITS;
285
+ uint8_t err_status = VIRTIO_BLK_S_OK;
286
+
287
+ uint32_t type = virtio_ldl_p(vdev, &req->out.type);
288
+ if (type == VIRTIO_BLK_T_ZONE_RESET_ALL) {
289
+ /* Entire drive capacity */
290
+ offset = 0;
291
+ len = capacity;
292
+ } else {
293
+ if (bs->bl.zone_size > capacity - offset) {
294
+ /* The zoned device allows the last smaller zone. */
295
+ len = capacity - bs->bl.zone_size * (bs->bl.nr_zones - 1);
296
+ } else {
297
+ len = bs->bl.zone_size;
298
+ }
299
+ }
300
+
301
+ if (!check_zoned_request(s, offset, len, false, &err_status)) {
302
+ goto out;
303
+ }
304
+
305
+ blk_aio_zone_mgmt(s->blk, op, offset, len,
306
+ virtio_blk_zone_mgmt_complete, req);
307
+
308
+ return 0;
309
+out:
310
+ virtio_blk_req_complete(req, err_status);
311
+ virtio_blk_free_request(req);
312
+ return err_status;
313
+}
314
+
315
+static void virtio_blk_zone_append_complete(void *opaque, int ret)
316
+{
317
+ ZoneCmdData *data = opaque;
318
+ VirtIOBlockReq *req = data->req;
319
+ VirtIOBlock *s = req->dev;
320
+ VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
321
+ int64_t append_sector, n;
322
+ uint8_t err_status = VIRTIO_BLK_S_OK;
323
+
324
+ if (ret) {
325
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
326
+ goto out;
327
+ }
328
+
329
+ virtio_stq_p(vdev, &append_sector,
330
+ data->zone_append_data.offset >> BDRV_SECTOR_BITS);
331
+ n = iov_from_buf(data->in_iov, data->in_num, 0, &append_sector,
332
+ sizeof(append_sector));
333
+ if (n != sizeof(append_sector)) {
334
+ virtio_error(vdev, "Driver provided input buffer less than size of "
335
+ "append_sector");
336
+ err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
337
+ goto out;
338
+ }
339
+
340
+out:
341
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
342
+ virtio_blk_req_complete(req, err_status);
343
+ virtio_blk_free_request(req);
344
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
345
+ g_free(data);
346
+}
347
+
348
+static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
349
+ struct iovec *out_iov,
350
+ struct iovec *in_iov,
351
+ uint64_t out_num,
352
+ unsigned in_num) {
353
+ VirtIOBlock *s = req->dev;
354
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
355
+ uint8_t err_status = VIRTIO_BLK_S_OK;
356
+
357
+ int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
358
+ int64_t len = iov_size(out_iov, out_num);
359
+
360
+ if (!check_zoned_request(s, offset, len, true, &err_status)) {
361
+ goto out;
362
+ }
363
+
364
+ ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
365
+ data->req = req;
366
+ data->in_iov = in_iov;
367
+ data->in_num = in_num;
368
+ data->zone_append_data.offset = offset;
369
+ qemu_iovec_init_external(&req->qiov, out_iov, out_num);
370
+ blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
371
+ virtio_blk_zone_append_complete, data);
372
+ return 0;
373
+
374
+out:
375
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
376
+ virtio_blk_req_complete(req, err_status);
377
+ virtio_blk_free_request(req);
378
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
379
+ return err_status;
380
+}
381
+
382
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
383
{
384
uint32_t type;
385
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
386
case VIRTIO_BLK_T_FLUSH:
387
virtio_blk_handle_flush(req, mrb);
388
break;
389
+ case VIRTIO_BLK_T_ZONE_REPORT:
390
+ virtio_blk_handle_zone_report(req, in_iov, in_num);
391
+ break;
392
+ case VIRTIO_BLK_T_ZONE_OPEN:
393
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_OPEN);
394
+ break;
395
+ case VIRTIO_BLK_T_ZONE_CLOSE:
396
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_CLOSE);
397
+ break;
398
+ case VIRTIO_BLK_T_ZONE_FINISH:
399
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_FINISH);
400
+ break;
401
+ case VIRTIO_BLK_T_ZONE_RESET:
402
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
403
+ break;
404
+ case VIRTIO_BLK_T_ZONE_RESET_ALL:
405
+ virtio_blk_handle_zone_mgmt(req, BLK_ZO_RESET);
406
+ break;
407
case VIRTIO_BLK_T_SCSI_CMD:
408
virtio_blk_handle_scsi(req);
409
break;
410
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
411
virtio_blk_free_request(req);
412
break;
413
}
414
+ case VIRTIO_BLK_T_ZONE_APPEND & ~VIRTIO_BLK_T_OUT:
415
+ /*
416
+ * Passing out_iov/out_num and in_iov/in_num is safer than
417
+ * accessing req->elem.out_sg directly because it may be
418
+ * modified by virtio_blk_handle_request().
419
+ */
420
+ virtio_blk_handle_zone_append(req, out_iov, in_iov, out_num, in_num);
421
+ break;
422
/*
423
* VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
424
* VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
425
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
426
{
427
VirtIOBlock *s = VIRTIO_BLK(vdev);
428
BlockConf *conf = &s->conf.conf;
429
+ BlockDriverState *bs = blk_bs(s->blk);
430
struct virtio_blk_config blkcfg;
431
uint64_t capacity;
432
int64_t length;
433
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
434
blkcfg.write_zeroes_may_unmap = 1;
435
virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
436
}
437
+ if (bs->bl.zoned != BLK_Z_NONE) {
438
+ switch (bs->bl.zoned) {
439
+ case BLK_Z_HM:
440
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
441
+ break;
442
+ case BLK_Z_HA:
443
+ blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
444
+ break;
445
+ default:
446
+ g_assert_not_reached();
447
+ }
448
+
449
+ virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
450
+ bs->bl.zone_size / 512);
451
+ virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
452
+ bs->bl.max_active_zones);
453
+ virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
454
+ bs->bl.max_open_zones);
455
+ virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
456
+ virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
457
+ bs->bl.max_append_sectors);
458
+ } else {
459
+ blkcfg.zoned.model = VIRTIO_BLK_Z_NONE;
460
+ }
461
memcpy(config, &blkcfg, s->config_size);
46
}
462
}
47
463
48
-struct req_data {
464
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
49
- VuServer *server;
465
return;
50
- VuVirtq *vq;
51
- VuVirtqElement *elem;
52
-};
53
-
54
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
55
{
56
- struct req_data *data = opaque;
57
- VuServer *server = data->server;
58
- VuVirtq *vq = data->vq;
59
- VuVirtqElement *elem = data->elem;
60
+ VuBlockReq *req = opaque;
61
+ VuServer *server = req->server;
62
+ VuVirtqElement *elem = &req->elem;
63
uint32_t type;
64
- VuBlockReq *req;
65
66
VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
67
BlockBackend *backend = vdev_blk->backend;
68
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
69
struct iovec *out_iov = elem->out_sg;
70
unsigned in_num = elem->in_num;
71
unsigned out_num = elem->out_num;
72
+
73
/* refer to hw/block/virtio_blk.c */
74
if (elem->out_num < 1 || elem->in_num < 1) {
75
error_report("virtio-blk request missing headers");
76
- free(elem);
77
- return;
78
+ goto err;
79
}
466
}
80
467
81
- req = g_new0(VuBlockReq, 1);
468
+ BlockDriverState *bs = blk_bs(conf->conf.blk);
82
- req->server = server;
469
+ if (bs->bl.zoned != BLK_Z_NONE) {
83
- req->vq = vq;
470
+ virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
84
- req->elem = elem;
471
+ if (bs->bl.zoned == BLK_Z_HM) {
85
-
472
+ virtio_clear_feature(&s->host_features, VIRTIO_BLK_F_DISCARD);
86
if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
473
+ }
87
sizeof(req->out)) != sizeof(req->out))) {
474
+ }
88
error_report("virtio-blk request outhdr too short");
475
+
89
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
476
if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
90
477
(!conf->max_discard_sectors ||
91
err:
478
conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
92
free(elem);
479
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
93
- g_free(req);
480
index XXXXXXX..XXXXXXX 100644
94
- return;
481
--- a/hw/virtio/virtio-qmp.c
95
}
482
+++ b/hw/virtio/virtio-qmp.c
96
483
@@ -XXX,XX +XXX,XX @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
97
static void vu_block_process_vq(VuDev *vu_dev, int idx)
484
"VIRTIO_BLK_F_DISCARD: Discard command supported"),
98
{
485
FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
99
- VuServer *server;
486
"VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
100
- VuVirtq *vq;
487
+ FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
101
- struct req_data *req_data;
488
+ "VIRTIO_BLK_F_ZONED: Zoned block devices"),
102
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
489
#ifndef VIRTIO_BLK_NO_LEGACY
103
+ VuVirtq *vq = vu_get_queue(vu_dev, idx);
490
FEATURE_ENTRY(VIRTIO_BLK_F_BARRIER, \
104
491
"VIRTIO_BLK_F_BARRIER: Request barriers supported"),
105
- server = container_of(vu_dev, VuServer, vu_dev);
106
- assert(server);
107
-
108
- vq = vu_get_queue(vu_dev, idx);
109
- assert(vq);
110
- VuVirtqElement *elem;
111
while (1) {
112
- elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
113
- sizeof(VuBlockReq));
114
- if (elem) {
115
- req_data = g_new0(struct req_data, 1);
116
- req_data->server = server;
117
- req_data->vq = vq;
118
- req_data->elem = elem;
119
- Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
120
- req_data);
121
- aio_co_enter(server->ioc->ctx, co);
122
- } else {
123
+ VuBlockReq *req;
124
+
125
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
126
+ if (!req) {
127
break;
128
}
129
+
130
+ req->server = server;
131
+ req->vq = vq;
132
+
133
+ Coroutine *co =
134
+ qemu_coroutine_create(vu_block_virtio_process_req, req);
135
+ qemu_coroutine_enter(co);
136
}
137
}
138
139
--
492
--
140
2.26.2
493
2.40.1
141
1
From: Philippe Mathieu-Daudé <philmd@redhat.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Keep statistics of some hardware errors, and number of
3
To account for the new zone append write operation on zoned devices, the
4
aligned/unaligned I/O accesses.
4
BLOCK_ACCT_ZONE_APPEND enum is introduced alongside the other I/O request types (read,
5
write, flush).
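
For reference, this is the pattern a device model follows with the new accounting type. The sketch below is illustrative only (the helper names are not part of this patch) and mirrors the block_acct_start()/block_acct_done() calls already used for reads and writes:

#include "qemu/osdep.h"
#include "sysemu/block-backend.h"
#include "block/accounting.h"

/* Illustrative helpers: bracket a zone append request with the new
 * BLOCK_ACCT_ZONE_APPEND accounting type, mirroring the write path. */
static void zone_append_acct_start(BlockBackend *blk, BlockAcctCookie *cookie,
                                   int64_t bytes)
{
    block_acct_start(blk_get_stats(blk), cookie, bytes, BLOCK_ACCT_ZONE_APPEND);
}

static void zone_append_acct_finish(BlockBackend *blk, BlockAcctCookie *cookie,
                                    int ret)
{
    if (ret < 0) {
        block_acct_failed(blk_get_stats(blk), cookie); /* failed_zone_append_operations */
    } else {
        block_acct_done(blk_get_stats(blk), cookie);   /* zone_append bytes/operations */
    }
}

The virtio-blk hunk later in this patch performs the block_acct_start() half right before blk_aio_zone_append() is issued.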
5
6
6
QMP example booting a full RHEL 8.3 aarch64 guest:
7
Signed-off-by: Sam Li <faithilikerun@gmail.com>
7
8
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
8
{ "execute": "query-blockstats" }
9
{
10
"return": [
11
{
12
"device": "",
13
"node-name": "drive0",
14
"stats": {
15
"flush_total_time_ns": 6026948,
16
"wr_highest_offset": 3383991230464,
17
"wr_total_time_ns": 807450995,
18
"failed_wr_operations": 0,
19
"failed_rd_operations": 0,
20
"wr_merged": 3,
21
"wr_bytes": 50133504,
22
"failed_unmap_operations": 0,
23
"failed_flush_operations": 0,
24
"account_invalid": false,
25
"rd_total_time_ns": 1846979900,
26
"flush_operations": 130,
27
"wr_operations": 659,
28
"rd_merged": 1192,
29
"rd_bytes": 218244096,
30
"account_failed": false,
31
"idle_time_ns": 2678641497,
32
"rd_operations": 7406,
33
},
34
"driver-specific": {
35
"driver": "nvme",
36
"completion-errors": 0,
37
"unaligned-accesses": 2959,
38
"aligned-accesses": 4477
39
},
40
"qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
41
}
42
]
43
}
44
45
Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
46
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
47
Acked-by: Markus Armbruster <armbru@redhat.com>
48
Message-id: 20201001162939.1567915-1-philmd@redhat.com
49
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
50
---
10
---
51
qapi/block-core.json | 24 +++++++++++++++++++++++-
11
qapi/block-core.json | 68 ++++++++++++++++++++++++++++++++------
52
block/nvme.c | 27 +++++++++++++++++++++++++++
12
qapi/block.json | 4 +++
53
2 files changed, 50 insertions(+), 1 deletion(-)
13
include/block/accounting.h | 1 +
14
block/qapi-sysemu.c | 11 ++++++
15
block/qapi.c | 18 ++++++++++
16
hw/block/virtio-blk.c | 4 +++
17
tests/qemu-iotests/227.out | 18 ++++++++++
18
7 files changed, 113 insertions(+), 11 deletions(-)
54
19
55
diff --git a/qapi/block-core.json b/qapi/block-core.json
20
diff --git a/qapi/block-core.json b/qapi/block-core.json
56
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
57
--- a/qapi/block-core.json
22
--- a/qapi/block-core.json
58
+++ b/qapi/block-core.json
23
+++ b/qapi/block-core.json
59
@@ -XXX,XX +XXX,XX @@
24
@@ -XXX,XX +XXX,XX @@
60
'discard-nb-failed': 'uint64',
25
# @min_wr_latency_ns: Minimum latency of write operations in the
61
'discard-bytes-ok': 'uint64' } }
26
# defined interval, in nanoseconds.
62
27
#
63
+##
28
+# @min_zone_append_latency_ns: Minimum latency of zone append operations
64
+# @BlockStatsSpecificNvme:
29
+# in the defined interval, in nanoseconds
65
+#
30
+# (since 8.1)
66
+# NVMe driver statistics
31
+#
67
+#
32
# @min_flush_latency_ns: Minimum latency of flush operations in the
68
+# @completion-errors: The number of completion errors.
33
# defined interval, in nanoseconds.
69
+#
34
#
70
+# @aligned-accesses: The number of aligned accesses performed by
35
@@ -XXX,XX +XXX,XX @@
71
+# the driver.
36
# @max_wr_latency_ns: Maximum latency of write operations in the
72
+#
37
# defined interval, in nanoseconds.
73
+# @unaligned-accesses: The number of unaligned accesses performed by
38
#
74
+# the driver.
39
+# @max_zone_append_latency_ns: Maximum latency of zone append operations
75
+#
40
+# in the defined interval, in nanoseconds
76
+# Since: 5.2
41
+# (since 8.1)
77
+##
42
+#
78
+{ 'struct': 'BlockStatsSpecificNvme',
43
# @max_flush_latency_ns: Maximum latency of flush operations in the
79
+ 'data': {
44
# defined interval, in nanoseconds.
80
+ 'completion-errors': 'uint64',
45
#
81
+ 'aligned-accesses': 'uint64',
46
@@ -XXX,XX +XXX,XX @@
82
+ 'unaligned-accesses': 'uint64' } }
47
# @avg_wr_latency_ns: Average latency of write operations in the
48
# defined interval, in nanoseconds.
49
#
50
+# @avg_zone_append_latency_ns: Average latency of zone append operations
51
+# in the defined interval, in nanoseconds
52
+# (since 8.1)
53
+#
54
# @avg_flush_latency_ns: Average latency of flush operations in the
55
# defined interval, in nanoseconds.
56
#
57
@@ -XXX,XX +XXX,XX @@
58
# @avg_wr_queue_depth: Average number of pending write operations in
59
# the defined interval.
60
#
61
+# @avg_zone_append_queue_depth: Average number of pending zone append
62
+# operations in the defined interval
63
+# (since 8.1).
64
+#
65
# Since: 2.5
66
##
67
{ 'struct': 'BlockDeviceTimedStats',
68
'data': { 'interval_length': 'int', 'min_rd_latency_ns': 'int',
69
'max_rd_latency_ns': 'int', 'avg_rd_latency_ns': 'int',
70
'min_wr_latency_ns': 'int', 'max_wr_latency_ns': 'int',
71
- 'avg_wr_latency_ns': 'int', 'min_flush_latency_ns': 'int',
72
- 'max_flush_latency_ns': 'int', 'avg_flush_latency_ns': 'int',
73
- 'avg_rd_queue_depth': 'number', 'avg_wr_queue_depth': 'number' } }
74
+ 'avg_wr_latency_ns': 'int', 'min_zone_append_latency_ns': 'int',
75
+ 'max_zone_append_latency_ns': 'int',
76
+ 'avg_zone_append_latency_ns': 'int',
77
+ 'min_flush_latency_ns': 'int', 'max_flush_latency_ns': 'int',
78
+ 'avg_flush_latency_ns': 'int', 'avg_rd_queue_depth': 'number',
79
+ 'avg_wr_queue_depth': 'number',
80
+ 'avg_zone_append_queue_depth': 'number' } }
81
82
##
83
# @BlockDeviceStats:
84
@@ -XXX,XX +XXX,XX @@
85
#
86
# @wr_bytes: The number of bytes written by the device.
87
#
88
+# @zone_append_bytes: The number of bytes appended by the zoned devices
89
+# (since 8.1)
90
+#
91
# @unmap_bytes: The number of bytes unmapped by the device (Since 4.2)
92
#
93
# @rd_operations: The number of read operations performed by the
94
@@ -XXX,XX +XXX,XX @@
95
# @wr_operations: The number of write operations performed by the
96
# device.
97
#
98
+# @zone_append_operations: The number of zone append operations performed
99
+# by the zoned devices (since 8.1)
100
+#
101
# @flush_operations: The number of cache flush operations performed by
102
# the device (since 0.15)
103
#
104
@@ -XXX,XX +XXX,XX @@
105
# @wr_total_time_ns: Total time spent on writes in nanoseconds (since
106
# 0.15).
107
#
108
+# @zone_append_total_time_ns: Total time spent on zone append writes
109
+# in nanoseconds (since 8.1)
110
+#
111
# @flush_total_time_ns: Total time spent on cache flushes in
112
# nanoseconds (since 0.15).
113
#
114
@@ -XXX,XX +XXX,XX @@
115
# @wr_merged: Number of write requests that have been merged into
116
# another request (Since 2.3).
117
#
118
+# @zone_append_merged: Number of zone append requests that have been merged
119
+# into another request (since 8.1)
120
+#
121
# @unmap_merged: Number of unmap requests that have been merged into
122
# another request (Since 4.2)
123
#
124
@@ -XXX,XX +XXX,XX @@
125
# @failed_wr_operations: The number of failed write operations
126
# performed by the device (Since 2.5)
127
#
128
+# @failed_zone_append_operations: The number of failed zone append write
129
+# operations performed by the zoned devices
130
+# (since 8.1)
131
+#
132
# @failed_flush_operations: The number of failed flush operations
133
# performed by the device (Since 2.5)
134
#
135
@@ -XXX,XX +XXX,XX @@
136
# @invalid_wr_operations: The number of invalid write operations
137
# performed by the device (Since 2.5)
138
#
139
+# @invalid_zone_append_operations: The number of invalid zone append operations
140
+# performed by the zoned device (since 8.1)
141
+#
142
# @invalid_flush_operations: The number of invalid flush operations
143
# performed by the device (Since 2.5)
144
#
145
@@ -XXX,XX +XXX,XX @@
146
#
147
# @wr_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
148
#
149
+# @zone_append_latency_histogram: @BlockLatencyHistogramInfo. (since 8.1)
150
+#
151
# @flush_latency_histogram: @BlockLatencyHistogramInfo. (Since 4.0)
152
#
153
# Since: 0.14
154
##
155
{ 'struct': 'BlockDeviceStats',
156
- 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
157
- 'rd_operations': 'int', 'wr_operations': 'int',
158
+ 'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'zone_append_bytes': 'int',
159
+ 'unmap_bytes' : 'int', 'rd_operations': 'int',
160
+ 'wr_operations': 'int', 'zone_append_operations': 'int',
161
'flush_operations': 'int', 'unmap_operations': 'int',
162
'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
163
- 'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
164
- 'wr_highest_offset': 'int',
165
- 'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
166
- '*idle_time_ns': 'int',
167
+ 'zone_append_total_time_ns': 'int', 'flush_total_time_ns': 'int',
168
+ 'unmap_total_time_ns': 'int', 'wr_highest_offset': 'int',
169
+ 'rd_merged': 'int', 'wr_merged': 'int', 'zone_append_merged': 'int',
170
+ 'unmap_merged': 'int', '*idle_time_ns': 'int',
171
'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
172
- 'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
173
- 'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
174
+ 'failed_zone_append_operations': 'int',
175
+ 'failed_flush_operations': 'int',
176
+ 'failed_unmap_operations': 'int', 'invalid_rd_operations': 'int',
177
+ 'invalid_wr_operations': 'int',
178
+ 'invalid_zone_append_operations': 'int',
179
'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
180
'account_invalid': 'bool', 'account_failed': 'bool',
181
'timed_stats': ['BlockDeviceTimedStats'],
182
'*rd_latency_histogram': 'BlockLatencyHistogramInfo',
183
'*wr_latency_histogram': 'BlockLatencyHistogramInfo',
184
+ '*zone_append_latency_histogram': 'BlockLatencyHistogramInfo',
185
'*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
186
187
##
188
diff --git a/qapi/block.json b/qapi/block.json
189
index XXXXXXX..XXXXXXX 100644
190
--- a/qapi/block.json
191
+++ b/qapi/block.json
192
@@ -XXX,XX +XXX,XX @@
193
# @boundaries-write: list of interval boundary values for write
194
# latency histogram.
195
#
196
+# @boundaries-zap: list of interval boundary values for zone append write
197
+# latency histogram.
198
+#
199
# @boundaries-flush: list of interval boundary values for flush
200
# latency histogram.
201
#
202
@@ -XXX,XX +XXX,XX @@
203
'*boundaries': ['uint64'],
204
'*boundaries-read': ['uint64'],
205
'*boundaries-write': ['uint64'],
206
+ '*boundaries-zap': ['uint64'],
207
'*boundaries-flush': ['uint64'] },
208
'allow-preconfig': true }
209
diff --git a/include/block/accounting.h b/include/block/accounting.h
210
index XXXXXXX..XXXXXXX 100644
211
--- a/include/block/accounting.h
212
+++ b/include/block/accounting.h
213
@@ -XXX,XX +XXX,XX @@ enum BlockAcctType {
214
BLOCK_ACCT_READ,
215
BLOCK_ACCT_WRITE,
216
BLOCK_ACCT_FLUSH,
217
+ BLOCK_ACCT_ZONE_APPEND,
218
BLOCK_ACCT_UNMAP,
219
BLOCK_MAX_IOTYPE,
220
};
221
diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
222
index XXXXXXX..XXXXXXX 100644
223
--- a/block/qapi-sysemu.c
224
+++ b/block/qapi-sysemu.c
225
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
226
bool has_boundaries, uint64List *boundaries,
227
bool has_boundaries_read, uint64List *boundaries_read,
228
bool has_boundaries_write, uint64List *boundaries_write,
229
+ bool has_boundaries_append, uint64List *boundaries_append,
230
bool has_boundaries_flush, uint64List *boundaries_flush,
231
Error **errp)
232
{
233
@@ -XXX,XX +XXX,XX @@ void qmp_block_latency_histogram_set(
234
}
235
}
236
237
+ if (has_boundaries || has_boundaries_append) {
238
+ ret = block_latency_histogram_set(
239
+ stats, BLOCK_ACCT_ZONE_APPEND,
240
+ has_boundaries_append ? boundaries_append : boundaries);
241
+ if (ret) {
242
+ error_setg(errp, "Device '%s': setting zone append write boundaries failed", id);
243
+ return;
244
+ }
245
+ }
83
+
246
+
84
##
247
if (has_boundaries || has_boundaries_flush) {
85
# @BlockStatsSpecific:
248
ret = block_latency_histogram_set(
86
#
249
stats, BLOCK_ACCT_FLUSH,
87
@@ -XXX,XX +XXX,XX @@
250
diff --git a/block/qapi.c b/block/qapi.c
88
'discriminator': 'driver',
251
index XXXXXXX..XXXXXXX 100644
89
'data': {
252
--- a/block/qapi.c
90
'file': 'BlockStatsSpecificFile',
253
+++ b/block/qapi.c
91
- 'host_device': 'BlockStatsSpecificFile' } }
254
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
92
+ 'host_device': 'BlockStatsSpecificFile',
255
93
+ 'nvme': 'BlockStatsSpecificNvme' } }
256
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
94
257
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
95
##
258
+ ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
96
# @BlockStats:
259
ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
97
diff --git a/block/nvme.c b/block/nvme.c
260
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
98
index XXXXXXX..XXXXXXX 100644
261
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
99
--- a/block/nvme.c
262
+ ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
100
+++ b/block/nvme.c
263
ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
101
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {
264
102
265
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
103
/* PCI address (required for nvme_refresh_filename()) */
266
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
104
char *device;
267
+ ds->failed_zone_append_operations =
268
+ stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
269
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
270
ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
271
272
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
273
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
274
+ ds->invalid_zone_append_operations =
275
+ stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
276
ds->invalid_flush_operations =
277
stats->invalid_ops[BLOCK_ACCT_FLUSH];
278
ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
279
280
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
281
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
282
+ ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
283
ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
284
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
285
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
286
+ ds->zone_append_total_time_ns =
287
+ stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
288
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
289
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
290
ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
291
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
292
293
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
294
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
295
+ TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
296
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
297
298
dev_stats->interval_length = ts->interval_length;
299
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
300
dev_stats->max_wr_latency_ns = timed_average_max(wr);
301
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
302
303
+ dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
304
+ dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
305
+ dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
105
+
306
+
106
+ struct {
307
dev_stats->min_flush_latency_ns = timed_average_min(fl);
107
+ uint64_t completion_errors;
308
dev_stats->max_flush_latency_ns = timed_average_max(fl);
108
+ uint64_t aligned_accesses;
309
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
109
+ uint64_t unaligned_accesses;
310
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
110
+ } stats;
311
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
111
};
312
dev_stats->avg_wr_queue_depth =
112
313
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
113
#define NVME_BLOCK_OPT_DEVICE "device"
314
+ dev_stats->avg_zone_append_queue_depth =
114
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
315
+ block_acct_queue_depth(ts, BLOCK_ACCT_ZONE_APPEND);
115
break;
316
116
}
317
QAPI_LIST_PREPEND(ds->timed_stats, dev_stats);
117
ret = nvme_translate_error(c);
118
+ if (ret) {
119
+ s->stats.completion_errors++;
120
+ }
121
q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
122
if (!q->cq.head) {
123
q->cq_phase = !q->cq_phase;
124
@@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
125
assert(QEMU_IS_ALIGNED(bytes, s->page_size));
126
assert(bytes <= s->max_transfer);
127
if (nvme_qiov_aligned(bs, qiov)) {
128
+ s->stats.aligned_accesses++;
129
return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
130
}
318
}
131
+ s->stats.unaligned_accesses++;
319
@@ -XXX,XX +XXX,XX @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
132
trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
320
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_READ]);
133
buf = qemu_try_memalign(s->page_size, bytes);
321
ds->wr_latency_histogram
134
322
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_WRITE]);
135
@@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host)
323
+ ds->zone_append_latency_histogram
136
qemu_vfio_dma_unmap(s->vfio, host);
324
+ = bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_ZONE_APPEND]);
325
ds->flush_latency_histogram
326
= bdrv_latency_histogram_stats(&hgram[BLOCK_ACCT_FLUSH]);
137
}
327
}
138
328
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
139
+static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs)
329
index XXXXXXX..XXXXXXX 100644
140
+{
330
--- a/hw/block/virtio-blk.c
141
+ BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
331
+++ b/hw/block/virtio-blk.c
142
+ BDRVNVMeState *s = bs->opaque;
332
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
333
data->in_num = in_num;
334
data->zone_append_data.offset = offset;
335
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
143
+
336
+
144
+ stats->driver = BLOCKDEV_DRIVER_NVME;
337
+ block_acct_start(blk_get_stats(s->blk), &req->acct, len,
145
+ stats->u.nvme = (BlockStatsSpecificNvme) {
338
+ BLOCK_ACCT_ZONE_APPEND);
146
+ .completion_errors = s->stats.completion_errors,
147
+ .aligned_accesses = s->stats.aligned_accesses,
148
+ .unaligned_accesses = s->stats.unaligned_accesses,
149
+ };
150
+
339
+
151
+ return stats;
340
blk_aio_zone_append(s->blk, &data->zone_append_data.offset, &req->qiov, 0,
152
+}
341
virtio_blk_zone_append_complete, data);
153
+
342
return 0;
154
static const char *const nvme_strong_runtime_opts[] = {
343
diff --git a/tests/qemu-iotests/227.out b/tests/qemu-iotests/227.out
155
NVME_BLOCK_OPT_DEVICE,
344
index XXXXXXX..XXXXXXX 100644
156
NVME_BLOCK_OPT_NAMESPACE,
345
--- a/tests/qemu-iotests/227.out
157
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
346
+++ b/tests/qemu-iotests/227.out
158
.bdrv_refresh_filename = nvme_refresh_filename,
347
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
159
.bdrv_refresh_limits = nvme_refresh_limits,
348
"stats": {
160
.strong_runtime_opts = nvme_strong_runtime_opts,
349
"unmap_operations": 0,
161
+ .bdrv_get_specific_stats = nvme_get_specific_stats,
350
"unmap_merged": 0,
162
351
+ "failed_zone_append_operations": 0,
163
.bdrv_detach_aio_context = nvme_detach_aio_context,
352
"flush_total_time_ns": 0,
164
.bdrv_attach_aio_context = nvme_attach_aio_context,
353
"wr_highest_offset": 0,
354
"wr_total_time_ns": 0,
355
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
356
"timed_stats": [
357
],
358
"failed_unmap_operations": 0,
359
+ "zone_append_merged": 0,
360
"failed_flush_operations": 0,
361
"account_invalid": true,
362
"rd_total_time_ns": 0,
363
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,read-zeroes=on,if=virtio
364
"unmap_total_time_ns": 0,
365
"invalid_flush_operations": 0,
366
"account_failed": true,
367
+ "zone_append_total_time_ns": 0,
368
+ "zone_append_operations": 0,
369
"rd_operations": 0,
370
+ "zone_append_bytes": 0,
371
+ "invalid_zone_append_operations": 0,
372
"invalid_wr_operations": 0,
373
"invalid_rd_operations": 0
374
},
375
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
376
"stats": {
377
"unmap_operations": 0,
378
"unmap_merged": 0,
379
+ "failed_zone_append_operations": 0,
380
"flush_total_time_ns": 0,
381
"wr_highest_offset": 0,
382
"wr_total_time_ns": 0,
383
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
384
"timed_stats": [
385
],
386
"failed_unmap_operations": 0,
387
+ "zone_append_merged": 0,
388
"failed_flush_operations": 0,
389
"account_invalid": true,
390
"rd_total_time_ns": 0,
391
@@ -XXX,XX +XXX,XX @@ Testing: -drive driver=null-co,if=none
392
"unmap_total_time_ns": 0,
393
"invalid_flush_operations": 0,
394
"account_failed": true,
395
+ "zone_append_total_time_ns": 0,
396
+ "zone_append_operations": 0,
397
"rd_operations": 0,
398
+ "zone_append_bytes": 0,
399
+ "invalid_zone_append_operations": 0,
400
"invalid_wr_operations": 0,
401
"invalid_rd_operations": 0
402
},
403
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
404
"stats": {
405
"unmap_operations": 0,
406
"unmap_merged": 0,
407
+ "failed_zone_append_operations": 0,
408
"flush_total_time_ns": 0,
409
"wr_highest_offset": 0,
410
"wr_total_time_ns": 0,
411
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
412
"timed_stats": [
413
],
414
"failed_unmap_operations": 0,
415
+ "zone_append_merged": 0,
416
"failed_flush_operations": 0,
417
"account_invalid": true,
418
"rd_total_time_ns": 0,
419
@@ -XXX,XX +XXX,XX @@ Testing: -blockdev driver=null-co,read-zeroes=on,node-name=null -device virtio-b
420
"unmap_total_time_ns": 0,
421
"invalid_flush_operations": 0,
422
"account_failed": true,
423
+ "zone_append_total_time_ns": 0,
424
+ "zone_append_operations": 0,
425
"rd_operations": 0,
426
+ "zone_append_bytes": 0,
427
+ "invalid_zone_append_operations": 0,
428
"invalid_wr_operations": 0,
429
"invalid_rd_operations": 0
430
},
165
--
431
--
166
2.26.2
432
2.40.1
167
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Sam Li <faithilikerun@gmail.com>
2
2
3
Allow vu_message_read to be replaced by one which will make use of the
3
Signed-off-by: Sam Li <faithilikerun@gmail.com>
4
QIOChannel functions. Thus reading vhost-user messages won't stall the
5
guest. For the slave channel, we still use the default vu_message_read.
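
The new vu_init() parameter makes the message reader pluggable; passing NULL keeps the default vu_message_read. A minimal sketch of a caller that installs a custom reader (the callback body is a placeholder; QEMU's vhost-user server plugs in a coroutine-based QIOChannel reader here):

#include "libvhost-user.h"

/* Placeholder reader matching the vu_read_msg_cb signature; a real
 * implementation fills *vmsg from the socket and returns false on error. */
static bool my_read_msg(VuDev *dev, int sock, VhostUserMsg *vmsg)
{
    return false;
}

static bool start_backend(VuDev *dev, uint16_t max_queues, int socket,
                          vu_panic_cb panic, vu_set_watch_cb set_watch,
                          vu_remove_watch_cb remove_watch,
                          const VuDevIface *iface)
{
    /* Pass NULL instead of my_read_msg to keep the default behaviour. */
    return vu_init(dev, max_queues, socket, panic, my_read_msg,
                   set_watch, remove_watch, iface);
}

The in-tree users converted below (libvhost-user-glib, vhost-user-bridge, virtiofsd) all pass NULL and so keep the default reader.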
6
7
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
8
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
5
Message-id: 20230508051916.178322-4-faithilikerun@gmail.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
7
---
13
contrib/libvhost-user/libvhost-user.h | 21 +++++++++++++++++++++
8
hw/block/virtio-blk.c | 12 ++++++++++++
14
contrib/libvhost-user/libvhost-user-glib.c | 2 +-
9
hw/block/trace-events | 7 +++++++
15
contrib/libvhost-user/libvhost-user.c | 14 +++++++-------
10
2 files changed, 19 insertions(+)
16
tests/vhost-user-bridge.c | 2 ++
17
tools/virtiofsd/fuse_virtio.c | 4 ++--
18
5 files changed, 33 insertions(+), 10 deletions(-)
19
11
20
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
12
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
21
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
22
--- a/contrib/libvhost-user/libvhost-user.h
14
--- a/hw/block/virtio-blk.c
23
+++ b/contrib/libvhost-user/libvhost-user.h
15
+++ b/hw/block/virtio-blk.c
24
@@ -XXX,XX +XXX,XX @@
16
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
25
*/
17
int64_t nz = data->zone_report_data.nr_zones;
26
#define VHOST_USER_MAX_RAM_SLOTS 32
18
int8_t err_status = VIRTIO_BLK_S_OK;
27
19
28
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
20
+ trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
29
+
21
if (ret) {
30
typedef enum VhostSetConfigType {
22
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
31
VHOST_SET_CONFIG_TYPE_MASTER = 0,
23
goto out;
32
VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
24
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
33
@@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
25
nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
34
typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
26
sizeof(struct virtio_blk_zone_report)) /
35
typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
27
sizeof(struct virtio_blk_zone_descriptor);
36
int *do_reply);
28
+ trace_virtio_blk_handle_zone_report(vdev, req,
37
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
29
+ offset >> BDRV_SECTOR_BITS, nr_zones);
38
typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
30
39
typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
31
zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
40
typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
32
data = g_malloc(sizeof(ZoneCmdData));
41
@@ -XXX,XX +XXX,XX @@ struct VuDev {
33
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
42
bool broken;
34
{
43
uint16_t max_queues;
35
VirtIOBlockReq *req = opaque;
44
36
VirtIOBlock *s = req->dev;
45
+ /* @read_msg: custom method to read vhost-user message
37
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
46
+ *
38
int8_t err_status = VIRTIO_BLK_S_OK;
47
+ * Read data from vhost_user socket fd and fill up
39
+ trace_virtio_blk_zone_mgmt_complete(vdev, req,ret);
48
+ * the passed VhostUserMsg *vmsg struct.
40
49
+ *
41
if (ret) {
50
+ * If reading fails, it should close the received set of file
42
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
51
+ * descriptors as socket message's auxiliary data.
43
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
52
+ *
44
/* Entire drive capacity */
53
+ * For the details, please refer to vu_message_read in libvhost-user.c
45
offset = 0;
54
+ * which will be used by default if not custom method is provided when
46
len = capacity;
55
+ * calling vu_init
47
+ trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
56
+ *
48
+ bs->total_sectors);
57
+ * Returns: true if vhost-user message successfully received,
49
} else {
58
+ * otherwise return false.
50
if (bs->bl.zone_size > capacity - offset) {
59
+ *
51
/* The zoned device allows the last smaller zone. */
60
+ */
52
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
61
+ vu_read_msg_cb read_msg;
53
} else {
62
/* @set_watch: add or update the given fd to the watch set,
54
len = bs->bl.zone_size;
63
* call cb when condition is met */
55
}
64
vu_set_watch_cb set_watch;
56
+ trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
65
@@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev,
57
+ offset >> BDRV_SECTOR_BITS,
66
uint16_t max_queues,
58
+ len >> BDRV_SECTOR_BITS);
67
int socket,
68
vu_panic_cb panic,
69
+ vu_read_msg_cb read_msg,
70
vu_set_watch_cb set_watch,
71
vu_remove_watch_cb remove_watch,
72
const VuDevIface *iface);
73
diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
74
index XXXXXXX..XXXXXXX 100644
75
--- a/contrib/libvhost-user/libvhost-user-glib.c
76
+++ b/contrib/libvhost-user/libvhost-user-glib.c
77
@@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket,
78
g_assert(dev);
79
g_assert(iface);
80
81
- if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
82
+ if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch,
83
remove_watch, iface)) {
84
return false;
85
}
59
}
86
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
60
87
index XXXXXXX..XXXXXXX 100644
61
if (!check_zoned_request(s, offset, len, false, &err_status)) {
88
--- a/contrib/libvhost-user/libvhost-user.c
62
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
89
+++ b/contrib/libvhost-user/libvhost-user.c
63
err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
90
@@ -XXX,XX +XXX,XX @@
91
/* The version of inflight buffer */
92
#define INFLIGHT_VERSION 1
93
94
-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
95
-
96
/* The version of the protocol we support */
97
#define VHOST_USER_VERSION 1
98
#define LIBVHOST_USER_DEBUG 0
99
@@ -XXX,XX +XXX,XX @@ have_userfault(void)
100
}
101
102
static bool
103
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
104
+vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
105
{
106
char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
107
struct iovec iov = {
108
@@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
109
goto out;
64
goto out;
110
}
65
}
111
66
+ trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
112
- if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
67
113
+ if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) {
68
out:
69
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
70
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
71
int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
72
int64_t len = iov_size(out_iov, out_num);
73
74
+ trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
75
if (!check_zoned_request(s, offset, len, true, &err_status)) {
114
goto out;
76
goto out;
115
}
77
}
116
78
diff --git a/hw/block/trace-events b/hw/block/trace-events
117
@@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
118
/* Wait for QEMU to confirm that it's registered the handler for the
119
* faults.
120
*/
121
- if (!vu_message_read(dev, dev->sock, vmsg) ||
122
+ if (!dev->read_msg(dev, dev->sock, vmsg) ||
123
vmsg->size != sizeof(vmsg->payload.u64) ||
124
vmsg->payload.u64 != 0) {
125
vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
126
@@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev)
127
int reply_requested;
128
bool need_reply, success = false;
129
130
- if (!vu_message_read(dev, dev->sock, &vmsg)) {
131
+ if (!dev->read_msg(dev, dev->sock, &vmsg)) {
132
goto end;
133
}
134
135
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
136
uint16_t max_queues,
137
int socket,
138
vu_panic_cb panic,
139
+ vu_read_msg_cb read_msg,
140
vu_set_watch_cb set_watch,
141
vu_remove_watch_cb remove_watch,
142
const VuDevIface *iface)
143
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
144
145
dev->sock = socket;
146
dev->panic = panic;
147
+ dev->read_msg = read_msg ? read_msg : vu_message_read_default;
148
dev->set_watch = set_watch;
149
dev->remove_watch = remove_watch;
150
dev->iface = iface;
151
@@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
152
153
vu_message_write(dev, dev->slave_fd, &vmsg);
154
if (ack) {
155
- vu_message_read(dev, dev->slave_fd, &vmsg);
156
+ vu_message_read_default(dev, dev->slave_fd, &vmsg);
157
}
158
return;
159
}
160
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
161
index XXXXXXX..XXXXXXX 100644
79
index XXXXXXX..XXXXXXX 100644
162
--- a/tests/vhost-user-bridge.c
80
--- a/hw/block/trace-events
163
+++ b/tests/vhost-user-bridge.c
81
+++ b/hw/block/trace-events
164
@@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx)
82
@@ -XXX,XX +XXX,XX @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
165
VHOST_USER_BRIDGE_MAX_QUEUES,
83
# virtio-blk.c
166
conn_fd,
84
virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
167
vubr_panic,
85
virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
168
+ NULL,
86
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
169
vubr_set_watch,
87
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
170
vubr_remove_watch,
88
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
171
&vuiface)) {
89
virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
172
@@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client)
90
virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
173
VHOST_USER_BRIDGE_MAX_QUEUES,
91
virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
174
dev->sock,
92
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
175
vubr_panic,
93
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
176
+ NULL,
94
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
177
vubr_set_watch,
95
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""
178
vubr_remove_watch,
96
179
&vuiface)) {
97
# hd-geometry.c
180
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
98
hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
181
index XXXXXXX..XXXXXXX 100644
182
--- a/tools/virtiofsd/fuse_virtio.c
183
+++ b/tools/virtiofsd/fuse_virtio.c
184
@@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se)
185
se->vu_socketfd = data_sock;
186
se->virtio_dev->se = se;
187
pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
188
- vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
189
- fv_remove_watch, &fv_iface);
190
+ vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
191
+ fv_set_watch, fv_remove_watch, &fv_iface);
192
193
return 0;
194
}
195
--
99
--
196
2.26.2
100
2.40.1
197
Deleted patch
1
From: Coiby Xu <coiby.xu@gmail.com>
2
1
3
When the client is running in gdb and the quit command is issued in gdb,
4
QEMU will still dispatch the event, which will cause a segmentation fault in
5
the callback function.
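
The one-line fix below is purely about ordering: the watch on kick_fd must be removed before the descriptor is closed, otherwise a pending event can still invoke the callback on a dead fd. A sketch of the resulting cleanup (hypothetical helper name, fields as in the diff):

#include <unistd.h>
#include "libvhost-user.h"

/* Illustrative only: the kick_fd teardown order used in vu_deinit(). */
static void cleanup_kick_fd(VuDev *dev, VuVirtq *vq)
{
    if (vq->kick_fd != -1) {
        dev->remove_watch(dev, vq->kick_fd);   /* stop dispatching first */
        close(vq->kick_fd);
        vq->kick_fd = -1;
    }
}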
6
7
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
10
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
contrib/libvhost-user/libvhost-user.c | 1 +
14
1 file changed, 1 insertion(+)
15
16
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/contrib/libvhost-user/libvhost-user.c
19
+++ b/contrib/libvhost-user/libvhost-user.c
20
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
21
}
22
23
if (vq->kick_fd != -1) {
24
+ dev->remove_watch(dev, vq->kick_fd);
25
close(vq->kick_fd);
26
vq->kick_fd = -1;
27
}
28
--
29
2.26.2
30
diff view generated by jsdifflib
Deleted patch
1
From: Coiby Xu <coiby.xu@gmail.com>
2
1
3
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
4
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
7
Message-id: 20200918080912.321299-8-coiby.xu@gmail.com
8
[Removed reference to vhost-user-blk-test.c, it will be sent in a
9
separate pull request.
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
MAINTAINERS | 7 +++++++
14
1 file changed, 7 insertions(+)
15
16
diff --git a/MAINTAINERS b/MAINTAINERS
17
index XXXXXXX..XXXXXXX 100644
18
--- a/MAINTAINERS
19
+++ b/MAINTAINERS
20
@@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org
21
S: Supported
22
F: tests/image-fuzzer/
23
24
+Vhost-user block device backend server
25
+M: Coiby Xu <Coiby.Xu@gmail.com>
26
+S: Maintained
27
+F: block/export/vhost-user-blk-server.c
28
+F: util/vhost-user-server.c
29
+F: tests/qtest/libqos/vhost-user-blk.c
30
+
31
Replication
32
M: Wen Congyang <wencongyang2@huawei.com>
33
M: Xie Changlong <xiechanglong.d@gmail.com>
34
--
35
2.26.2
36
diff view generated by jsdifflib
Deleted patch
1
We already have access to the value with the correct type (ioc and sioc
2
are the same QIOChannel).
3
1
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20200924151549.913737-4-stefanha@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
8
util/vhost-user-server.c | 2 +-
9
1 file changed, 1 insertion(+), 1 deletion(-)
10
11
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/util/vhost-user-server.c
14
+++ b/util/vhost-user-server.c
15
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
16
server->ioc = QIO_CHANNEL(sioc);
17
object_ref(OBJECT(server->ioc));
18
qio_channel_attach_aio_context(server->ioc, server->ctx);
19
- qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
20
+ qio_channel_set_blocking(server->ioc, false, NULL);
21
vu_client_start(server);
22
}
23
24
--
25
2.26.2
26
diff view generated by jsdifflib
Deleted patch
1
Explicitly deleting watches is not necessary since libvhost-user calls
2
remove_watch() during vu_deinit(). Add an assertion to check this
3
though.
4
1
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200924151549.913737-5-stefanha@redhat.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
util/vhost-user-server.c | 19 ++++---------------
10
1 file changed, 4 insertions(+), 15 deletions(-)
11
12
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.c
15
+++ b/util/vhost-user-server.c
16
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
17
/* When this is set vu_client_trip will stop new processing vhost-user message */
18
server->sioc = NULL;
19
20
- VuFdWatch *vu_fd_watch, *next;
21
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
22
- aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
23
- NULL, NULL, NULL);
24
- }
25
-
26
- while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
27
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
28
- if (!vu_fd_watch->processing) {
29
- QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
30
- g_free(vu_fd_watch);
31
- }
32
- }
33
- }
34
-
35
while (server->processing_msg) {
36
if (server->ioc->read_coroutine) {
37
server->ioc->read_coroutine = NULL;
38
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
39
}
40
41
vu_deinit(&server->vu_dev);
42
+
43
+ /* vu_deinit() should have called remove_watch() */
44
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
45
+
46
object_unref(OBJECT(sioc));
47
object_unref(OBJECT(server->ioc));
48
}
49
--
50
2.26.2
51
Deleted patch
1
The device panic notifier callback is not used. Drop it.
2
1
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-7-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.h | 3 ---
8
block/export/vhost-user-blk-server.c | 3 +--
9
util/vhost-user-server.c | 6 ------
10
3 files changed, 1 insertion(+), 11 deletions(-)
11
12
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
13
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.h
15
+++ b/util/vhost-user-server.h
16
@@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch {
17
} VuFdWatch;
18
19
typedef struct VuServer VuServer;
20
-typedef void DevicePanicNotifierFn(VuServer *server);
21
22
struct VuServer {
23
QIONetListener *listener;
24
AioContext *ctx;
25
- DevicePanicNotifierFn *device_panic_notifier;
26
int max_queues;
27
const VuDevIface *vu_iface;
28
VuDev vu_dev;
29
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
30
SocketAddress *unix_socket,
31
AioContext *ctx,
32
uint16_t max_queues,
33
- DevicePanicNotifierFn *device_panic_notifier,
34
const VuDevIface *vu_iface,
35
Error **errp);
36
37
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/export/vhost-user-blk-server.c
40
+++ b/block/export/vhost-user-blk-server.c
41
@@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
42
ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
43
44
if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
45
- VHOST_USER_BLK_MAX_QUEUES,
46
- NULL, &vu_block_iface,
47
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
48
errp)) {
49
goto error;
50
}
51
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/util/vhost-user-server.c
54
+++ b/util/vhost-user-server.c
55
@@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf)
56
close_client(server);
57
}
58
59
- if (server->device_panic_notifier) {
60
- server->device_panic_notifier(server);
61
- }
62
-
63
/*
64
* Set the callback function for network listener so another
65
* vhost-user client can connect to this server
66
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
67
SocketAddress *socket_addr,
68
AioContext *ctx,
69
uint16_t max_queues,
70
- DevicePanicNotifierFn *device_panic_notifier,
71
const VuDevIface *vu_iface,
72
Error **errp)
73
{
74
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
75
.vu_iface = vu_iface,
76
.max_queues = max_queues,
77
.ctx = ctx,
78
- .device_panic_notifier = device_panic_notifier,
79
};
80
81
qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
82
--
83
2.26.2
84
Deleted patch
1
fds[] is leaked when qio_channel_readv_full() fails.
2
1
3
Use vmsg->fds[] instead of keeping a local fds[] array. Then we can
4
reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the
5
loop to make this safe.
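
In sketch form, the pattern is to zero vmsg->fd_num up front and append each batch of descriptors straight into vmsg->fds[], so a single fail label can close whatever has been recorded (hypothetical helper; VhostUserMsg as defined by libvhost-user):

#include <string.h>
#include <glib.h>
#include "libvhost-user.h"

/* Illustrative only: append one batch of fds returned by
 * qio_channel_readv_full() directly into the message being built. */
static bool append_fds(VhostUserMsg *vmsg, int *fds, size_t nfds)
{
    const size_t max_fds = G_N_ELEMENTS(vmsg->fds);

    if (vmsg->fd_num + nfds > max_fds) {
        g_free(fds);
        return false;              /* caller jumps to its fail label */
    }
    memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
    vmsg->fd_num += nfds;
    g_free(fds);
    return true;
}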
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-8-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
util/vhost-user-server.c | 50 ++++++++++++++++++----------------------
12
1 file changed, 23 insertions(+), 27 deletions(-)
13
14
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/util/vhost-user-server.c
17
+++ b/util/vhost-user-server.c
18
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
19
};
20
int rc, read_bytes = 0;
21
Error *local_err = NULL;
22
- /*
23
- * Store fds/nfds returned from qio_channel_readv_full into
24
- * temporary variables.
25
- *
26
- * VhostUserMsg is a packed structure, gcc will complain about passing
27
- * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
28
- * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
29
- * thus two temporary variables nfds and fds are used here.
30
- */
31
- size_t nfds = 0, nfds_t = 0;
32
const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
33
- int *fds_t = NULL;
34
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
35
QIOChannel *ioc = server->ioc;
36
37
+ vmsg->fd_num = 0;
38
if (!ioc) {
39
error_report_err(local_err);
40
goto fail;
41
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
42
43
assert(qemu_in_coroutine());
44
do {
45
+ size_t nfds = 0;
46
+ int *fds = NULL;
47
+
48
/*
49
* qio_channel_readv_full may have short reads, keeping calling it
50
* until getting VHOST_USER_HDR_SIZE or 0 bytes in total
51
*/
52
- rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
53
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
54
if (rc < 0) {
55
if (rc == QIO_CHANNEL_ERR_BLOCK) {
56
+ assert(local_err == NULL);
57
qio_channel_yield(ioc, G_IO_IN);
58
continue;
59
} else {
60
error_report_err(local_err);
61
- return false;
62
+ goto fail;
63
}
64
}
65
- read_bytes += rc;
66
- if (nfds_t > 0) {
67
- if (nfds + nfds_t > max_fds) {
68
+
69
+ if (nfds > 0) {
70
+ if (vmsg->fd_num + nfds > max_fds) {
71
error_report("A maximum of %zu fds are allowed, "
72
"however got %zu fds now",
73
- max_fds, nfds + nfds_t);
74
+ max_fds, vmsg->fd_num + nfds);
75
+ g_free(fds);
76
goto fail;
77
}
78
- memcpy(vmsg->fds + nfds, fds_t,
79
- nfds_t *sizeof(vmsg->fds[0]));
80
- nfds += nfds_t;
81
- g_free(fds_t);
82
+ memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
83
+ vmsg->fd_num += nfds;
84
+ g_free(fds);
85
}
86
- if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
87
- break;
88
+
89
+ if (rc == 0) { /* socket closed */
90
+ goto fail;
91
}
92
- iov.iov_base = (char *)vmsg + read_bytes;
93
- iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
94
- } while (true);
95
96
- vmsg->fd_num = nfds;
97
+ iov.iov_base += rc;
98
+ iov.iov_len -= rc;
99
+ read_bytes += rc;
100
+ } while (read_bytes != VHOST_USER_HDR_SIZE);
101
+
102
/* qio_channel_readv_full will make socket fds blocking, unblock them */
103
vmsg_unblock_fds(vmsg);
104
if (vmsg->size > sizeof(vmsg->payload)) {
105
--
106
2.26.2
107
Deleted patch
1
Unexpected EOF is an error that must be reported.
2
1
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-9-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.c | 6 ++++--
8
1 file changed, 4 insertions(+), 2 deletions(-)
9
10
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/util/vhost-user-server.c
13
+++ b/util/vhost-user-server.c
14
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
15
};
16
if (vmsg->size) {
17
rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
18
- if (rc == -1) {
19
- error_report_err(local_err);
20
+ if (rc != 1) {
21
+ if (local_err) {
22
+ error_report_err(local_err);
23
+ }
24
goto fail;
25
}
26
}
27
--
28
2.26.2
29
Deleted patch
1
Propagate the flush return value since errors are possible.
2
1
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-11-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
block/export/vhost-user-blk-server.c | 11 +++++++----
8
1 file changed, 7 insertions(+), 4 deletions(-)
9
10
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/block/export/vhost-user-blk-server.c
13
+++ b/block/export/vhost-user-blk-server.c
14
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
15
return -EINVAL;
16
}
17
18
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
19
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
20
{
21
VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
22
BlockBackend *backend = vdev_blk->backend;
23
- blk_co_flush(backend);
24
+ return blk_co_flush(backend);
25
}
26
27
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
28
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
29
break;
30
}
31
case VIRTIO_BLK_T_FLUSH:
32
- vu_block_flush(req);
33
- req->in->status = VIRTIO_BLK_S_OK;
34
+ if (vu_block_flush(req) == 0) {
35
+ req->in->status = VIRTIO_BLK_S_OK;
36
+ } else {
37
+ req->in->status = VIRTIO_BLK_S_IOERR;
38
+ }
39
break;
40
case VIRTIO_BLK_T_GET_ID: {
41
size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
42
--
43
2.26.2
44
Deleted patch
Headers used by other subsystems are located in include/. Also add the
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-13-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 4 +++-
 {util => include/qemu}/vhost-user-server.h | 0
 block/export/vhost-user-blk-server.c | 2 +-
 util/vhost-user-server.c | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)
 rename {util => include/qemu}/vhost-user-server.h (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server
 M: Coiby Xu <Coiby.Xu@gmail.com>
 S: Maintained
 F: block/export/vhost-user-blk-server.c
-F: util/vhost-user-server.c
+F: block/export/vhost-user-blk-server.h
+F: include/qemu/vhost-user-server.h
 F: tests/qtest/libqos/vhost-user-blk.c
+F: util/vhost-user-server.c

 Replication
 M: Wen Congyang <wencongyang2@huawei.com>
diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h
similarity index 100%
rename from util/vhost-user-server.h
rename to include/qemu/vhost-user-server.h
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -XXX,XX +XXX,XX @@
 #include "block/block.h"
 #include "contrib/libvhost-user/libvhost-user.h"
 #include "standard-headers/linux/virtio_blk.h"
-#include "util/vhost-user-server.h"
+#include "qemu/vhost-user-server.h"
 #include "vhost-user-blk-server.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index XXXXXXX..XXXXXXX 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -XXX,XX +XXX,XX @@
 */
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/vhost-user-server.h"
 #include "block/aio-wait.h"
-#include "vhost-user-server.h"

 /*
 * Theory of operation:
--
2.26.2
Deleted patch
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
the static library once and then reuse it throughout QEMU.

Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
vhost-user tools (vhost-user-gpu, etc) do.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-14-stefanha@redhat.com
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/export.c | 8 ++++----
 block/export/meson.build | 2 +-
 contrib/libvhost-user/meson.build | 1 +
 meson.build | 6 +++++-
 util/meson.build | 4 +++-
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/export/export.c b/block/export/export.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/block-backend.h"
 #include "block/export.h"
 #include "block/nbd.h"
-#if CONFIG_LINUX
-#include "block/export/vhost-user-blk-server.h"
-#endif
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-export.h"
 #include "qapi/qapi-events-block-export.h"
 #include "qemu/id.h"
+#ifdef CONFIG_VHOST_USER
+#include "vhost-user-blk-server.h"
+#endif

 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
-#if CONFIG_LINUX
+#ifdef CONFIG_VHOST_USER
 &blk_exp_vhost_user_blk,
 #endif
 };
diff --git a/block/export/meson.build b/block/export/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -XXX,XX +XXX,XX @@
 block_ss.add(files('export.c'))
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
+block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/meson.build
+++ b/contrib/libvhost-user/meson.build
@@ -XXX,XX +XXX,XX @@
 libvhost_user = static_library('vhost-user',
 files('libvhost-user.c', 'libvhost-user-glib.c'),
 build_by_default: false)
+vhost_user = declare_dependency(link_with: libvhost_user)
diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [
 'util',
 ]

+vhost_user = not_found
+if 'CONFIG_VHOST_USER' in config_host
+ subdir('contrib/libvhost-user')
+endif
+
 subdir('qapi')
 subdir('qobject')
 subdir('stubs')
@@ -XXX,XX +XXX,XX @@ if have_tools
 install: true)

 if 'CONFIG_VHOST_USER' in config_host
- subdir('contrib/libvhost-user')
 subdir('contrib/vhost-user-blk')
 subdir('contrib/vhost-user-gpu')
 subdir('contrib/vhost-user-input')
diff --git a/util/meson.build b/util/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -XXX,XX +XXX,XX @@ if have_block
 util_ss.add(files('main-loop.c'))
 util_ss.add(files('nvdimm-utils.c'))
 util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
- util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
+ util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
+ files('vhost-user-server.c'), vhost_user
+ ])
 util_ss.add(files('block-helpers.c'))
 util_ss.add(files('qemu-coroutine-sleep.c'))
 util_ss.add(files('qemu-co-shared-resource.c'))
--
2.26.2
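A side note on the #if -> #ifdef change in export.c: in C, an identifier that is not defined evaluates to 0 inside #if, so #if CONFIG_LINUX happens to compile either way (and would warn under -Wundef); #ifdef states the intended "is this optional feature enabled" test directly, matching how the other vhost-user tools are gated. A condensed sketch of the resulting guard pattern, trimmed to the entries visible in the hunk:

    #include "block/export.h"
    #include "block/nbd.h"
    #ifdef CONFIG_VHOST_USER
    #include "vhost-user-blk-server.h"
    #endif

    /* Both the header and the table entry are gated by the same symbol,
     * so the file still compiles when vhost-user support is disabled. */
    static const BlockExportDriver *blk_exp_drivers[] = {
        &blk_exp_nbd,
    #ifdef CONFIG_VHOST_USER
        &blk_exp_vhost_user_blk,
    #endif
    };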
Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-export.json | 11 ++++++++++
 block/export/export.c | 31 +++++++++++++++++++++++++++-
 block/export/vhost-user-blk-server.c | 5 ++++-
 nbd/server.c | 2 --
 4 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -XXX,XX +XXX,XX @@
 # export before completion is signalled. (since: 5.2;
 # default: false)
 #
+# @iothread: The name of the iothread object where the export will run. The
+# default is to use the thread currently associated with the
+# block node. (since: 5.2)
+#
+# @fixed-iothread: True prevents the block node from being moved to another
+# thread while the export is active. If true and @iothread is
+# given, export creation fails if the block node cannot be
+# moved to the iothread. The default is false. (since: 5.2)
+#
 # Since: 4.2
 ##
 { 'union': 'BlockExportOptions',
 'base': { 'type': 'BlockExportType',
 'id': 'str',
+ '*fixed-iothread': 'bool',
+ '*iothread': 'str',
 'node-name': 'str',
 '*writable': 'bool',
 '*writethrough': 'bool' },
diff --git a/block/export/export.c b/block/export/export.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -XXX,XX +XXX,XX @@

 #include "block/block.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/iothread.h"
 #include "block/export.h"
 #include "block/nbd.h"
 #include "qapi/error.h"
@@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)

 BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
 {
+ bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
 const BlockExportDriver *drv;
 BlockExport *exp = NULL;
 BlockDriverState *bs;
- BlockBackend *blk;
+ BlockBackend *blk = NULL;
 AioContext *ctx;
 uint64_t perm;
 int ret;
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
 ctx = bdrv_get_aio_context(bs);
 aio_context_acquire(ctx);

+ if (export->has_iothread) {
+ IOThread *iothread;
+ AioContext *new_ctx;
+
+ iothread = iothread_by_id(export->iothread);
+ if (!iothread) {
+ error_setg(errp, "iothread \"%s\" not found", export->iothread);
+ goto fail;
+ }
+
+ new_ctx = iothread_get_aio_context(iothread);
+
+ ret = bdrv_try_set_aio_context(bs, new_ctx, errp);
+ if (ret == 0) {
+ aio_context_release(ctx);
+ aio_context_acquire(new_ctx);
+ ctx = new_ctx;
+ } else if (fixed_iothread) {
+ goto fail;
+ }
+ }
+
 /*
 * Block exports are used for non-shared storage migration. Make sure
 * that BDRV_O_INACTIVE is cleared and the image is ready for write
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
 }

 blk = blk_new(ctx, perm, BLK_PERM_ALL);
+
+ if (!fixed_iothread) {
+ blk_set_allow_aio_context_change(blk, true);
+ }
+
 ret = blk_insert_bs(blk, bs, errp);
 if (ret < 0) {
 goto fail;
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = {
 static void blk_aio_attached(AioContext *ctx, void *opaque)
 {
 VuBlkExport *vexp = opaque;
+
+ vexp->export.ctx = ctx;
 vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
 }

 static void blk_aio_detach(void *opaque)
 {
 VuBlkExport *vexp = opaque;
+
 vhost_user_server_detach_aio_context(&vexp->vu_server);
+ vexp->export.ctx = NULL;
 }

 static void
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
 logical_block_size);

- blk_set_allow_aio_context_change(exp->blk, true);
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vexp);

diff --git a/nbd/server.c b/nbd/server.c
index XXXXXXX..XXXXXXX 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
 return ret;
 }

- blk_set_allow_aio_context_change(blk, true);
-
 QTAILQ_INIT(&exp->clients);
 exp->name = g_strdup(arg->name);
 exp->description = g_strdup(arg->description);
--
2.26.2
From: Sam Li <faithilikerun@gmail.com>

Add documentation with an example of using the virtio-blk driver
to pass zoned block devices through to the guest.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-5-faithilikerun@gmail.com
[Fix pre-formatted code syntax
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/devel/zoned-storage.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/zoned-storage.rst
+++ b/docs/devel/zoned-storage.rst
@@ -XXX,XX +XXX,XX @@ APIs for zoned storage emulation or testing.
 For example, to test zone_report on a null_blk device using qemu-io is::

 $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"
+
+To expose the host's zoned block device through virtio-blk, the command line
+can be (includes the -device parameter)::
+
+ -blockdev node-name=drive0,driver=host_device,filename=/dev/nullb0,cache.direct=on \
+ -device virtio-blk-pci,drive=drive0
+
+Or only use the -drive parameter::
+
+ -drive driver=host_device,file=/dev/nullb0,if=virtio,cache.direct=on
+
+Additionally, QEMU has several ways of supporting zoned storage, including:
+(1) Using virtio-scsi: --device scsi-block allows for the passing through of
+SCSI ZBC devices, enabling the attachment of ZBC or ZAC HDDs to QEMU.
+(2) PCI device pass-through: While NVMe ZNS emulation is available for testing
+purposes, it cannot yet pass through a zoned device from the host. To pass on
+the NVMe ZNS device to the guest, use VFIO PCI to pass the entire NVMe PCI adapter
+through to the guest. Likewise, an HDD HBA can be passed on to QEMU with all HDDs
+attached to the HBA.
--
2.40.1
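For the @iothread and @fixed-iothread options added in the block-export patch above, the export.c hunk boils down to a small policy: try to move the node into the requested IOThread's AioContext, and only treat a failure to move as fatal when fixed-iothread is set. A hedged sketch of that policy (the helper name and exact error handling are illustrative; the functions it calls are the ones used by the hunk):

    #include "qemu/osdep.h"
    #include "block/block.h"
    #include "sysemu/iothread.h"
    #include "qapi/error.h"

    static AioContext *pick_export_context(BlockDriverState *bs,
                                           const char *iothread_id,
                                           bool fixed_iothread,
                                           AioContext *current_ctx,
                                           Error **errp)
    {
        IOThread *iothread = iothread_by_id(iothread_id);
        AioContext *new_ctx;

        if (!iothread) {
            error_setg(errp, "iothread \"%s\" not found", iothread_id);
            return NULL;
        }

        new_ctx = iothread_get_aio_context(iothread);
        if (bdrv_try_set_aio_context(bs, new_ctx,
                                     fixed_iothread ? errp : NULL) == 0) {
            return new_ctx;     /* node now runs in the requested iothread */
        }
        return fixed_iothread ? NULL : current_ctx;  /* strict vs best effort */
    }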
Deleted patch
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/coroutines.h | 2 ++
 block/io.c | 21 ++++++++++++++-------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index XXXXXXX..XXXXXXX 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
+ bool include_base,
 bool want_zero,
 int64_t offset,
 int64_t bytes,
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
+ bool include_base,
 bool want_zero,
 int64_t offset,
 int64_t bytes,
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ early_out:
 int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
+ bool include_base,
 bool want_zero,
 int64_t offset,
 int64_t bytes,
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *p;
 int64_t eof = 0;

- assert(bs != base);
+ assert(include_base || bs != base);
+ assert(!include_base || base); /* Can't include NULL base */

 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
- if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
 }

@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 assert(*pnum <= bytes);
 bytes = *pnum;

- for (p = bdrv_filter_or_cow_bs(bs); p != base;
+ for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
 p = bdrv_filter_or_cow_bs(p))
 {
 ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 break;
 }

+ if (p == base) {
+ assert(include_base);
+ break;
+ }
+
 /*
 * OK, [offset, offset + *pnum) region is unallocated on this layer,
 * let's continue the diving.
@@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
 int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file)
 {
- return bdrv_common_block_status_above(bs, base, true, offset, bytes,
+ return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
 pnum, map, file);
 }

@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
 int ret;
 int64_t dummy;

- ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
- offset, bytes, pnum ? pnum : &dummy,
- NULL, NULL);
+ ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
+ bytes, pnum ? pnum : &dummy, NULL,
+ NULL);
 if (ret < 0) {
 return ret;
 }
--
2.26.2
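With the new flag the two entry points diverge: bdrv_block_status_above() keeps the historical exclusive walk (include_base=false), while bdrv_is_allocated() now passes (bs, bs, true, ...) so that only bs itself is consulted. A hedged sketch of the exclusive form (the wrapper name is illustrative, the call signature is the one added by the patch):

    #include "qemu/osdep.h"
    #include "block/block.h"
    #include "block/coroutines.h"

    /* Report whether [offset, offset + bytes) is allocated in any layer
     * strictly above 'base'; *pnum says for how many bytes the answer holds. */
    static int allocated_strictly_above(BlockDriverState *top,
                                        BlockDriverState *base,
                                        int64_t offset, int64_t bytes,
                                        int64_t *pnum)
    {
        int ret = bdrv_common_block_status_above(top, base,
                                                 false, /* include_base */
                                                 true,  /* want_zero */
                                                 offset, bytes, pnum,
                                                 NULL, NULL);
        return ret < 0 ? ret : !!(ret & BDRV_BLOCK_ALLOCATED);
    }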
Deleted patch
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for ex. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 BlockDriverState *p;
 int64_t eof = 0;

- assert(include_base || bs != base);
 assert(!include_base || base); /* Can't include NULL base */

+ if (!include_base && bs == base) {
+ *pnum = bytes;
+ return 0;
+ }
+
 ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
 if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
--
2.26.2
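A short worked example of the corner case: with include_base == false and bs == base there is no layer between the two, so the function now reports the whole range as unallocated-above in a single step instead of asserting. This is the situation img_rebase() can trigger through bdrv_is_allocated_above(); the wrapper below is only an illustration of what such a caller observes:

    #include "qemu/osdep.h"
    #include "block/block.h"
    #include "block/coroutines.h"

    static void empty_chain_example(BlockDriverState *node,
                                    int64_t offset, int64_t bytes)
    {
        int64_t pnum = 0;
        int ret = bdrv_common_block_status_above(node, node,
                                                 false, /* include_base */
                                                 false, /* want_zero */
                                                 offset, bytes, &pnum,
                                                 NULL, NULL);
        /* New behaviour: ret == 0 (nothing allocated above 'node') and
         * pnum == bytes (the answer covers the whole requested range). */
        assert(ret == 0 && pnum == bytes);
    }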