The following changes since commit 0b6206b9c6825619cd721085fe082d7a0abc9af4:

  Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210914-4' into staging (2021-09-15 13:27:49 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2021-09-15

for you to fetch changes up to 1899bf47375ad40555dcdff12ba49b4b8b82df38:

  qemu-img: Add -F shorthand to convert (2021-09-15 18:42:38 +0200)

----------------------------------------------------------------
Block patches:
- Block-status cache for data regions
- qcow2 optimization (when using subclusters)
- iotests delinting, and let 297 (lint checker) cover named iotests
- qcow2 check improvements
- Added -F (target backing file format) option to qemu-img convert
- Mirror job fix
- Fix for when a migration is initiated while a backup job runs
- Fix for uncached qemu-img convert to a volume with 4k sectors (for an
  unaligned image)
- Minor gluster driver fix

----------------------------------------------------------------
Eric Blake (1):
  qemu-img: Add -F shorthand to convert

Hanna Reitz (15):
  gluster: Align block-status tail
  block: Drop BDS comment regarding bdrv_append()
  block: block-status cache for data regions
  block: Clarify that @bytes is no limit on *pnum
  block/file-posix: Do not force-cap *pnum
  block/gluster: Do not force-cap *pnum
  block/iscsi: Do not force-cap *pnum
  iotests: Fix unspecified-encoding pylint warnings
  iotests: Fix use-{list,dict}-literal warnings
  iotests/297: Drop 169 and 199 from the skip list
  migrate-bitmaps-postcopy-test: Fix pylint warnings
  migrate-bitmaps-test: Fix pylint warnings
  mirror-top-perms: Fix AbnormalShutdown path
  iotests/297: Cover tests/
  qemu-img: Allow target be aligned to sector size

Stefano Garzarella (1):
  block/mirror: fix NULL pointer dereference in
    mirror_wait_on_conflicts()

Vladimir Sementsov-Ogievskiy (15):
  tests: add migrate-during-backup
  block: bdrv_inactivate_recurse(): check for permissions and fix crash
  simplebench: add img_bench_templater.py
  qcow2: refactor handle_dependencies() loop body
  qcow2: handle_dependencies(): relax conflict detection
  qcow2-refcount: improve style of check_refcounts_l2()
  qcow2: compressed read: simplify cluster descriptor passing
  qcow2: introduce qcow2_parse_compressed_l2_entry() helper
  qcow2-refcount: introduce fix_l2_entry_by_zero()
  qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap
  qcow2-refcount: check_refcounts_l2(): check l2_bitmap
  qcow2-refcount: check_refcounts_l2(): check reserved bits
  qcow2-refcount: improve style of check_refcounts_l1()
  qcow2-refcount: check_refcounts_l1(): check reserved bits
  qcow2-refcount: check_refblocks(): add separate message for reserved

 docs/tools/qemu-img.rst | 4 +-
 block/qcow2.h | 7 +-
 include/block/block_int.h | 61 +++-
 block.c | 88 +++++
 block/file-posix.c | 7 +-
 block/gluster.c | 23 +-
 block/io.c | 68 +++-
 block/iscsi.c | 3 -
 block/mirror.c | 25 +-
 block/qcow2-cluster.c | 78 +++--
 block/qcow2-refcount.c | 326 ++++++++++++------
 block/qcow2.c | 13 +-
 qemu-img.c | 18 +-
 qemu-img-cmds.hx | 2 +-
 scripts/simplebench/img_bench_templater.py | 95 +++++
 scripts/simplebench/table_templater.py | 62 ++++
 tests/qemu-iotests/122 | 2 +-
 tests/qemu-iotests/271 | 5 +-
 tests/qemu-iotests/271.out | 4 +-
 tests/qemu-iotests/297 | 9 +-
 tests/qemu-iotests/iotests.py | 12 +-
 .../tests/migrate-bitmaps-postcopy-test | 13 +-
 tests/qemu-iotests/tests/migrate-bitmaps-test | 43 ++-
 .../qemu-iotests/tests/migrate-during-backup | 97 ++++++
 .../tests/migrate-during-backup.out | 5 +
 tests/qemu-iotests/tests/mirror-top-perms | 2 +-
 26 files changed, 855 insertions(+), 217 deletions(-)
 create mode 100755 scripts/simplebench/img_bench_templater.py
 create mode 100644 scripts/simplebench/table_templater.py
 create mode 100755 tests/qemu-iotests/tests/migrate-during-backup
 create mode 100644 tests/qemu-iotests/tests/migrate-during-backup.out

--
2.31.1

The following changes since commit ac793156f650ae2d77834932d72224175ee69086:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c:

  iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100)

----------------------------------------------------------------
Pull request

v2:
 * Fix format string issues on 32-bit hosts [Peter]
 * Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric]
 * Fix missing eventfd.h header on macOS [Peter]
 * Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter]

This pull request contains the vhost-user-blk server by Coiby Xu along with my
additions, block/nvme.c alignment and hardware error statistics by Philippe
Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir
Sementsov-Ogievskiy.

----------------------------------------------------------------
Coiby Xu (6):
  libvhost-user: Allow vu_message_read to be replaced
  libvhost-user: remove watch for kick_fd when de-initialize vu-dev
  util/vhost-user-server: generic vhost user server
  block: move logical block size check function to a common utility
    function
  block/export: vhost-user block device backend server
  MAINTAINERS: Add vhost-user block device backend server maintainer

Philippe Mathieu-Daudé (1):
  block/nvme: Add driver statistics for access alignment and hw errors

Stefan Hajnoczi (16):
  util/vhost-user-server: s/fileds/fields/ typo fix
  util/vhost-user-server: drop unnecessary QOM cast
  util/vhost-user-server: drop unnecessary watch deletion
  block/export: consolidate request structs into VuBlockReq
  util/vhost-user-server: drop unused DevicePanicNotifier
  util/vhost-user-server: fix memory leak in vu_message_read()
  util/vhost-user-server: check EOF when reading payload
  util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
  block/export: report flush errors
  block/export: convert vhost-user-blk server to block export API
  util/vhost-user-server: move header to include/
  util/vhost-user-server: use static library in meson.build
  qemu-storage-daemon: avoid compiling blockdev_ss twice
  block: move block exports to libblockdev
  block/export: add iothread and fixed-iothread options
  block/export: add vhost-user-blk multi-queue support

Vladimir Sementsov-Ogievskiy (5):
  block/io: fix bdrv_co_block_status_above
  block/io: bdrv_common_block_status_above: support include_base
  block/io: bdrv_common_block_status_above: support bs == base
  block/io: fix bdrv_is_allocated_above
  iotests: add commit top->base cases to 274

 MAINTAINERS | 9 +
 qapi/block-core.json | 24 +-
 qapi/block-export.json | 36 +-
 block/coroutines.h | 2 +
 block/export/vhost-user-blk-server.h | 19 +
 contrib/libvhost-user/libvhost-user.h | 21 +
 include/qemu/vhost-user-server.h | 65 +++
 util/block-helpers.h | 19 +
 block/export/export.c | 37 +-
 block/export/vhost-user-blk-server.c | 431 ++++++++++++++++++++
 block/io.c | 132 +++---
 block/nvme.c | 27 ++
 block/qcow2.c | 16 +-
 contrib/libvhost-user/libvhost-user-glib.c | 2 +-
 contrib/libvhost-user/libvhost-user.c | 15 +-
 hw/core/qdev-properties-system.c | 31 +-
 nbd/server.c | 2 -
 qemu-nbd.c | 21 +-
 softmmu/vl.c | 4 +
 stubs/blk-exp-close-all.c | 7 +
 tests/vhost-user-bridge.c | 2 +
 tools/virtiofsd/fuse_virtio.c | 4 +-
 util/block-helpers.c | 46 +++
 util/vhost-user-server.c | 446 +++++++++++++++++++++
 block/export/meson.build | 3 +-
 contrib/libvhost-user/meson.build | 1 +
 meson.build | 22 +-
 nbd/meson.build | 2 +
 storage-daemon/meson.build | 3 +-
 stubs/meson.build | 1 +
 tests/qemu-iotests/274 | 20 +
 tests/qemu-iotests/274.out | 68 ++++
 util/meson.build | 4 +
 33 files changed, 1420 insertions(+), 122 deletions(-)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 include/qemu/vhost-user-server.h
 create mode 100644 util/block-helpers.h
 create mode 100644 block/export/vhost-user-blk-server.c
 create mode 100644 stubs/blk-exp-close-all.c
 create mode 100644 util/block-helpers.c
 create mode 100644 util/vhost-user-server.c

--
2.26.2

There is a comment above the BDS definition stating care must be taken
to consider handling newly added fields in bdrv_append().

Actually, this comment should have said "bdrv_swap()" as of 4ddc07cac
(nine years ago), and in any case, bdrv_swap() was dropped in
8e419aefa (six years ago). So no such care is necessary anymore.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-2-hreitz@redhat.com>
---
 include/block/block_int.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -XXX,XX +XXX,XX @@ struct BdrvChild {
     QLIST_ENTRY(BdrvChild) next_parent;
 };

-/*
- * Note: the function bdrv_append() copies and swaps contents of
- * BlockDriverStates, so if you add new fields to this struct, please
- * inspect bdrv_append() to determine if the new fields need to be
- * copied as well.
- */
 struct BlockDriverState {
     /* Protected by big QEMU lock or read-only after opening. No special
      * locking needed during I/O...
--
2.31.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 24 +++++++++++++++++++++++-
 block/nvme.c | 27 +++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -XXX,XX +XXX,XX @@
       'discard-nb-failed': 'uint64',
       'discard-bytes-ok': 'uint64' } }

+##
+# @BlockStatsSpecificNvme:
+#
+# NVMe driver statistics
+#
+# @completion-errors: The number of completion errors.
+#
+# @aligned-accesses: The number of aligned accesses performed by
+#                    the driver.
+#
+# @unaligned-accesses: The number of unaligned accesses performed by
+#                      the driver.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockStatsSpecificNvme',
+  'data': {
+      'completion-errors': 'uint64',
+      'aligned-accesses': 'uint64',
+      'unaligned-accesses': 'uint64' } }
+
 ##
 # @BlockStatsSpecific:
 #
@@ -XXX,XX +XXX,XX @@
   'discriminator': 'driver',
   'data': {
       'file': 'BlockStatsSpecificFile',
-      'host_device': 'BlockStatsSpecificFile' } }
+      'host_device': 'BlockStatsSpecificFile',
+      'nvme': 'BlockStatsSpecificNvme' } }

 ##
 # @BlockStats:
diff --git a/block/nvme.c b/block/nvme.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {

     /* PCI address (required for nvme_refresh_filename()) */
     char *device;
+
+    struct {
+        uint64_t completion_errors;
+        uint64_t aligned_accesses;
+        uint64_t unaligned_accesses;
+    } stats;
 };

 #define NVME_BLOCK_OPT_DEVICE "device"
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
             break;
         }
         ret = nvme_translate_error(c);
+        if (ret) {
+            s->stats.completion_errors++;
+        }
         q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
         if (!q->cq.head) {
             q->cq_phase = !q->cq_phase;
@@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     assert(QEMU_IS_ALIGNED(bytes, s->page_size));
     assert(bytes <= s->max_transfer);
     if (nvme_qiov_aligned(bs, qiov)) {
+        s->stats.aligned_accesses++;
         return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
     }
+    s->stats.unaligned_accesses++;
     trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
     buf = qemu_try_memalign(s->page_size, bytes);

@@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host)
     qemu_vfio_dma_unmap(s->vfio, host);
 }

+static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs)
+{
+    BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
+    BDRVNVMeState *s = bs->opaque;
+
+    stats->driver = BLOCKDEV_DRIVER_NVME;
+    stats->u.nvme = (BlockStatsSpecificNvme) {
+        .completion_errors = s->stats.completion_errors,
+        .aligned_accesses = s->stats.aligned_accesses,
+        .unaligned_accesses = s->stats.unaligned_accesses,
+    };
+
+    return stats;
+}
+
 static const char *const nvme_strong_runtime_opts[] = {
     NVME_BLOCK_OPT_DEVICE,
     NVME_BLOCK_OPT_NAMESPACE,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
     .bdrv_refresh_filename = nvme_refresh_filename,
     .bdrv_refresh_limits = nvme_refresh_limits,
     .strong_runtime_opts = nvme_strong_runtime_opts,
+    .bdrv_get_specific_stats = nvme_get_specific_stats,

     .bdrv_detach_aio_context = nvme_detach_aio_context,
     .bdrv_attach_aio_context = nvme_attach_aio_context,
--
2.26.2

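The accounting added by the NVMe statistics patch above can be looked at in isolation. The following is only a minimal sketch under stated assumptions: `AccessStats`, `Segment`, `segments_aligned()`, `account_access()` and `account_completion()` are hypothetical names invented for illustration, not QEMU APIs. The real driver makes the alignment decision with nvme_qiov_aligned() on a QEMUIOVector and keeps the counters in BDRVNVMeState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device counters kept by the driver. */
typedef struct AccessStats {
    uint64_t completion_errors;
    uint64_t aligned_accesses;
    uint64_t unaligned_accesses;
} AccessStats;

/* Hypothetical stand-in for one scatter/gather element of a request. */
typedef struct Segment {
    uintptr_t base;
    size_t len;
} Segment;

/*
 * Return true if every segment starts on a page boundary and has a
 * length that is a multiple of the page size.
 * page_size is assumed to be a power of two.
 */
static bool segments_aligned(const Segment *seg, size_t n, size_t page_size)
{
    for (size_t i = 0; i < n; i++) {
        if ((seg[i].base & (page_size - 1)) ||
            (seg[i].len & (page_size - 1))) {
            return false;
        }
    }
    return true;
}

/* Count one request as aligned or unaligned, mirroring the patch's split. */
static void account_access(AccessStats *st, const Segment *seg, size_t n,
                           size_t page_size)
{
    if (segments_aligned(seg, n, page_size)) {
        st->aligned_accesses++;
    } else {
        st->unaligned_accesses++;
    }
}

/* Count a completion as an error whenever its translated status is non-zero. */
static void account_completion(AccessStats *st, int ret)
{
    if (ret) {
        st->completion_errors++;
    }
}
```

A caller would invoke account_access() once per submitted request and account_completion() once per completed one. Plain counters like these are only safe if all updates happen from a single thread or under a lock, which this sketch does not attempt to model.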
We cannot write to images opened with O_DIRECT unless we allow them to
be resized so they are aligned to the sector size: Since 9c60a5d1978,
bdrv_node_refresh_perm() ensures that for nodes whose length is not
aligned to the request alignment and where someone has taken a WRITE
permission, the RESIZE permission is taken, too.

Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
blk_new_open() to take the RESIZE permission) when using cache=none for
the target, so that when writing to it, it can be aligned to the target
sector size.

Without this patch, an error is returned:

$ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
permission without 'resize': Image size is not a multiple of request
alignment

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210819101200.64235-1-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qemu-img.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -XXX,XX +XXX,XX @@ static int img_convert(int argc, char **argv)
         goto out;
     }

+    if (flags & BDRV_O_NOCACHE) {
+        /*
+         * If we open the target with O_DIRECT, it may be necessary to
+         * extend its size to align to the physical sector size.
+         */
+        flags |= BDRV_O_RESIZE;
+    }
+
     if (skip_create) {
         s.target = img_open(tgt_image_opts, out_filename, out_fmt,
                             flags, writethrough, s.quiet, false);
--
2.31.1

From: Coiby Xu <coiby.xu@gmail.com>

Allow vu_message_read to be replaced by one which will make use of the
QIOChannel functions. Thus reading a vhost-user message won't stall the
guest. For the slave channel, we still use the default vu_message_read.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h | 21 +++++++++++++++++++++
 contrib/libvhost-user/libvhost-user-glib.c | 2 +-
 contrib/libvhost-user/libvhost-user.c | 14 +++++++-------
 tests/vhost-user-bridge.c | 2 ++
 tools/virtiofsd/fuse_virtio.c | 4 ++--
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -XXX,XX +XXX,XX @@
  */
 #define VHOST_USER_MAX_RAM_SLOTS 32

+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
+
 typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MASTER = 0,
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
@@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
 typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
 typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
                                   int *do_reply);
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
 typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
 typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
 typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
@@ -XXX,XX +XXX,XX @@ struct VuDev {
     bool broken;
     uint16_t max_queues;

+    /* @read_msg: custom method to read vhost-user message
+     *
+     * Read data from vhost_user socket fd and fill up
+     * the passed VhostUserMsg *vmsg struct.
+     *
+     * If reading fails, it should close the received set of file
+     * descriptors as socket message's auxiliary data.
+     *
+     * For the details, please refer to vu_message_read in libvhost-user.c
+     * which will be used by default if not custom method is provided when
+     * calling vu_init
+     *
+     * Returns: true if vhost-user message successfully received,
+     *          otherwise return false.
+     *
+     */
+    vu_read_msg_cb read_msg;
     /* @set_watch: add or update the given fd to the watch set,
      * call cb when condition is met */
     vu_set_watch_cb set_watch;
@@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev,
              uint16_t max_queues,
              int socket,
              vu_panic_cb panic,
+             vu_read_msg_cb read_msg,
              vu_set_watch_cb set_watch,
              vu_remove_watch_cb remove_watch,
              const VuDevIface *iface);
diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user-glib.c
+++ b/contrib/libvhost-user/libvhost-user-glib.c
@@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket,
     g_assert(dev);
     g_assert(iface);

-    if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
+    if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch,
                  remove_watch, iface)) {
         return false;
     }
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@
 /* The version of inflight buffer */
 #define INFLIGHT_VERSION 1

-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
-
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION 1
 #define LIBVHOST_USER_DEBUG 0
@@ -XXX,XX +XXX,XX @@ have_userfault(void)
 }

 static bool
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
+vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
     char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
     struct iovec iov = {
@@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
         goto out;
     }

-    if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
+    if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) {
         goto out;
     }

@@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
     /* Wait for QEMU to confirm that it's registered the handler for the
      * faults.
      */
-    if (!vu_message_read(dev, dev->sock, vmsg) ||
+    if (!dev->read_msg(dev, dev->sock, vmsg) ||
         vmsg->size != sizeof(vmsg->payload.u64) ||
         vmsg->payload.u64 != 0) {
         vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
@@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev)
     int reply_requested;
     bool need_reply, success = false;

-    if (!vu_message_read(dev, dev->sock, &vmsg)) {
+    if (!dev->read_msg(dev, dev->sock, &vmsg)) {
         goto end;
     }

@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
         uint16_t max_queues,
         int socket,
         vu_panic_cb panic,
+        vu_read_msg_cb read_msg,
         vu_set_watch_cb set_watch,
         vu_remove_watch_cb remove_watch,
         const VuDevIface *iface)
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,

     dev->sock = socket;
     dev->panic = panic;
+    dev->read_msg = read_msg ? read_msg : vu_message_read_default;
     dev->set_watch = set_watch;
     dev->remove_watch = remove_watch;
     dev->iface = iface;
@@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)

         vu_message_write(dev, dev->slave_fd, &vmsg);
         if (ack) {
-            vu_message_read(dev, dev->slave_fd, &vmsg);
+            vu_message_read_default(dev, dev->slave_fd, &vmsg);
         }
         return;
     }
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx)
                  VHOST_USER_BRIDGE_MAX_QUEUES,
                  conn_fd,
                  vubr_panic,
+                 NULL,
                  vubr_set_watch,
                  vubr_remove_watch,
                  &vuiface)) {
@@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client)
                  VHOST_USER_BRIDGE_MAX_QUEUES,
                  dev->sock,
                  vubr_panic,
+                 NULL,
                  vubr_set_watch,
                  vubr_remove_watch,
                  &vuiface)) {
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se)
     se->vu_socketfd = data_sock;
     se->virtio_dev->se = se;
     pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
-    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
-            fv_remove_watch, &fv_iface);
+    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
+            fv_set_watch, fv_remove_watch, &fv_iface);

     return 0;
 }
--
2.26.2

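The libvhost-user patch above makes the message reader replaceable: vu_init() takes an optional callback and falls back to the built-in reader when NULL is passed. The following is only a minimal sketch of that pattern under stated assumptions: `Dev`, `Msg`, `read_msg_cb`, `dev_init()`, `dev_dispatch()` and the two reader functions are hypothetical stand-ins invented for illustration. They are not the libvhost-user types, and no socket I/O is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VuDev and VhostUserMsg. */
typedef struct Dev Dev;
typedef struct Msg {
    int request;
} Msg;

/* Same shape as the callback type in the patch: device, socket fd, message. */
typedef bool (*read_msg_cb)(Dev *dev, int sock, Msg *msg);

struct Dev {
    int sock;
    read_msg_cb read_msg;
};

/* Default reader; a real one would receive the message from the socket. */
static bool read_msg_default(Dev *dev, int sock, Msg *msg)
{
    (void)dev;
    (void)sock;
    msg->request = 1; /* marker: filled in by the default reader */
    return true;
}

/* Example replacement, e.g. one built on a non-blocking channel. */
static bool read_msg_custom(Dev *dev, int sock, Msg *msg)
{
    (void)dev;
    (void)sock;
    msg->request = 2; /* marker: filled in by the custom reader */
    return true;
}

/* Store the caller's reader, or the default one if NULL was passed. */
static void dev_init(Dev *dev, int sock, read_msg_cb read_msg)
{
    dev->sock = sock;
    dev->read_msg = read_msg ? read_msg : read_msg_default;
}

/* Dispatch always goes through the stored pointer, never a fixed function. */
static bool dev_dispatch(Dev *dev, Msg *msg)
{
    return dev->read_msg(dev, dev->sock, msg);
}
```

Resolving the fallback once at initialization keeps the dispatch path free of NULL checks, and existing callers only have to add a NULL argument, as the vhost-user-bridge and virtiofsd hunks do.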
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-10-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
 block/qcow2.h | 1 +
 block/qcow2-refcount.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -XXX,XX +XXX,XX @@ typedef enum QCow2MetadataOverlap {
     (QCOW2_OL_CACHED | QCOW2_OL_INACTIVE_L2)

 #define L1E_OFFSET_MASK 0x00fffffffffffe00ULL
+#define L1E_RESERVED_MASK 0x7f000000000001ffULL
 #define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
 #define L2E_STD_RESERVED_MASK 0x3f000000000001feULL

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l1(BlockDriverState *bs,
             continue;
         }

+        if (l1_table[i] & L1E_RESERVED_MASK) {
+            fprintf(stderr, "ERROR found L1 entry with reserved bits set: "
+                    "%" PRIx64 "\n", l1_table[i]);
+            res->corruptions++;
+        }
+
         l2_offset = l1_table[i] & L1E_OFFSET_MASK;

         /* Mark L2 table as used */
--
2.31.1

From: Coiby Xu <coiby.xu@gmail.com>

When the client is running in gdb and the quit command is run in gdb,
QEMU will still dispatch the event, which will cause a segmentation
fault in the callback function.

Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
         }

         if (vq->kick_fd != -1) {
+            dev->remove_watch(dev, vq->kick_fd);
             close(vq->kick_fd);
             vq->kick_fd = -1;
         }
--
2.26.2

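The reserved-bits check added to check_refcounts_l1() above can be tried outside QEMU. The following is only a minimal sketch under stated assumptions: the two mask values are copied from the patch, while `CheckResult` and `check_l1_entry()` are hypothetical names invented for illustration and do not exist in block/qcow2-refcount.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mask values as defined in block/qcow2.h by the patch. */
#define L1E_OFFSET_MASK   0x00fffffffffffe00ULL
#define L1E_RESERVED_MASK 0x7f000000000001ffULL

/* Hypothetical stand-in for the check result structure. */
typedef struct CheckResult {
    int corruptions;
} CheckResult;

/*
 * Report an L1 entry that has reserved bits set, then return the L2 table
 * offset it encodes. As in the patch, the entry is still processed after
 * the corruption has been counted.
 */
static uint64_t check_l1_entry(uint64_t entry, CheckResult *res)
{
    if (entry & L1E_RESERVED_MASK) {
        fprintf(stderr, "ERROR found L1 entry with reserved bits set: "
                "%" PRIx64 "\n", entry);
        res->corruptions++;
    }
    return entry & L1E_OFFSET_MASK;
}
```

With these masks, an entry such as 0x8000000000050000 passes, because bit 63 belongs to neither mask, while 0x0100000000050001 is counted as one corruption; both yield the offset 0x50000.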
1
From: Eric Blake <eblake@redhat.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
Although we have long supported 'qemu-img convert -o
3
Sharing QEMU devices via vhost-user protocol.
4
backing_file=foo,backing_fmt=bar', the fact that we have a shortcut -B
5
for backing_file but none for backing_fmt has made it more likely that
6
users accidentally run into:
7
4
8
qemu-img: warning: Deprecated use of backing file without explicit backing format
5
Only one vhost-user client can connect to the server one time.
9
6
10
when using -B instead of -o. For similarity with other qemu-img
7
Suggested-by: Kevin Wolf <kwolf@redhat.com>
11
commands, such as create and compare, add '-F $fmt' as the shorthand
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
for '-o backing_fmt=$fmt'. Update iotest 122 for coverage of both
9
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
13
spellings.
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
12
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
13
[Fixed size_t %lu -> %zu format string compiler error.
14
--Stefan]
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
17
util/vhost-user-server.h | 65 ++++++
18
util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++
19
util/meson.build | 1 +
20
3 files changed, 494 insertions(+)
21
create mode 100644 util/vhost-user-server.h
22
create mode 100644 util/vhost-user-server.c
14
23
15
Signed-off-by: Eric Blake <eblake@redhat.com>
24
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
16
Message-Id: <20210913131735.1948339-1-eblake@redhat.com>
25
new file mode 100644
17
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
26
index XXXXXXX..XXXXXXX
18
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
27
--- /dev/null
19
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
28
+++ b/util/vhost-user-server.h
20
---
29
@@ -XXX,XX +XXX,XX @@
21
docs/tools/qemu-img.rst | 4 ++--
30
+/*
22
qemu-img.c | 10 +++++++---
31
+ * Sharing QEMU devices via vhost-user protocol
23
qemu-img-cmds.hx | 2 +-
32
+ *
24
tests/qemu-iotests/122 | 2 +-
33
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
25
4 files changed, 11 insertions(+), 7 deletions(-)
34
+ * Copyright (c) 2020 Red Hat, Inc.
35
+ *
36
+ * This work is licensed under the terms of the GNU GPL, version 2 or
37
+ * later. See the COPYING file in the top-level directory.
38
+ */
39
+
40
+#ifndef VHOST_USER_SERVER_H
41
+#define VHOST_USER_SERVER_H
42
+
43
+#include "contrib/libvhost-user/libvhost-user.h"
44
+#include "io/channel-socket.h"
45
+#include "io/channel-file.h"
46
+#include "io/net-listener.h"
47
+#include "qemu/error-report.h"
48
+#include "qapi/error.h"
49
+#include "standard-headers/linux/virtio_blk.h"
50
+
51
+typedef struct VuFdWatch {
52
+ VuDev *vu_dev;
53
+ int fd; /*kick fd*/
54
+ void *pvt;
55
+ vu_watch_cb cb;
56
+ bool processing;
57
+ QTAILQ_ENTRY(VuFdWatch) next;
58
+} VuFdWatch;
59
+
60
+typedef struct VuServer VuServer;
61
+typedef void DevicePanicNotifierFn(VuServer *server);
62
+
63
+struct VuServer {
64
+ QIONetListener *listener;
65
+ AioContext *ctx;
66
+ DevicePanicNotifierFn *device_panic_notifier;
67
+ int max_queues;
68
+ const VuDevIface *vu_iface;
69
+ VuDev vu_dev;
70
+ QIOChannel *ioc; /* The I/O channel with the client */
71
+ QIOChannelSocket *sioc; /* The underlying data channel with the client */
72
+ /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
73
+ QIOChannel *ioc_slave;
74
+ QIOChannelSocket *sioc_slave;
75
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
76
+ QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
77
+ /* restart coroutine co_trip if AIOContext is changed */
78
+ bool aio_context_changed;
79
+ bool processing_msg;
80
+};
81
+
82
+bool vhost_user_server_start(VuServer *server,
83
+ SocketAddress *unix_socket,
84
+ AioContext *ctx,
85
+ uint16_t max_queues,
86
+ DevicePanicNotifierFn *device_panic_notifier,
87
+ const VuDevIface *vu_iface,
88
+ Error **errp);
89
+
90
+void vhost_user_server_stop(VuServer *server);
91
+
92
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
93
+
94
+#endif /* VHOST_USER_SERVER_H */
95
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
96
new file mode 100644
97
index XXXXXXX..XXXXXXX
98
--- /dev/null
99
+++ b/util/vhost-user-server.c
100
@@ -XXX,XX +XXX,XX @@
101
+/*
102
+ * Sharing QEMU devices via vhost-user protocol
103
+ *
104
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
105
+ * Copyright (c) 2020 Red Hat, Inc.
106
+ *
107
+ * This work is licensed under the terms of the GNU GPL, version 2 or
108
+ * later. See the COPYING file in the top-level directory.
109
+ */
110
+#include "qemu/osdep.h"
111
+#include "qemu/main-loop.h"
112
+#include "vhost-user-server.h"
113
+
114
+static void vmsg_close_fds(VhostUserMsg *vmsg)
115
+{
116
+ int i;
117
+ for (i = 0; i < vmsg->fd_num; i++) {
118
+ close(vmsg->fds[i]);
119
+ }
120
+}
121
+
122
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
123
+{
124
+ int i;
125
+ for (i = 0; i < vmsg->fd_num; i++) {
126
+ qemu_set_nonblock(vmsg->fds[i]);
127
+ }
128
+}
129
+
130
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
131
+ gpointer opaque);
132
+
133
+static void close_client(VuServer *server)
134
+{
135
+ /*
136
+ * Before closing the client
137
+ *
138
+ * 1. Let vu_client_trip stop processing new vhost-user msg
139
+ *
140
+ * 2. remove kick_handler
141
+ *
142
+ * 3. wait for the kick handler to be finished
143
+ *
144
+ * 4. wait for the current vhost-user msg to be finished processing
145
+ */
146
+
147
+ QIOChannelSocket *sioc = server->sioc;
148
+ /* When this is set, vu_client_trip will stop processing new vhost-user messages */
149
+ server->sioc = NULL;
150
+
151
+ VuFdWatch *vu_fd_watch, *next;
152
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
153
+ aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
154
+ NULL, NULL, NULL);
155
+ }
156
+
157
+ while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
158
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
159
+ if (!vu_fd_watch->processing) {
160
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
161
+ g_free(vu_fd_watch);
162
+ }
163
+ }
164
+ }
165
+
166
+ while (server->processing_msg) {
167
+ if (server->ioc->read_coroutine) {
168
+ server->ioc->read_coroutine = NULL;
169
+ qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
170
+ NULL, server->ioc);
171
+ server->processing_msg = false;
172
+ }
173
+ }
174
+
175
+ vu_deinit(&server->vu_dev);
176
+ object_unref(OBJECT(sioc));
177
+ object_unref(OBJECT(server->ioc));
178
+}
179
+
180
+static void panic_cb(VuDev *vu_dev, const char *buf)
181
+{
182
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
183
+
184
+ /* avoid while loop in close_client */
185
+ server->processing_msg = false;
186
+
187
+ if (buf) {
188
+ error_report("vu_panic: %s", buf);
189
+ }
190
+
191
+ if (server->sioc) {
192
+ close_client(server);
193
+ }
194
+
195
+ if (server->device_panic_notifier) {
196
+ server->device_panic_notifier(server);
197
+ }
198
+
199
+ /*
200
+ * Set the callback function for network listener so another
201
+ * vhost-user client can connect to this server
202
+ */
203
+ qio_net_listener_set_client_func(server->listener,
204
+ vu_accept,
205
+ server,
206
+ NULL);
207
+}
208
+
209
+static bool coroutine_fn
210
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
211
+{
212
+ struct iovec iov = {
213
+ .iov_base = (char *)vmsg,
214
+ .iov_len = VHOST_USER_HDR_SIZE,
215
+ };
216
+ int rc, read_bytes = 0;
217
+ Error *local_err = NULL;
218
+ /*
219
+ * Store fds/nfds returned from qio_channel_readv_full into
220
+ * temporary variables.
221
+ *
222
+ * VhostUserMsg is a packed structure, gcc will complain about passing
223
+ * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
224
+ * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
225
+ * thus two temporary variables nfds and fds are used here.
226
+ */
227
+ size_t nfds = 0, nfds_t = 0;
228
+ const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
229
+ int *fds_t = NULL;
230
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
231
+ QIOChannel *ioc = server->ioc;
232
+
233
+ if (!ioc) {
234
+ error_report("vhost-user server: no client channel to read from");
235
+ goto fail;
236
+ }
237
+
238
+ assert(qemu_in_coroutine());
239
+ do {
240
+ /*
241
+ * qio_channel_readv_full may have short reads, so keep calling it
242
+ * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
243
+ */
244
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
245
+ if (rc < 0) {
246
+ if (rc == QIO_CHANNEL_ERR_BLOCK) {
247
+ qio_channel_yield(ioc, G_IO_IN);
248
+ continue;
249
+ } else {
250
+ error_report_err(local_err);
251
+ return false;
252
+ }
253
+ }
254
+ read_bytes += rc;
255
+ if (nfds_t > 0) {
256
+ if (nfds + nfds_t > max_fds) {
257
+ error_report("A maximum of %zu fds are allowed, "
258
+ "however got %zu fds now",
259
+ max_fds, nfds + nfds_t);
260
+ goto fail;
261
+ }
262
+ memcpy(vmsg->fds + nfds, fds_t,
263
+ nfds_t * sizeof(vmsg->fds[0]));
264
+ nfds += nfds_t;
265
+ g_free(fds_t);
266
+ }
267
+ if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
268
+ break;
269
+ }
270
+ iov.iov_base = (char *)vmsg + read_bytes;
271
+ iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
272
+ } while (true);
273
+
274
+ vmsg->fd_num = nfds;
275
+ /* qio_channel_readv_full will make socket fds blocking, unblock them */
276
+ vmsg_unblock_fds(vmsg);
277
+ if (vmsg->size > sizeof(vmsg->payload)) {
278
+ error_report("Error: too big message request: %d, "
279
+ "size: vmsg->size: %u, "
280
+ "while sizeof(vmsg->payload) = %zu",
281
+ vmsg->request, vmsg->size, sizeof(vmsg->payload));
282
+ goto fail;
283
+ }
284
+
285
+ struct iovec iov_payload = {
286
+ .iov_base = (char *)&vmsg->payload,
287
+ .iov_len = vmsg->size,
288
+ };
289
+ if (vmsg->size) {
290
+ rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
291
+ if (rc == -1) {
292
+ error_report_err(local_err);
293
+ goto fail;
294
+ }
295
+ }
296
+
297
+ return true;
298
+
299
+fail:
300
+ vmsg_close_fds(vmsg);
301
+
302
+ return false;
303
+}
304
+
305
+
306
+static void vu_client_start(VuServer *server);
307
+static coroutine_fn void vu_client_trip(void *opaque)
308
+{
309
+ VuServer *server = opaque;
310
+
311
+ while (!server->aio_context_changed && server->sioc) {
312
+ server->processing_msg = true;
313
+ vu_dispatch(&server->vu_dev);
314
+ server->processing_msg = false;
315
+ }
316
+
317
+ if (server->aio_context_changed && server->sioc) {
318
+ server->aio_context_changed = false;
319
+ vu_client_start(server);
320
+ }
321
+}
322
+
323
+static void vu_client_start(VuServer *server)
324
+{
325
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
326
+ aio_co_enter(server->ctx, server->co_trip);
327
+}
328
+
329
+/*
330
+ * a wrapper for vu_kick_cb
331
+ *
332
+ * since aio_dispatch can only pass one user data pointer to the
333
+ * callback function, pack VuDev and pvt into a struct. Then unpack it
334
+ * and pass them to vu_kick_cb
335
+ */
336
+static void kick_handler(void *opaque)
337
+{
338
+ VuFdWatch *vu_fd_watch = opaque;
339
+ vu_fd_watch->processing = true;
340
+ vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
341
+ vu_fd_watch->processing = false;
342
+}
343
+
344
+
345
+static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
346
+{
347
+
348
+ VuFdWatch *vu_fd_watch, *next;
349
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
350
+ if (vu_fd_watch->fd == fd) {
351
+ return vu_fd_watch;
352
+ }
353
+ }
354
+ return NULL;
355
+}
356
+
357
+static void
358
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
359
+ vu_watch_cb cb, void *pvt)
360
+{
361
+
362
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
363
+ g_assert(vu_dev);
364
+ g_assert(fd >= 0);
365
+ g_assert(cb);
366
+
367
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
368
+
369
+ if (!vu_fd_watch) {
370
+ VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1);
371
+
372
+ QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next);
373
+
374
+ vu_fd_watch->fd = fd;
375
+ vu_fd_watch->cb = cb;
376
+ qemu_set_nonblock(fd);
377
+ aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
378
+ NULL, NULL, vu_fd_watch);
379
+ vu_fd_watch->vu_dev = vu_dev;
380
+ vu_fd_watch->pvt = pvt;
381
+ }
382
+}
383
+
384
+
385
+static void remove_watch(VuDev *vu_dev, int fd)
386
+{
387
+ VuServer *server;
388
+ g_assert(vu_dev);
389
+ g_assert(fd >= 0);
390
+
391
+ server = container_of(vu_dev, VuServer, vu_dev);
392
+
393
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
394
+
395
+ if (!vu_fd_watch) {
396
+ return;
397
+ }
398
+ aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL);
399
+
400
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
401
+ g_free(vu_fd_watch);
402
+}
403
+
404
+
405
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
406
+ gpointer opaque)
407
+{
408
+ VuServer *server = opaque;
409
+
410
+ if (server->sioc) {
411
+ warn_report("Only one vhost-user client is allowed to "
412
+ "connect to the server at a time");
413
+ return;
414
+ }
415
+
416
+ if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb,
417
+ vu_message_read, set_watch, remove_watch, server->vu_iface)) {
418
+ error_report("Failed to initialize libvhost-user");
419
+ return;
420
+ }
421
+
422
+ /*
423
+ * Unset the callback function for network listener to make another
424
+ * vhost-user client keep waiting until this client disconnects
425
+ */
426
+ qio_net_listener_set_client_func(server->listener,
427
+ NULL,
428
+ NULL,
429
+ NULL);
430
+ server->sioc = sioc;
431
+ /*
432
+ * Increase the object reference, so sioc will not be freed by
433
+ * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
434
+ */
435
+ object_ref(OBJECT(server->sioc));
436
+ qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
437
+ server->ioc = QIO_CHANNEL(sioc);
438
+ object_ref(OBJECT(server->ioc));
439
+ qio_channel_attach_aio_context(server->ioc, server->ctx);
440
+ qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
441
+ vu_client_start(server);
442
+}
443
+
444
+
445
+void vhost_user_server_stop(VuServer *server)
446
+{
447
+ if (server->sioc) {
448
+ close_client(server);
449
+ }
450
+
451
+ if (server->listener) {
452
+ qio_net_listener_disconnect(server->listener);
453
+ object_unref(OBJECT(server->listener));
454
+ }
455
+
456
+}
457
+
458
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
459
+{
460
+ VuFdWatch *vu_fd_watch, *next;
461
+ void *opaque = NULL;
462
+ IOHandler *io_read = NULL;
463
+ bool attach;
464
+
465
+ server->ctx = ctx ? ctx : qemu_get_aio_context();
466
+
467
+ if (!server->sioc) {
468
+ /* not yet serving any client */
469
+ return;
470
+ }
471
+
472
+ if (ctx) {
473
+ qio_channel_attach_aio_context(server->ioc, ctx);
474
+ server->aio_context_changed = true;
475
+ io_read = kick_handler;
476
+ attach = true;
477
+ } else {
478
+ qio_channel_detach_aio_context(server->ioc);
479
+ /* server->ioc->ctx keeps the old AioContext */
480
+ ctx = server->ioc->ctx;
481
+ attach = false;
482
+ }
483
+
484
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
485
+ if (vu_fd_watch->cb) {
486
+ opaque = attach ? vu_fd_watch : NULL;
487
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
488
+ io_read, NULL, NULL,
489
+ opaque);
490
+ }
491
+ }
492
+}
493
+
494
+
495
+bool vhost_user_server_start(VuServer *server,
496
+ SocketAddress *socket_addr,
497
+ AioContext *ctx,
498
+ uint16_t max_queues,
499
+ DevicePanicNotifierFn *device_panic_notifier,
500
+ const VuDevIface *vu_iface,
501
+ Error **errp)
502
+{
503
+ QIONetListener *listener = qio_net_listener_new();
504
+ if (qio_net_listener_open_sync(listener, socket_addr, 1,
505
+ errp) < 0) {
506
+ object_unref(OBJECT(listener));
507
+ return false;
508
+ }
509
+
510
+ /* zero out unspecified fields */
511
+ *server = (VuServer) {
512
+ .listener = listener,
513
+ .vu_iface = vu_iface,
514
+ .max_queues = max_queues,
515
+ .ctx = ctx,
516
+ .device_panic_notifier = device_panic_notifier,
517
+ };
518
+
519
+ qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
520
+
521
+ qio_net_listener_set_client_func(server->listener,
522
+ vu_accept,
523
+ server,
524
+ NULL);
525
+
526
+ QTAILQ_INIT(&server->vu_fd_watches);
527
+ return true;
528
+}
529
diff --git a/util/meson.build b/util/meson.build
530
index XXXXXXX..XXXXXXX 100644
531
--- a/util/meson.build
532
+++ b/util/meson.build
533
@@ -XXX,XX +XXX,XX @@ if have_block
534
util_ss.add(files('main-loop.c'))
535
util_ss.add(files('nvdimm-utils.c'))
536
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
537
+ util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
538
util_ss.add(files('qemu-coroutine-sleep.c'))
539
util_ss.add(files('qemu-co-shared-resource.c'))
540
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
541
--
542
2.26.2
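
The header loop in vu_message_read() keeps calling qio_channel_readv_full()
until VHOST_USER_HDR_SIZE bytes have arrived or a read returns 0. The same
short-read handling can be illustrated outside QEMU with plain POSIX read().
This is only a sketch under assumptions, not the patch's code: read_full() is
a hypothetical helper, it works on a blocking file descriptor instead of
yielding the coroutine on QIO_CHANNEL_ERR_BLOCK, and it ignores ancillary
file descriptors entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): read exactly `len` bytes
 * from `fd` unless end-of-file arrives first.
 *
 * Returns the number of bytes read (smaller than `len` only at EOF),
 * or -1 on a read error.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t rc = read(fd, (char *)buf + done, len - done);

        if (rc < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal, retry */
            }
            return -1;              /* genuine error */
        }
        if (rc == 0) {
            break;                  /* peer closed the connection */
        }
        done += (size_t)rc;         /* short read: advance and loop again */
    }

    return (ssize_t)done;
}
```

A caller compares the return value with the requested length; a smaller
value means the stream ended early, which corresponds to the rc == 0 exit
of the loop in the patch.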
26
543
27
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
28
index XXXXXXX..XXXXXXX 100644
29
--- a/docs/tools/qemu-img.rst
30
+++ b/docs/tools/qemu-img.rst
31
@@ -XXX,XX +XXX,XX @@ Command description:
32
4
33
Error on reading data
34
35
-.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps [--skip-broken-bitmaps]] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
36
+.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps [--skip-broken-bitmaps]] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE [-F BACKING_FMT]] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
37
38
Convert the disk image *FILENAME* or a snapshot *SNAPSHOT_PARAM*
39
to disk image *OUTPUT_FILENAME* using format *OUTPUT_FMT*. It can
40
@@ -XXX,XX +XXX,XX @@ Command description:
41
You can use the *BACKING_FILE* option to force the output image to be
42
created as a copy on write image of the specified base image; the
43
*BACKING_FILE* should have the same content as the input's base image,
44
- however the path, image format, etc may differ.
45
+ however the path, image format (as given by *BACKING_FMT*), etc may differ.
46
47
If a relative path name is given, the backing file is looked up relative to
48
the directory containing *OUTPUT_FILENAME*.
49
diff --git a/qemu-img.c b/qemu-img.c
50
index XXXXXXX..XXXXXXX 100644
51
--- a/qemu-img.c
52
+++ b/qemu-img.c
53
@@ -XXX,XX +XXX,XX @@ static int img_convert(int argc, char **argv)
54
int c, bs_i, flags, src_flags = BDRV_O_NO_SHARE;
55
const char *fmt = NULL, *out_fmt = NULL, *cache = "unsafe",
56
*src_cache = BDRV_DEFAULT_CACHE, *out_baseimg = NULL,
57
- *out_filename, *out_baseimg_param, *snapshot_name = NULL;
58
+ *out_filename, *out_baseimg_param, *snapshot_name = NULL,
59
+ *backing_fmt = NULL;
60
BlockDriver *drv = NULL, *proto_drv = NULL;
61
BlockDriverInfo bdi;
62
BlockDriverState *out_bs;
63
@@ -XXX,XX +XXX,XX @@ static int img_convert(int argc, char **argv)
64
{"skip-broken-bitmaps", no_argument, 0, OPTION_SKIP_BROKEN},
65
{0, 0, 0, 0}
66
};
67
- c = getopt_long(argc, argv, ":hf:O:B:Cco:l:S:pt:T:qnm:WUr:",
68
+ c = getopt_long(argc, argv, ":hf:O:B:CcF:o:l:S:pt:T:qnm:WUr:",
69
long_options, NULL);
70
if (c == -1) {
71
break;
72
@@ -XXX,XX +XXX,XX @@ static int img_convert(int argc, char **argv)
73
case 'c':
74
s.compressed = true;
75
break;
76
+ case 'F':
77
+ backing_fmt = optarg;
78
+ break;
79
case 'o':
80
if (accumulate_options(&options, optarg) < 0) {
81
goto fail_getopt;
82
@@ -XXX,XX +XXX,XX @@ static int img_convert(int argc, char **argv)
83
84
qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
85
s.total_sectors * BDRV_SECTOR_SIZE, &error_abort);
86
- ret = add_old_style_options(out_fmt, opts, out_baseimg, NULL);
87
+ ret = add_old_style_options(out_fmt, opts, out_baseimg, backing_fmt);
88
if (ret < 0) {
89
goto out;
90
}
91
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
92
index XXXXXXX..XXXXXXX 100644
93
--- a/qemu-img-cmds.hx
94
+++ b/qemu-img-cmds.hx
95
@@ -XXX,XX +XXX,XX @@ SRST
96
ERST
97
98
DEF("convert", img_convert,
99
- "convert [--object objectdef] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file] [-o options] [-l snapshot_param] [-S sparse_size] [-r rate_limit] [-m num_coroutines] [-W] [--salvage] filename [filename2 [...]] output_filename")
100
+ "convert [--object objectdef] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-B backing_file [-F backing_fmt]] [-o options] [-l snapshot_param] [-S sparse_size] [-r rate_limit] [-m num_coroutines] [-W] [--salvage] filename [filename2 [...]] output_filename")
101
SRST
102
.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-r RATE_LIMIT] [-m NUM_COROUTINES] [-W] [--salvage] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
103
ERST
104
diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
105
index XXXXXXX..XXXXXXX 100755
106
--- a/tests/qemu-iotests/122
107
+++ b/tests/qemu-iotests/122
108
@@ -XXX,XX +XXX,XX @@ echo
109
_make_test_img -b "$TEST_IMG".base -F $IMGFMT
110
111
$QEMU_IO -c "write -P 0 0 3M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
112
-$QEMU_IMG convert -O $IMGFMT -B "$TEST_IMG".base -o backing_fmt=$IMGFMT \
113
+$QEMU_IMG convert -O $IMGFMT -B "$TEST_IMG".base -F $IMGFMT \
114
"$TEST_IMG" "$TEST_IMG".orig
115
$QEMU_IO -c "read -P 0 0 3M" "$TEST_IMG".orig 2>&1 | _filter_qemu_io | _filter_testdir
116
$QEMU_IMG convert -O $IMGFMT -c -B "$TEST_IMG".base -o backing_fmt=$IMGFMT \
117
--
118
2.31.1
119
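
The block-status cache patch keeps a single data region per protocol node
and offers three operations: query whether an offset lies in cached data,
invalidate the cache when a zero write or discard overlaps it, and refill it
with a newly identified data region. The simplified, single-threaded sketch
below illustrates that logic. It deliberately leaves out the RCU and CoMutex
handling of the real code, and DataRegionCache and the cache_*() functions
are hypothetical stand-ins for BdrvBlockStatusCache and the bdrv_bsc_*()
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for BdrvBlockStatusCache: no RCU, no locking. */
typedef struct {
    bool valid;
    int64_t data_start;     /* first byte known to be data */
    int64_t data_end;       /* one past the last byte known to be data */
} DataRegionCache;

/* True if [offset, offset + bytes) intersects the cached data region. */
static bool cache_overlaps(const DataRegionCache *c,
                           int64_t offset, int64_t bytes)
{
    return c->valid &&
           offset < c->data_end &&
           offset + bytes > c->data_start;
}

/*
 * If @offset is inside the cached region, report how many bytes from
 * @offset onwards are data. *pnum is left untouched otherwise.
 */
static bool cache_is_data(const DataRegionCache *c,
                          int64_t offset, int64_t *pnum)
{
    if (!cache_overlaps(c, offset, 1)) {
        return false;
    }
    if (pnum) {
        *pnum = c->data_end - offset;
    }
    return true;
}

/* Called from paths that turn data into zeroes or holes. */
static void cache_invalidate_range(DataRegionCache *c,
                                   int64_t offset, int64_t bytes)
{
    if (cache_overlaps(c, offset, bytes)) {
        c->valid = false;
    }
}

/* Remember [offset, offset + bytes) as the current data region. */
static void cache_fill(DataRegionCache *c, int64_t offset, int64_t bytes)
{
    assert(bytes >= 0);
    c->valid = true;
    c->data_start = offset;
    c->data_end = offset + bytes;
}
```

Because only data regions are cached, a stale entry can at worst report
zeroes as data, which the commit message describes as not catastrophic.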
120
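
check_block_size() in util/block-helpers.c applies three conditions to a
non-zero value: at least MIN_BLOCK_SIZE, no larger than MAX_BLOCK_SIZE, and a
power of 2, with 0 meaning "unset". The sketch below restates those checks as
a pure predicate and shows why the power-of-2 rule matters for bitmasks. The
names block_size_is_valid(), align_down() and the SKETCH_* macros are
hypothetical; the real function reports failures through an Error ** instead
of returning a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits copied from the patch for illustration only. */
#define SKETCH_MIN_BLOCK_SIZE INT64_C(512)
#define SKETCH_MAX_BLOCK_SIZE (INT64_C(2) * 1024 * 1024)

/* Mirrors the three documented conditions; 0 means "unset" and passes. */
static bool block_size_is_valid(int64_t value)
{
    if (value == 0) {
        return true;
    }
    if (value < SKETCH_MIN_BLOCK_SIZE || value > SKETCH_MAX_BLOCK_SIZE) {
        return false;
    }
    /* A power of two has exactly one bit set, so value & (value - 1) is 0. */
    return (value & (value - 1)) == 0;
}

/* Round @offset down to a multiple of a validated, non-zero block size. */
static int64_t align_down(int64_t offset, int64_t block_size)
{
    assert(block_size != 0 && block_size_is_valid(block_size));
    return offset & ~(block_size - 1);
}
```

With a power-of-2 block size, block_size - 1 is a mask of the low bits, so
alignment needs no division.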
From: Coiby Xu <coiby.xu@gmail.com>

Move the constants from hw/core/qdev-properties.c to
util/block-helpers.h so that knowledge of the min/max values is

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 20200918080912.321299-5-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
util/block-helpers.h | 19 +++++++++++++
hw/core/qdev-properties-system.c | 31 ++++-----------------
util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++
util/meson.build | 1 +
4 files changed, 71 insertions(+), 26 deletions(-)
create mode 100644 util/block-helpers.h
create mode 100644 util/block-helpers.c

diff --git a/util/block-helpers.h b/util/block-helpers.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/block-helpers.h
@@ -XXX,XX +XXX,XX @@
+#ifndef BLOCK_HELPERS_H
+#define BLOCK_HELPERS_H
+
+#include "qemu/units.h"
+
+/* lower limit is sector size */
+#define MIN_BLOCK_SIZE INT64_C(512)
+#define MIN_BLOCK_SIZE_STR "512 B"
+/*
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
+ * matches qcow2 cluster size limit
+ */
+#define MAX_BLOCK_SIZE (2 * MiB)
+#define MAX_BLOCK_SIZE_STR "2 MiB"
+
+void check_block_size(const char *id, const char *name, int64_t value,
+ Error **errp);
+
+#endif /* BLOCK_HELPERS_H */
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -XXX,XX +XXX,XX @@
#include "sysemu/blockdev.h"
#include "net/net.h"
#include "hw/pci/pci.h"
+#include "util/block-helpers.h"

static bool check_prop_still_unset(DeviceState *dev, const char *name,
const void *old_val, const char *new_val,
@@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = {

/* --- blocksize --- */

-/* lower limit is sector size */
-#define MIN_BLOCK_SIZE 512
-#define MIN_BLOCK_SIZE_STR "512 B"
-/*
- * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
- * matches qcow2 cluster size limit
- */
-#define MAX_BLOCK_SIZE (2 * MiB)
-#define MAX_BLOCK_SIZE_STR "2 MiB"
-
static void set_blocksize(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
{
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
Property *prop = opaque;
uint32_t *ptr = qdev_get_prop_ptr(dev, prop);
uint64_t value;
+ Error *local_err = NULL;

if (dev->realized) {
qdev_prop_set_after_realize(dev, name, errp);
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
if (!visit_type_size(v, name, &value, errp)) {
return;
}
- /* value of 0 means "unset" */
- if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
- error_setg(errp,
- "Property %s.%s doesn't take value %" PRIu64
- " (minimum: " MIN_BLOCK_SIZE_STR
- ", maximum: " MAX_BLOCK_SIZE_STR ")",
- dev->id ? : "", name, value);
+ check_block_size(dev->id ? : "", name, value, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
return;
}
-
- /* We rely on power-of-2 blocksizes for bitmasks */
- if ((value & (value - 1)) != 0) {
- error_setg(errp,
- "Property %s.%s doesn't take value '%" PRId64 "', "
- "it's not a power of 2", dev->id ?: "", name, (int64_t)value);
- return;
- }
-
*ptr = value;
}

diff --git a/util/block-helpers.c b/util/block-helpers.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/block-helpers.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Block utility functions
+ *
+ * Copyright IBM, Corp. 2011
+ * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "block-helpers.h"
+
+/**
+ * check_block_size:
+ * @id: The unique ID of the object
+ * @name: The name of the property being validated
+ * @value: The block size in bytes
+ * @errp: A pointer to an area to store an error
+ *
+ * This function checks that the block size meets the following conditions:
+ * 1. At least MIN_BLOCK_SIZE
+ * 2. No larger than MAX_BLOCK_SIZE
+ * 3. A power of 2
+ */
+void check_block_size(const char *id, const char *name, int64_t value,
+ Error **errp)
+{
+ /* value of 0 means "unset" */
+ if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
+ error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
+ id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE);
+ return;
+ }
+
+ /* We rely on power-of-2 blocksizes for bitmasks */
+ if ((value & (value - 1)) != 0) {
+ error_setg(errp,
+ "Property %s.%s doesn't take value '%" PRId64
+ "', it's not a power of 2",
+ id, name, value);
+ return;
+ }
+}
diff --git a/util/meson.build b/util/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -XXX,XX +XXX,XX @@ if have_block
util_ss.add(files('nvdimm-utils.c'))
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
+ util_ss.add(files('block-helpers.c'))
util_ss.add(files('qemu-coroutine-sleep.c'))
util_ss.add(files('qemu-co-shared-resource.c'))
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))

As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform. The
main difference is that this time it is implemented as part of the
general block layer code.

The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the
implementation is outside of qemu, there is little we can do about its
performance.

We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts. So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.

To address this, we want to cache data regions. Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.

(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine. While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)

We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
include/block/block_int.h | 50 ++++++++++++++++++++++++
block.c | 80 +++++++++++++++++++++++++++++++++++++++
block/io.c | 68 +++++++++++++++++++++++++++++++--
3 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -XXX,XX +XXX,XX @@
#include "qemu/hbitmap.h"
#include "block/snapshot.h"
#include "qemu/throttle.h"
+#include "qemu/rcu.h"

#define BLOCK_FLAG_LAZY_REFCOUNTS 8

@@ -XXX,XX +XXX,XX @@ struct BdrvChild {
QLIST_ENTRY(BdrvChild) next_parent;
};

+/*
+ * Allows bdrv_co_block_status() to cache one data region for a
+ * protocol node.
+ *
+ * @valid: Whether the cache is valid (should be accessed with atomic
+ * functions so this can be reset by RCU readers)
+ * @data_start: Offset where we know (or strongly assume) is data
+ * @data_end: Offset where the data region ends (which is not necessarily
+ * the start of a zeroed region)
+ */
+typedef struct BdrvBlockStatusCache {
+ struct rcu_head rcu;
+
+ bool valid;
+ int64_t data_start;
+ int64_t data_end;
+} BdrvBlockStatusCache;
+
struct BlockDriverState {
/* Protected by big QEMU lock or read-only after opening. No special
* locking needed during I/O...
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {

/* BdrvChild links to this node may never be frozen */
bool never_freeze;
+
+ /* Lock for block-status cache RCU writers */
+ CoMutex bsc_modify_lock;
+ /* Always non-NULL, but must only be dereferenced under an RCU read guard */
+ BdrvBlockStatusCache *block_status_cache;
};

struct BlockBackendRootState {
@@ -XXX,XX +XXX,XX @@ static inline BlockDriverState *bdrv_primary_bs(BlockDriverState *bs)
*/
void bdrv_drain_all_end_quiesce(BlockDriverState *bs);

+/**
+ * Check whether the given offset is in the cached block-status data
+ * region.
+ *
+ * If it is, and @pnum is not NULL, *pnum is set to
+ * `bsc.data_end - offset`, i.e. how many bytes, starting from
+ * @offset, are data (according to the cache).
+ * Otherwise, *pnum is not touched.
+ */
+bool bdrv_bsc_is_data(BlockDriverState *bs, int64_t offset, int64_t *pnum);
+
+/**
+ * If [offset, offset + bytes) overlaps with the currently cached
+ * block-status region, invalidate the cache.
+ *
+ * (To be used by I/O paths that cause data regions to be zero or
+ * holes.)
+ */
+void bdrv_bsc_invalidate_range(BlockDriverState *bs,
+ int64_t offset, int64_t bytes);
+
+/**
+ * Mark the range [offset, offset + bytes) as a data region.
+ */
+void bdrv_bsc_fill(BlockDriverState *bs, int64_t offset, int64_t bytes);
+
#endif /* BLOCK_INT_H */
diff --git a/block.c b/block.c
index XXXXXXX..XXXXXXX 100644
--- a/block.c
+++ b/block.c
@@ -XXX,XX +XXX,XX @@
#include "qemu/timer.h"
#include "qemu/cutils.h"
#include "qemu/id.h"
+#include "qemu/range.h"
+#include "qemu/rcu.h"
#include "block/coroutines.h"

#ifdef CONFIG_BSD
@@ -XXX,XX +XXX,XX @@ BlockDriverState *bdrv_new(void)

qemu_co_queue_init(&bs->flush_queue);

+ qemu_co_mutex_init(&bs->bsc_modify_lock);
+ bs->block_status_cache = g_new0(BdrvBlockStatusCache, 1);
+
for (i = 0; i < bdrv_drain_all_count; i++) {
bdrv_drained_begin(bs);
}
@@ -XXX,XX +XXX,XX @@ static void bdrv_close(BlockDriverState *bs)
bs->explicit_options = NULL;
qobject_unref(bs->full_open_options);
bs->full_open_options = NULL;
+ g_free(bs->block_status_cache);
+ bs->block_status_cache = NULL;

bdrv_release_named_dirty_bitmaps(bs);
assert(QLIST_EMPTY(&bs->dirty_bitmaps));
@@ -XXX,XX +XXX,XX @@ BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
{
return bdrv_skip_filters(bdrv_cow_bs(bdrv_skip_filters(bs)));
}
+
+/**
+ * Check whether [offset, offset + bytes) overlaps with the cached
+ * block-status data region.
+ *
+ * If so, and @pnum is not NULL, set *pnum to `bsc.data_end - offset`,
+ * which is what bdrv_bsc_is_data()'s interface needs.
+ * Otherwise, *pnum is not touched.
+ */
+static bool bdrv_bsc_range_overlaps_locked(BlockDriverState *bs,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum)
+{
+ BdrvBlockStatusCache *bsc = qatomic_rcu_read(&bs->block_status_cache);
+ bool overlaps;
+
+ overlaps =
+ qatomic_read(&bsc->valid) &&
+ ranges_overlap(offset, bytes, bsc->data_start,
+ bsc->data_end - bsc->data_start);
+
+ if (overlaps && pnum) {
+ *pnum = bsc->data_end - offset;
+ }
+
+ return overlaps;
+}
+
+/**
+ * See block_int.h for this function's documentation.
+ */
+bool bdrv_bsc_is_data(BlockDriverState *bs, int64_t offset, int64_t *pnum)
+{
+ RCU_READ_LOCK_GUARD();
+
+ return bdrv_bsc_range_overlaps_locked(bs, offset, 1, pnum);
+}
+
+/**
+ * See block_int.h for this function's documentation.
+ */
+void bdrv_bsc_invalidate_range(BlockDriverState *bs,
+ int64_t offset, int64_t bytes)
+{
+ RCU_READ_LOCK_GUARD();
+
+ if (bdrv_bsc_range_overlaps_locked(bs, offset, bytes, NULL)) {
+ qatomic_set(&bs->block_status_cache->valid, false);
+ }
+}
+
+/**
+ * See block_int.h for this function's documentation.
+ */
+void bdrv_bsc_fill(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+ BdrvBlockStatusCache *new_bsc = g_new(BdrvBlockStatusCache, 1);
+ BdrvBlockStatusCache *old_bsc;
+
+ *new_bsc = (BdrvBlockStatusCache) {
+ .valid = true,
+ .data_start = offset,
+ .data_end = offset + bytes,
+ };
+
+ QEMU_LOCK_GUARD(&bs->bsc_modify_lock);
+
+ old_bsc = qatomic_rcu_read(&bs->block_status_cache);
+ qatomic_rcu_set(&bs->block_status_cache, new_bsc);
+ if (old_bsc) {
+ g_free_rcu(old_bsc, rcu);
+ }
+}
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
return -ENOTSUP;
}

+ /* Invalidate the cached block-status data range if this write overlaps */
+ bdrv_bsc_invalidate_range(bs, offset, bytes);
+
assert(alignment % bs->bl.request_alignment == 0);
head = offset % alignment;
tail = (offset + bytes) % alignment;
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;

if (bs->drv->bdrv_co_block_status) {
- ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
- aligned_bytes, pnum, &local_map,
- &local_file);
+ /*
+ * Use the block-status cache only for protocol nodes: Format
+ * drivers are generally quick to inquire the status, but protocol
+ * drivers often need to get information from outside of qemu, so
+ * we do not have control over the actual implementation. There
+ * have been cases where inquiring the status took an unreasonably
+ * long time, and we can do nothing in qemu to fix it.
+ * This is especially problematic for images with large data areas,
+ * because finding the few holes in them and giving them special
+ * treatment does not gain much performance. Therefore, we try to
+ * cache the last-identified data region.
+ *
+ * Second, limiting ourselves to protocol nodes allows us to assume
+ * the block status for data regions to be DATA | OFFSET_VALID, and
+ * that the host offset is the same as the guest offset.
+ *
+ * Note that it is possible that external writers zero parts of
+ * the cached regions without the cache being invalidated, and so
+ * we may report zeroes as data. This is not catastrophic,
+ * we may report zeroes as data. This is not catastrophic,
281
+ * however, because reporting zeroes as data is fine.
282
+ */
283
+ if (QLIST_EMPTY(&bs->children) &&
284
+ bdrv_bsc_is_data(bs, aligned_offset, pnum))
285
+ {
286
+ ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
287
+ local_file = bs;
288
+ local_map = aligned_offset;
289
+ } else {
290
+ ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
291
+ aligned_bytes, pnum, &local_map,
292
+ &local_file);
293
+
294
+ /*
295
+ * Note that checking QLIST_EMPTY(&bs->children) is also done when
296
+ * the cache is queried above. Technically, we do not need to check
297
+ * it here; the worst that can happen is that we fill the cache for
298
+ * non-protocol nodes, and then it is never used. However, filling
299
+ * the cache requires an RCU update, so double check here to avoid
300
+ * such an update if possible.
301
+ */
302
+ if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
303
+ QLIST_EMPTY(&bs->children))
304
+ {
305
+ /*
306
+ * When a protocol driver reports BLOCK_OFFSET_VALID, the
307
+ * returned local_map value must be the same as the offset we
308
+ * have passed (aligned_offset), and local_file must be the node
309
+ * itself.
310
+ * Assert this, because we follow this rule when reading from
311
+ * the cache (see the `local_file = bs` and
312
+ * `local_map = aligned_offset` assignments above), and the
313
+ * result the cache delivers must be the same as the driver
314
+ * would deliver.
315
+ */
316
+ assert(local_file == bs);
317
+ assert(local_map == aligned_offset);
318
+ bdrv_bsc_fill(bs, aligned_offset, *pnum);
319
+ }
320
+ }
321
} else {
322
/* Default code for filters */
323
324
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
325
return 0;
326
}
327
328
+ /* Invalidate the cached block-status data range if this discard overlaps */
329
+ bdrv_bsc_invalidate_range(bs, offset, bytes);
330
+
331
/* Discard is advisory, but some devices track and coalesce
332
* unaligned requests, so we must pass everything down rather than
333
* round here. Still, most devices will just silently ignore
334
--
176
--
335
2.31.1
177
2.26.2
336
178
337
diff view generated by jsdifflib
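As a reading aid for the block-status cache patch above, here is a minimal Python model of its three entry points (`bdrv_bsc_is_data()`, `bdrv_bsc_invalidate_range()`, `bdrv_bsc_fill()`). It is a sketch, not QEMU code: the class and method names are hypothetical, and RCU and locking are left out. The overlap rule (a half-open range `[data_start, data_end)`, with `pnum` counting the bytes from `offset` to `data_end`) is an assumption about `bdrv_bsc_range_overlaps_locked()`, whose body is not shown in this part of the diff.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BlockStatusCache:
    """Toy stand-in for BdrvBlockStatusCache: one cached data range."""
    valid: bool = False
    data_start: int = 0
    data_end: int = 0  # exclusive (assumed)

    def _overlaps(self, offset: int, nbytes: int) -> bool:
        # Assumed rule: the half-open ranges [offset, offset + nbytes) and
        # [data_start, data_end) share at least one byte.
        return (self.valid and offset < self.data_end
                and offset + nbytes > self.data_start)

    def is_data(self, offset: int) -> Tuple[bool, Optional[int]]:
        """Model of bdrv_bsc_is_data(): probe a single byte at offset."""
        if self._overlaps(offset, 1):
            return True, self.data_end - offset
        return False, None

    def invalidate_range(self, offset: int, nbytes: int) -> None:
        """Model of bdrv_bsc_invalidate_range(): drop the entry on overlap."""
        if self._overlaps(offset, nbytes):
            self.valid = False

    def fill(self, offset: int, nbytes: int) -> None:
        """Model of bdrv_bsc_fill(): replace the entry with a new range."""
        self.valid = True
        self.data_start = offset
        self.data_end = offset + nbytes


cache = BlockStatusCache()
cache.fill(4096, 8192)               # driver reported data at [4096, 12288)
hit = cache.is_data(8192)            # inside the cached range
miss = cache.is_data(12288)          # data_end itself is outside
cache.invalidate_range(0, 4096)      # write ends right before the range
still_valid = cache.valid
cache.invalidate_range(12000, 1000)  # discard overlapping the tail
after_discard = cache.is_data(8192)
```

In the patch, the write-zeroes and discard paths in `block/io.c` call `bdrv_bsc_invalidate_range()` before touching the node; in this model that corresponds to `invalidate_range()`.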
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
Add a simple grammar-parsing template benchmark. The new tool consumes a test
3
By making use of libvhost-user, a block device drive can be shared with
4
template written in bash with some special grammar injections and
4
the connected vhost-user client. Only one client can connect to the
5
produces multiple tests, runs them, and finally prints a performance
5
server at a time.
6
comparison table of different tests produced from one template.
7
6
8
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Since vhost-user-server needs a block drive to be created first, delay
9
Message-Id: <20210824101517.59802-2-vsementsov@virtuozzo.com>
8
the creation of this object.
10
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
9
11
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
10
Suggested-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
13
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
15
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
16
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
17
following compiler warning:
18
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
19
and fix "Invalid size %ld ..." ssize_t format string arguments for
20
32-bit hosts.
21
--Stefan]
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
23
---
13
scripts/simplebench/img_bench_templater.py | 95 ++++++++++++++++++++++
24
block/export/vhost-user-blk-server.h | 36 ++
14
scripts/simplebench/table_templater.py | 62 ++++++++++++++
25
block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++
15
2 files changed, 157 insertions(+)
26
softmmu/vl.c | 4 +
16
create mode 100755 scripts/simplebench/img_bench_templater.py
27
block/meson.build | 1 +
17
create mode 100644 scripts/simplebench/table_templater.py
28
4 files changed, 702 insertions(+)
29
create mode 100644 block/export/vhost-user-blk-server.h
30
create mode 100644 block/export/vhost-user-blk-server.c
18
31
19
diff --git a/scripts/simplebench/img_bench_templater.py b/scripts/simplebench/img_bench_templater.py
32
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
20
new file mode 100755
21
index XXXXXXX..XXXXXXX
22
--- /dev/null
23
+++ b/scripts/simplebench/img_bench_templater.py
24
@@ -XXX,XX +XXX,XX @@
25
+#!/usr/bin/env python3
26
+#
27
+# Process img-bench test templates
28
+#
29
+# Copyright (c) 2021 Virtuozzo International GmbH.
30
+#
31
+# This program is free software; you can redistribute it and/or modify
32
+# it under the terms of the GNU General Public License as published by
33
+# the Free Software Foundation; either version 2 of the License, or
34
+# (at your option) any later version.
35
+#
36
+# This program is distributed in the hope that it will be useful,
37
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
38
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
39
+# GNU General Public License for more details.
40
+#
41
+# You should have received a copy of the GNU General Public License
42
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
43
+#
44
+
45
+
46
+import sys
47
+import subprocess
48
+import re
49
+import json
50
+
51
+import simplebench
52
+from results_to_text import results_to_text
53
+from table_templater import Templater
54
+
55
+
56
+def bench_func(env, case):
57
+    test = templater.gen(env['data'], case['data'])
58
+
59
+    p = subprocess.run(test, shell=True, stdout=subprocess.PIPE,
60
+                       stderr=subprocess.STDOUT, universal_newlines=True)
61
+
62
+    if p.returncode == 0:
63
+        try:
64
+            m = re.search(r'Run completed in (\d+\.\d+) seconds\.', p.stdout)
65
+            return {'seconds': float(m.group(1))}
66
+        except Exception:
67
+            return {'error': f'failed to parse qemu-img output: {p.stdout}'}
68
+    else:
69
+        return {'error': f'qemu-img failed: {p.returncode}: {p.stdout}'}
70
+
71
+
72
+if __name__ == '__main__':
73
+    if len(sys.argv) > 1:
74
+        print("""
75
+Usage: img_bench_templater.py < path/to/test-template.sh
76
+
77
+This script generates performance tests from a test template (example below),
78
+runs them, and displays the results in a table. The template is read from
79
+stdin. It must be written in bash and end with a `qemu-img bench` invocation
80
+(whose result is parsed to get the test instance’s result).
81
+
82
+Use the following syntax in the template to create the various different test
83
+instances:
84
+
85
+ column templating: {var1|var2|...} - test will use different values in
86
+ different columns. You may use several {} constructions in the test, in this
87
+ case product of all choice-sets will be used.
88
+
89
+ row templating: [var1|var2|...] - similar thing to define rows (test-cases)
90
+
91
+Test template example:
92
+
93
+Assume you want to compare two qemu-img binaries, called qemu-img-old and
94
+qemu-img-new in your build directory in two test-cases with 4K writes and 64K
95
+writes. The template may look like this:
96
+
97
+qemu_img=/path/to/qemu/build/qemu-img-{old|new}
98
+$qemu_img create -f qcow2 /ssd/x.qcow2 1G
99
+$qemu_img bench -c 100 -d 8 [-s 4K|-s 64K] -w -t none -n /ssd/x.qcow2
100
+
101
+When passing this to stdin of img_bench_templater.py, the resulting comparison
102
+table will contain two columns (for two binaries) and two rows (for two
103
+test-cases).
104
+
105
+In addition to displaying the results, the script also stores them in JSON
106
+format in a results.json file in the current directory.
107
+""")
108
+        sys.exit()
109
+
110
+    templater = Templater(sys.stdin.read())
111
+
112
+    envs = [{'id': ' / '.join(x), 'data': x} for x in templater.columns]
113
+    cases = [{'id': ' / '.join(x), 'data': x} for x in templater.rows]
114
+
115
+    result = simplebench.bench(bench_func, envs, cases, count=5,
116
+                               initial_run=False)
117
+    print(results_to_text(result))
118
+    with open('results.json', 'w') as f:
119
+        json.dump(result, f, indent=4)
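The column/row templating described in the usage text above can be illustrated with a small, self-contained sketch. It is not the `Templater` class from this patch (which parses the template with Lark). The regular-expression tokenizer and the `expand()` helper below are hypothetical stand-ins that only mimic the documented behaviour: `{a|b}` switches define columns, `[x|y]` switches define rows, and several switches of one kind combine as a Cartesian product.

```python
import itertools
import re

# Tokenize into column switches {..}, row switches [..], and plain text.
TOKEN = re.compile(r'\{([^{}\[\]]+)\}|\[([^{}\[\]]+)\]|([^{}\[\]]+)')


def expand(template):
    """Return {(column, row): script} for every column/row combination."""
    parts = []  # list of (kind, payload) with kind in 'text', 'col', 'row'
    for col, row, text in TOKEN.findall(template):
        if col:
            parts.append(('col', col.split('|')))
        elif row:
            parts.append(('row', row.split('|')))
        else:
            parts.append(('text', text))

    # One choice per switch; several switches multiply out as a product.
    columns = list(itertools.product(*[p for k, p in parts if k == 'col']))
    rows = list(itertools.product(*[p for k, p in parts if k == 'row']))

    scripts = {}
    for column in columns:
        for row in rows:
            col_it, row_it = iter(column), iter(row)
            pieces = []
            for kind, payload in parts:
                if kind == 'text':
                    pieces.append(payload)
                elif kind == 'col':
                    pieces.append(next(col_it))
                else:
                    pieces.append(next(row_it))
            scripts[(column, row)] = ''.join(pieces)
    return scripts


template = 'qemu-img-{old|new} bench [-s 4K|-s 64K] -w x.qcow2'
scripts = expand(template)
```

With one column switch and one row switch of two choices each, `scripts` holds four generated command lines, one per cell of the comparison table.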
120
diff --git a/scripts/simplebench/table_templater.py b/scripts/simplebench/table_templater.py
121
new file mode 100644
33
new file mode 100644
122
index XXXXXXX..XXXXXXX
34
index XXXXXXX..XXXXXXX
123
--- /dev/null
35
--- /dev/null
124
+++ b/scripts/simplebench/table_templater.py
36
+++ b/block/export/vhost-user-blk-server.h
125
@@ -XXX,XX +XXX,XX @@
37
@@ -XXX,XX +XXX,XX @@
126
+# Parser for test templates
38
+/*
127
+#
39
+ * Sharing QEMU block devices via vhost-user protocol
128
+# Copyright (c) 2021 Virtuozzo International GmbH.
40
+ *
129
+#
41
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
130
+# This program is free software; you can redistribute it and/or modify
42
+ * Copyright (c) 2020 Red Hat, Inc.
131
+# it under the terms of the GNU General Public License as published by
43
+ *
132
+# the Free Software Foundation; either version 2 of the License, or
44
+ * This work is licensed under the terms of the GNU GPL, version 2 or
133
+# (at your option) any later version.
45
+ * later. See the COPYING file in the top-level directory.
134
+#
46
+ */
135
+# This program is distributed in the hope that it will be useful,
47
+
136
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
48
+#ifndef VHOST_USER_BLK_SERVER_H
137
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
49
+#define VHOST_USER_BLK_SERVER_H
138
+# GNU General Public License for more details.
50
+#include "util/vhost-user-server.h"
139
+#
51
+
140
+# You should have received a copy of the GNU General Public License
52
+typedef struct VuBlockDev VuBlockDev;
141
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
53
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
142
+#
54
+#define VHOST_USER_BLK_SERVER(obj) \
143
+
55
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
144
+import itertools
56
+
145
+from lark import Lark
57
+/* vhost user block device */
146
+
58
+struct VuBlockDev {
147
+grammar = """
59
+ Object parent_obj;
148
+start: ( text | column_switch | row_switch )+
60
+ char *node_name;
149
+
61
+ SocketAddress *addr;
150
+column_switch: "{" text ["|" text]+ "}"
62
+ AioContext *ctx;
151
+row_switch: "[" text ["|" text]+ "]"
63
+ VuServer vu_server;
152
+text: /[^|{}\[\]]+/
64
+ bool running;
153
+"""
65
+ uint32_t blk_size;
154
+
66
+ BlockBackend *backend;
155
+parser = Lark(grammar)
67
+ QIOChannelSocket *sioc;
156
+
68
+ QTAILQ_ENTRY(VuBlockDev) next;
157
+class Templater:
69
+ struct virtio_blk_config blkcfg;
158
+    def __init__(self, template):
70
+ bool writable;
159
+        self.tree = parser.parse(template)
71
+};
160
+
72
+
161
+        c_switches = []
73
+#endif /* VHOST_USER_BLK_SERVER_H */
162
+        r_switches = []
74
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
163
+        for x in self.tree.children:
75
new file mode 100644
164
+            if x.data == 'column_switch':
76
index XXXXXXX..XXXXXXX
165
+                c_switches.append([el.children[0].value for el in x.children])
77
--- /dev/null
166
+            elif x.data == 'row_switch':
78
+++ b/block/export/vhost-user-blk-server.c
167
+                r_switches.append([el.children[0].value for el in x.children])
79
@@ -XXX,XX +XXX,XX @@
168
+
80
+/*
169
+        self.columns = list(itertools.product(*c_switches))
81
+ * Sharing QEMU block devices via vhost-user protocol
170
+        self.rows = list(itertools.product(*r_switches))
82
+ *
171
+
83
+ * Parts of the code based on nbd/server.c.
172
+    def gen(self, column, row):
84
+ *
173
+        i = 0
85
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
174
+        j = 0
86
+ * Copyright (c) 2020 Red Hat, Inc.
175
+        result = []
87
+ *
176
+
88
+ * This work is licensed under the terms of the GNU GPL, version 2 or
177
+        for x in self.tree.children:
89
+ * later. See the COPYING file in the top-level directory.
178
+            if x.data == 'text':
90
+ */
179
+                result.append(x.children[0].value)
91
+#include "qemu/osdep.h"
180
+            elif x.data == 'column_switch':
92
+#include "block/block.h"
181
+                result.append(column[i])
93
+#include "vhost-user-blk-server.h"
182
+                i += 1
94
+#include "qapi/error.h"
183
+            elif x.data == 'row_switch':
95
+#include "qom/object_interfaces.h"
184
+                result.append(row[j])
96
+#include "sysemu/block-backend.h"
185
+                j += 1
97
+#include "util/block-helpers.h"
186
+
98
+
187
+        return ''.join(result)
99
+enum {
100
+ VHOST_USER_BLK_MAX_QUEUES = 1,
101
+};
102
+struct virtio_blk_inhdr {
103
+ unsigned char status;
104
+};
105
+
106
+typedef struct VuBlockReq {
107
+ VuVirtqElement *elem;
108
+ int64_t sector_num;
109
+ size_t size;
110
+ struct virtio_blk_inhdr *in;
111
+ struct virtio_blk_outhdr out;
112
+ VuServer *server;
113
+ struct VuVirtq *vq;
114
+} VuBlockReq;
115
+
116
+static void vu_block_req_complete(VuBlockReq *req)
117
+{
118
+ VuDev *vu_dev = &req->server->vu_dev;
119
+
120
+ /* IO size with 1 extra status byte */
121
+ vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
122
+ vu_queue_notify(vu_dev, req->vq);
123
+
124
+ if (req->elem) {
125
+ free(req->elem);
126
+ }
127
+
128
+ g_free(req);
129
+}
130
+
131
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
132
+{
133
+ return container_of(server, VuBlockDev, vu_server);
134
+}
135
+
136
+static int coroutine_fn
137
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
138
+ uint32_t iovcnt, uint32_t type)
139
+{
140
+ struct virtio_blk_discard_write_zeroes desc;
141
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
142
+ if (unlikely(size != sizeof(desc))) {
143
+ error_report("Invalid size %zd, expect %zu", size, sizeof(desc));
144
+ return -EINVAL;
145
+ }
146
+
147
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
148
+ uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
149
+ le32_to_cpu(desc.num_sectors) << 9 };
150
+ if (type == VIRTIO_BLK_T_DISCARD) {
151
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
152
+ return 0;
153
+ }
154
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
155
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
156
+ range[0], range[1], 0) == 0) {
157
+ return 0;
158
+ }
159
+ }
160
+
161
+ return -EINVAL;
162
+}
163
+
164
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
165
+{
166
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
167
+ BlockBackend *backend = vdev_blk->backend;
168
+ blk_co_flush(backend);
169
+}
170
+
171
+struct req_data {
172
+ VuServer *server;
173
+ VuVirtq *vq;
174
+ VuVirtqElement *elem;
175
+};
176
+
177
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
178
+{
179
+ struct req_data *data = opaque;
180
+ VuServer *server = data->server;
181
+ VuVirtq *vq = data->vq;
182
+ VuVirtqElement *elem = data->elem;
183
+ uint32_t type;
184
+ VuBlockReq *req;
185
+
186
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
187
+ BlockBackend *backend = vdev_blk->backend;
188
+
189
+ struct iovec *in_iov = elem->in_sg;
190
+ struct iovec *out_iov = elem->out_sg;
191
+ unsigned in_num = elem->in_num;
192
+ unsigned out_num = elem->out_num;
193
+ /* refer to hw/block/virtio_blk.c */
194
+ if (elem->out_num < 1 || elem->in_num < 1) {
195
+ error_report("virtio-blk request missing headers");
196
+ free(elem);
197
+ return;
198
+ }
199
+
200
+ req = g_new0(VuBlockReq, 1);
201
+ req->server = server;
202
+ req->vq = vq;
203
+ req->elem = elem;
204
+
205
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
206
+ sizeof(req->out)) != sizeof(req->out))) {
207
+ error_report("virtio-blk request outhdr too short");
208
+ goto err;
209
+ }
210
+
211
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
212
+
213
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
214
+ error_report("virtio-blk request inhdr too short");
215
+ goto err;
216
+ }
217
+
218
+ /* We always touch the last byte, so just see how big in_iov is. */
219
+ req->in = (void *)in_iov[in_num - 1].iov_base
220
+ + in_iov[in_num - 1].iov_len
221
+ - sizeof(struct virtio_blk_inhdr);
222
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
223
+
224
+ type = le32_to_cpu(req->out.type);
225
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
226
+ case VIRTIO_BLK_T_IN:
227
+ case VIRTIO_BLK_T_OUT: {
228
+ ssize_t ret = 0;
229
+ bool is_write = type & VIRTIO_BLK_T_OUT;
230
+ req->sector_num = le64_to_cpu(req->out.sector);
231
+
232
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
233
+ QEMUIOVector qiov;
234
+ if (is_write) {
235
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
236
+ ret = blk_co_pwritev(backend, offset, qiov.size,
237
+ &qiov, 0);
238
+ } else {
239
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
240
+ ret = blk_co_preadv(backend, offset, qiov.size,
241
+ &qiov, 0);
242
+ }
243
+ if (ret >= 0) {
244
+ req->in->status = VIRTIO_BLK_S_OK;
245
+ } else {
246
+ req->in->status = VIRTIO_BLK_S_IOERR;
247
+ }
248
+ break;
249
+ }
250
+ case VIRTIO_BLK_T_FLUSH:
251
+ vu_block_flush(req);
252
+ req->in->status = VIRTIO_BLK_S_OK;
253
+ break;
254
+ case VIRTIO_BLK_T_GET_ID: {
255
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
256
+ VIRTIO_BLK_ID_BYTES);
257
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
258
+ req->in->status = VIRTIO_BLK_S_OK;
259
+ req->size = elem->in_sg[0].iov_len;
260
+ break;
261
+ }
262
+ case VIRTIO_BLK_T_DISCARD:
263
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
264
+ int rc;
265
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
266
+ out_num, type);
267
+ if (rc == 0) {
268
+ req->in->status = VIRTIO_BLK_S_OK;
269
+ } else {
270
+ req->in->status = VIRTIO_BLK_S_IOERR;
271
+ }
272
+ break;
273
+ }
274
+ default:
275
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
276
+ break;
277
+ }
278
+
279
+ vu_block_req_complete(req);
280
+ return;
281
+
282
+err:
283
+ free(elem);
284
+ g_free(req);
285
+ return;
286
+}
287
+
288
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
289
+{
290
+ VuServer *server;
291
+ VuVirtq *vq;
292
+ struct req_data *req_data;
293
+
294
+ server = container_of(vu_dev, VuServer, vu_dev);
295
+ assert(server);
296
+
297
+ vq = vu_get_queue(vu_dev, idx);
298
+ assert(vq);
299
+ VuVirtqElement *elem;
300
+ while (1) {
301
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
302
+ sizeof(VuBlockReq));
303
+ if (elem) {
304
+ req_data = g_new0(struct req_data, 1);
305
+ req_data->server = server;
306
+ req_data->vq = vq;
307
+ req_data->elem = elem;
308
+ Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
309
+ req_data);
310
+ aio_co_enter(server->ioc->ctx, co);
311
+ } else {
312
+ break;
313
+ }
314
+ }
315
+}
316
+
317
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
318
+{
319
+ VuVirtq *vq;
320
+
321
+ assert(vu_dev);
322
+
323
+ vq = vu_get_queue(vu_dev, idx);
324
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
325
+}
326
+
327
+static uint64_t vu_block_get_features(VuDev *dev)
328
+{
329
+ uint64_t features;
330
+ VuServer *server = container_of(dev, VuServer, vu_dev);
331
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
332
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
333
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
334
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
335
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
336
+ 1ull << VIRTIO_BLK_F_FLUSH |
337
+ 1ull << VIRTIO_BLK_F_DISCARD |
338
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
339
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
340
+ 1ull << VIRTIO_F_VERSION_1 |
341
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
342
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
343
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
344
+
345
+ if (!vdev_blk->writable) {
346
+ features |= 1ull << VIRTIO_BLK_F_RO;
347
+ }
348
+
349
+ return features;
350
+}
351
+
352
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
353
+{
354
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
355
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
356
+}
357
+
358
+static int
359
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
360
+{
361
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
362
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
363
+ memcpy(config, &vdev_blk->blkcfg, len);
364
+
365
+ return 0;
366
+}
367
+
368
+static int
369
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
370
+ uint32_t offset, uint32_t size, uint32_t flags)
371
+{
372
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
373
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
374
+ uint8_t wce;
375
+
376
+ /* don't support live migration */
377
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
378
+ return -EINVAL;
379
+ }
380
+
381
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
382
+ size != 1) {
383
+ return -EINVAL;
384
+ }
385
+
386
+ wce = *data;
387
+ vdev_blk->blkcfg.wce = wce;
388
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
389
+ return 0;
390
+}
391
+
392
+/*
393
+ * When the client disconnects, it sends a VHOST_USER_NONE request
394
+ * and vu_process_message will simply call exit, which causes the VM
395
+ * to exit abruptly.
396
+ * To avoid this issue, process VHOST_USER_NONE request ahead
397
+ * of vu_process_message.
398
+ *
399
+ */
400
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
401
+{
402
+ if (vmsg->request == VHOST_USER_NONE) {
403
+ dev->panic(dev, "disconnect");
404
+ return true;
405
+ }
406
+ return false;
407
+}
408
+
409
+static const VuDevIface vu_block_iface = {
410
+ .get_features = vu_block_get_features,
411
+ .queue_set_started = vu_block_queue_set_started,
412
+ .get_protocol_features = vu_block_get_protocol_features,
413
+ .get_config = vu_block_get_config,
414
+ .set_config = vu_block_set_config,
415
+ .process_msg = vu_block_process_msg,
416
+};
417
+
418
+static void blk_aio_attached(AioContext *ctx, void *opaque)
419
+{
420
+ VuBlockDev *vub_dev = opaque;
421
+ aio_context_acquire(ctx);
422
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
423
+ aio_context_release(ctx);
424
+}
425
+
426
+static void blk_aio_detach(void *opaque)
427
+{
428
+ VuBlockDev *vub_dev = opaque;
429
+ AioContext *ctx = vub_dev->vu_server.ctx;
430
+ aio_context_acquire(ctx);
431
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
432
+ aio_context_release(ctx);
433
+}
434
+
435
+static void
436
+vu_block_initialize_config(BlockDriverState *bs,
437
+ struct virtio_blk_config *config, uint32_t blk_size)
438
+{
439
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
440
+ config->blk_size = blk_size;
441
+ config->size_max = 0;
442
+ config->seg_max = 128 - 2;
443
+ config->min_io_size = 1;
444
+ config->opt_io_size = 1;
445
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
446
+ config->max_discard_sectors = 32768;
447
+ config->max_discard_seg = 1;
448
+ config->discard_sector_alignment = config->blk_size >> 9;
449
+ config->max_write_zeroes_sectors = 32768;
450
+ config->max_write_zeroes_seg = 1;
451
+}
452
+
453
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
454
+{
455
+
456
+ BlockBackend *blk;
457
+ Error *local_error = NULL;
458
+ const char *node_name = vu_block_device->node_name;
459
+ bool writable = vu_block_device->writable;
460
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
461
+ int ret;
462
+
463
+ AioContext *ctx;
464
+
465
+ BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
466
+
467
+ if (!bs) {
468
+ error_propagate(errp, local_error);
469
+ return NULL;
470
+ }
471
+
472
+ if (bdrv_is_read_only(bs)) {
473
+ writable = false;
474
+ }
475
+
476
+ if (writable) {
477
+ perm |= BLK_PERM_WRITE;
478
+ }
479
+
480
+ ctx = bdrv_get_aio_context(bs);
481
+ aio_context_acquire(ctx);
482
+ bdrv_invalidate_cache(bs, NULL);
483
+ aio_context_release(ctx);
484
+
485
+ /*
486
+ * Don't allow resize while the vhost user server is running,
487
+ * otherwise we don't care what happens with the node.
488
+ */
489
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
490
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
491
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
492
+ ret = blk_insert_bs(blk, bs, errp);
493
+
494
+ if (ret < 0) {
495
+ goto fail;
496
+ }
497
+
498
+ blk_set_enable_write_cache(blk, false);
499
+
500
+ blk_set_allow_aio_context_change(blk, true);
501
+
502
+ vu_block_device->blkcfg.wce = 0;
503
+ vu_block_device->backend = blk;
504
+ if (!vu_block_device->blk_size) {
505
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
506
+ }
507
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
508
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
509
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
510
+ vu_block_device->blk_size);
511
+ return vu_block_device;
512
+
513
+fail:
514
+ blk_unref(blk);
515
+ return NULL;
516
+}
517
+
518
+static void vu_block_deinit(VuBlockDev *vu_block_device)
519
+{
520
+ if (vu_block_device->backend) {
521
+ blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
522
+ blk_aio_detach, vu_block_device);
523
+ }
524
+
525
+ blk_unref(vu_block_device->backend);
526
+}
527
+
528
+static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
529
+{
530
+ vhost_user_server_stop(&vu_block_device->vu_server);
531
+ vu_block_deinit(vu_block_device);
532
+}
533
+
534
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
535
+ Error **errp)
536
+{
537
+ AioContext *ctx;
538
+ SocketAddress *addr = vu_block_device->addr;
539
+
540
+ if (!vu_block_init(vu_block_device, errp)) {
541
+ return;
542
+ }
543
+
544
+ ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
545
+
546
+ if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
547
+ VHOST_USER_BLK_MAX_QUEUES,
548
+ NULL, &vu_block_iface,
549
+ errp)) {
550
+ goto error;
551
+ }
552
+
553
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
554
+ blk_aio_detach, vu_block_device);
555
+ vu_block_device->running = true;
556
+ return;
557
+
558
+ error:
559
+ vu_block_deinit(vu_block_device);
560
+}
561
+
562
+static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
563
+{
564
+ if (vus->running) {
565
+ error_setg(errp, "The property can't be modified "
566
+ "while the server is running");
567
+ return false;
568
+ }
569
+ return true;
570
+}
571
+
572
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
573
+{
574
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
575
+
576
+ if (!vu_prop_modifiable(vus, errp)) {
577
+ return;
578
+ }
579
+
580
+ if (vus->node_name) {
581
+ g_free(vus->node_name);
582
+ }
583
+
584
+ vus->node_name = g_strdup(value);
585
+}
586
+
587
+static char *vu_get_node_name(Object *obj, Error **errp)
588
+{
589
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
590
+ return g_strdup(vus->node_name);
591
+}
592
+
593
+static void free_socket_addr(SocketAddress *addr)
594
+{
595
+ g_free(addr->u.q_unix.path);
596
+ g_free(addr);
597
+}
598
+
599
+static void vu_set_unix_socket(Object *obj, const char *value,
600
+ Error **errp)
601
+{
602
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
603
+
604
+ if (!vu_prop_modifiable(vus, errp)) {
605
+ return;
606
+ }
607
+
608
+ if (vus->addr) {
609
+ free_socket_addr(vus->addr);
610
+ }
611
+
612
+ SocketAddress *addr = g_new0(SocketAddress, 1);
613
+ addr->type = SOCKET_ADDRESS_TYPE_UNIX;
614
+ addr->u.q_unix.path = g_strdup(value);
615
+ vus->addr = addr;
616
+}
617
+
618
+static char *vu_get_unix_socket(Object *obj, Error **errp)
619
+{
620
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
621
+ return g_strdup(vus->addr->u.q_unix.path);
622
+}
623
+
624
+static bool vu_get_block_writable(Object *obj, Error **errp)
625
+{
626
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
627
+ return vus->writable;
628
+}
629
+
630
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
631
+{
632
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
633
+
634
+ if (!vu_prop_modifiable(vus, errp)) {
635
+ return;
636
+ }
637
+
638
+ vus->writable = value;
639
+}
640
+
641
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
642
+ void *opaque, Error **errp)
643
+{
644
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
645
+ uint32_t value = vus->blk_size;
646
+
647
+ visit_type_uint32(v, name, &value, errp);
648
+}
649
+
650
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
651
+ void *opaque, Error **errp)
652
+{
653
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
654
+
655
+ Error *local_err = NULL;
656
+ uint32_t value;
657
+
658
+ if (!vu_prop_modifiable(vus, errp)) {
659
+ return;
660
+ }
661
+
662
+ visit_type_uint32(v, name, &value, &local_err);
663
+ if (local_err) {
664
+ goto out;
665
+ }
666
+
667
+ check_block_size(object_get_typename(obj), name, value, &local_err);
668
+ if (local_err) {
669
+ goto out;
670
+ }
671
+
672
+ vus->blk_size = value;
673
+
674
+out:
675
+ error_propagate(errp, local_err);
676
+}
677
+
678
+static void vhost_user_blk_server_instance_finalize(Object *obj)
679
+{
680
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
681
+
682
+ vhost_user_blk_server_stop(vub);
683
+
684
+ /*
685
+ * Unlike object_property_add_str, object_class_property_add_str
686
+ * doesn't have a release method. Thus manual memory freeing is
687
+ * needed.
688
+ */
689
+ free_socket_addr(vub->addr);
690
+ g_free(vub->node_name);
691
+}
692
+
693
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
694
+{
695
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
696
+
697
+ vhost_user_blk_server_start(vub, errp);
698
+}
699
+
700
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
701
+ void *class_data)
702
+{
703
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
704
+ ucc->complete = vhost_user_blk_server_complete;
705
+
706
+ object_class_property_add_bool(klass, "writable",
707
+ vu_get_block_writable,
708
+ vu_set_block_writable);
709
+
710
+ object_class_property_add_str(klass, "node-name",
711
+ vu_get_node_name,
712
+ vu_set_node_name);
713
+
714
+ object_class_property_add_str(klass, "unix-socket",
715
+ vu_get_unix_socket,
716
+ vu_set_unix_socket);
717
+
718
+ object_class_property_add(klass, "logical-block-size", "uint32",
719
+ vu_get_blk_size, vu_set_blk_size,
720
+ NULL, NULL);
721
+}
722
+
723
+static const TypeInfo vhost_user_blk_server_info = {
724
+ .name = TYPE_VHOST_USER_BLK_SERVER,
725
+ .parent = TYPE_OBJECT,
726
+ .instance_size = sizeof(VuBlockDev),
727
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
728
+ .class_init = vhost_user_blk_server_class_init,
729
+ .interfaces = (InterfaceInfo[]) {
730
+ {TYPE_USER_CREATABLE},
731
+ {}
732
+ },
733
+};
734
+
735
+static void vhost_user_blk_server_register_types(void)
736
+{
737
+ type_register_static(&vhost_user_blk_server_info);
738
+}
739
+
740
+type_init(vhost_user_blk_server_register_types)
741
diff --git a/softmmu/vl.c b/softmmu/vl.c
742
index XXXXXXX..XXXXXXX 100644
743
--- a/softmmu/vl.c
744
+++ b/softmmu/vl.c
745
@@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts)
746
}
747
#endif
748
749
+ /* Reason: vhost-user-blk-server property "node-name" */
750
+ if (g_str_equal(type, "vhost-user-blk-server")) {
751
+ return false;
752
+ }
753
/*
754
* Reason: filter-* property "netdev" etc.
755
*/
756
diff --git a/block/meson.build b/block/meson.build
757
index XXXXXXX..XXXXXXX 100644
758
--- a/block/meson.build
759
+++ b/block/meson.build
760
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
761
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
762
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
763
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
764
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
765
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
766
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
767
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
188
--
768
--
189
2.31.1
769
2.26.2
190
770
191
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
Split checking for reserved bits out of aligned offset check.
3
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
4
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
7
Message-id: 20200918080912.321299-8-coiby.xu@gmail.com
8
[Removed reference to vhost-user-blk-test.c, it will be sent in a
9
separate pull request.
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
MAINTAINERS | 7 +++++++
14
1 file changed, 7 insertions(+)
4
15
5
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
16
diff --git a/MAINTAINERS b/MAINTAINERS
6
Reviewed-by: Eric Blake <eblake@redhat.com>
17
index XXXXXXX..XXXXXXX 100644
7
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
18
--- a/MAINTAINERS
8
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
19
+++ b/MAINTAINERS
9
Message-Id: <20210914122454.141075-11-vsementsov@virtuozzo.com>
20
@@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org
10
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
21
S: Supported
11
---
22
F: tests/image-fuzzer/
12
block/qcow2.h | 1 +
23
13
block/qcow2-refcount.c | 10 +++++++++-
24
+Vhost-user block device backend server
14
2 files changed, 10 insertions(+), 1 deletion(-)
25
+M: Coiby Xu <Coiby.Xu@gmail.com>
26
+S: Maintained
27
+F: block/export/vhost-user-blk-server.c
28
+F: util/vhost-user-server.c
29
+F: tests/qtest/libqos/vhost-user-blk.c
30
+
31
Replication
32
M: Wen Congyang <wencongyang2@huawei.com>
33
M: Xie Changlong <xiechanglong.d@gmail.com>
34
--
35
2.26.2
15
36
16
diff --git a/block/qcow2.h b/block/qcow2.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/qcow2.h
19
+++ b/block/qcow2.h
20
@@ -XXX,XX +XXX,XX @@ typedef enum QCow2MetadataOverlap {
21
#define L2E_STD_RESERVED_MASK 0x3f000000000001feULL
22
23
#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
24
+#define REFT_RESERVED_MASK 0x1ffULL
25
26
#define INV_OFFSET (-1ULL)
27
28
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
29
index XXXXXXX..XXXXXXX 100644
30
--- a/block/qcow2-refcount.c
31
+++ b/block/qcow2-refcount.c
32
@@ -XXX,XX +XXX,XX @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
33
34
for(i = 0; i < s->refcount_table_size; i++) {
35
uint64_t offset, cluster;
36
- offset = s->refcount_table[i];
37
+ offset = s->refcount_table[i] & REFT_OFFSET_MASK;
38
cluster = offset >> s->cluster_bits;
39
40
+ if (s->refcount_table[i] & REFT_RESERVED_MASK) {
41
+ fprintf(stderr, "ERROR refcount table entry %" PRId64 " has "
42
+ "reserved bits set\n", i);
43
+ res->corruptions++;
44
+ *rebuild = true;
45
+ continue;
46
+ }
47
+
48
/* Refcount blocks are cluster aligned */
49
if (offset_into_cluster(s, offset)) {
50
fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
51
--
52
2.31.1
53
54
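The refcount table hunk above splits each 64-bit entry into an offset part and a reserved part. A minimal sketch of that split follows, assuming only the two mask values shown in the block/qcow2.h hunk; the helper names `reft_offset` and `reft_has_reserved_bits` are hypothetical and do not exist in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask values as they appear in the block/qcow2.h hunk. */
#define REFT_OFFSET_MASK   0xfffffffffffffe00ULL
#define REFT_RESERVED_MASK 0x1ffULL

/* Hypothetical helper: refcount block offset with the low nine bits cleared. */
static uint64_t reft_offset(uint64_t entry)
{
    return entry & REFT_OFFSET_MASK;
}

/* Hypothetical helper: true if any reserved bit of the entry is set. */
static bool reft_has_reserved_bits(uint64_t entry)
{
    return (entry & REFT_RESERVED_MASK) != 0;
}
```

The two masks are disjoint and together cover all 64 bits, so masking the offset first and testing the reserved bits separately lets the checker report each problem on its own.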
1
The AbnormalShutdown exception class is not in qemu.machine, but in
1
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2
qemu.machine.machine. (qemu.machine.AbnormalShutdown was enough for
2
Message-id: 20200924151549.913737-3-stefanha@redhat.com
3
Python to find it in order to run this test, but pylint complains about
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
it.)
5
6
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
7
Message-Id: <20210902094017.32902-5-hreitz@redhat.com>
8
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
---
4
---
10
tests/qemu-iotests/tests/mirror-top-perms | 2 +-
5
util/vhost-user-server.c | 2 +-
11
1 file changed, 1 insertion(+), 1 deletion(-)
6
1 file changed, 1 insertion(+), 1 deletion(-)
12
7
13
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
8
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
14
index XXXXXXX..XXXXXXX 100755
9
index XXXXXXX..XXXXXXX 100644
15
--- a/tests/qemu-iotests/tests/mirror-top-perms
10
--- a/util/vhost-user-server.c
16
+++ b/tests/qemu-iotests/tests/mirror-top-perms
11
+++ b/util/vhost-user-server.c
17
@@ -XXX,XX +XXX,XX @@ class TestMirrorTopPerms(iotests.QMPTestCase):
12
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
18
def tearDown(self):
13
return false;
19
try:
14
}
20
self.vm.shutdown()
15
21
- except qemu.machine.AbnormalShutdown:
16
- /* zero out unspecified fileds */
22
+ except qemu.machine.machine.AbnormalShutdown:
17
+ /* zero out unspecified fields */
23
pass
18
*server = (VuServer) {
24
19
.listener = listener,
25
if self.vm_b is not None:
20
.vu_iface = vu_iface,
26
--
21
--
27
2.31.1
22
2.26.2
28
23
29
1
169 and 199 have been renamed and moved to tests/ (commit a44be0334be:
1
We already have access to the value with the correct type (ioc and sioc
2
"iotests: rename and move 169 and 199 tests"), so we can drop them from
2
are the same QIOChannel).
3
the skip list.
4
3
5
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
5
Message-id: 20200924151549.913737-4-stefanha@redhat.com
7
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
9
Message-Id: <20210902094017.32902-2-hreitz@redhat.com>
10
---
7
---
11
tests/qemu-iotests/297 | 2 +-
8
util/vhost-user-server.c | 2 +-
12
1 file changed, 1 insertion(+), 1 deletion(-)
9
1 file changed, 1 insertion(+), 1 deletion(-)
13
10
14
diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
11
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
15
index XXXXXXX..XXXXXXX 100755
12
index XXXXXXX..XXXXXXX 100644
16
--- a/tests/qemu-iotests/297
13
--- a/util/vhost-user-server.c
17
+++ b/tests/qemu-iotests/297
14
+++ b/util/vhost-user-server.c
18
@@ -XXX,XX +XXX,XX @@ import iotests
15
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
19
SKIP_FILES = (
16
server->ioc = QIO_CHANNEL(sioc);
20
'030', '040', '041', '044', '045', '055', '056', '057', '065', '093',
17
object_ref(OBJECT(server->ioc));
21
'096', '118', '124', '132', '136', '139', '147', '148', '149',
18
qio_channel_attach_aio_context(server->ioc, server->ctx);
22
- '151', '152', '155', '163', '165', '169', '194', '196', '199', '202',
19
- qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
23
+ '151', '152', '155', '163', '165', '194', '196', '202',
20
+ qio_channel_set_blocking(server->ioc, false, NULL);
24
'203', '205', '206', '207', '208', '210', '211', '212', '213', '216',
21
vu_client_start(server);
25
'218', '219', '224', '228', '234', '235', '236', '237', '238',
22
}
26
'240', '242', '245', '246', '248', '255', '256', '257', '258', '260',
23
27
--
24
--
28
2.31.1
25
2.26.2
29
26
30
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
Explicitly deleting watches is not necessary since libvhost-user calls
2
remove_watch() during vu_deinit(). Add an assertion to check this
3
though.
2
4
3
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Reviewed-by: Eric Blake <eblake@redhat.com>
6
Message-id: 20200924151549.913737-5-stefanha@redhat.com
5
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
7
Message-Id: <20210914122454.141075-8-vsementsov@virtuozzo.com>
8
[hreitz: Separated `type` declaration from statements]
9
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
10
---
8
---
11
block/qcow2.h | 1 +
9
util/vhost-user-server.c | 19 ++++---------------
12
block/qcow2-refcount.c | 14 +++++++++++++-
10
1 file changed, 4 insertions(+), 15 deletions(-)
13
2 files changed, 14 insertions(+), 1 deletion(-)
14
11
15
diff --git a/block/qcow2.h b/block/qcow2.h
12
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
16
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
17
--- a/block/qcow2.h
14
--- a/util/vhost-user-server.c
18
+++ b/block/qcow2.h
15
+++ b/util/vhost-user-server.c
19
@@ -XXX,XX +XXX,XX @@ typedef enum QCow2MetadataOverlap {
16
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
20
17
/* When this is set vu_client_trip will stop new processing vhost-user message */
21
#define L1E_OFFSET_MASK 0x00fffffffffffe00ULL
18
server->sioc = NULL;
22
#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
19
23
+#define L2E_STD_RESERVED_MASK 0x3f000000000001feULL
20
- VuFdWatch *vu_fd_watch, *next;
24
21
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
25
#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
22
- aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
26
23
- NULL, NULL, NULL);
27
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
24
- }
28
index XXXXXXX..XXXXXXX 100644
25
-
29
--- a/block/qcow2-refcount.c
26
- while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
30
+++ b/block/qcow2-refcount.c
27
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
31
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
28
- if (!vu_fd_watch->processing) {
32
for (i = 0; i < s->l2_size; i++) {
29
- QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
33
uint64_t coffset;
30
- g_free(vu_fd_watch);
34
int csize;
31
- }
35
+ QCow2ClusterType type;
32
- }
33
- }
34
-
35
while (server->processing_msg) {
36
if (server->ioc->read_coroutine) {
37
server->ioc->read_coroutine = NULL;
38
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
39
}
40
41
vu_deinit(&server->vu_dev);
36
+
42
+
37
l2_entry = get_l2_entry(s, l2_table, i);
43
+ /* vu_deinit() should have called remove_watch() */
38
l2_bitmap = get_l2_bitmap(s, l2_table, i);
44
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
39
+ type = qcow2_get_cluster_type(bs, l2_entry);
40
+
45
+
41
+ if (type != QCOW2_CLUSTER_COMPRESSED) {
46
object_unref(OBJECT(sioc));
42
+ /* Check reserved bits of Standard Cluster Descriptor */
47
object_unref(OBJECT(server->ioc));
43
+ if (l2_entry & L2E_STD_RESERVED_MASK) {
48
}
44
+ fprintf(stderr, "ERROR found l2 entry with reserved bits set: "
45
+ "%" PRIx64 "\n", l2_entry);
46
+ res->corruptions++;
47
+ }
48
+ }
49
50
- switch (qcow2_get_cluster_type(bs, l2_entry)) {
51
+ switch (type) {
52
case QCOW2_CLUSTER_COMPRESSED:
53
/* Compressed clusters don't have QCOW_OFLAG_COPIED */
54
if (l2_entry & QCOW_OFLAG_COPIED) {
55
--
49
--
56
2.31.1
50
2.26.2
57
51
58
1
bdrv_co_block_status() does it for us, so we do not need to do it here.
1
Only one struct is needed per request. Drop req_data and the separate
2
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
3
once.
2
4
3
The advantage of not capping *pnum is that bdrv_co_block_status() can
5
This fixes the req_data memory leak in vu_block_virtio_process_req().
4
cache larger data regions than requested by its caller.
5
6
6
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Message-id: 20200924151549.913737-6-stefanha@redhat.com
8
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
10
Message-Id: <20210812084148.14458-7-hreitz@redhat.com>
11
---
10
---
12
block/iscsi.c | 3 ---
11
block/export/vhost-user-blk-server.c | 68 +++++++++-------------------
13
1 file changed, 3 deletions(-)
12
1 file changed, 21 insertions(+), 47 deletions(-)
14
13
15
diff --git a/block/iscsi.c b/block/iscsi.c
14
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
16
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
17
--- a/block/iscsi.c
16
--- a/block/export/vhost-user-blk-server.c
18
+++ b/block/iscsi.c
17
+++ b/block/export/vhost-user-blk-server.c
19
@@ -XXX,XX +XXX,XX @@ retry:
18
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
20
iscsi_allocmap_set_allocated(iscsilun, offset, *pnum);
19
};
20
21
typedef struct VuBlockReq {
22
- VuVirtqElement *elem;
23
+ VuVirtqElement elem;
24
int64_t sector_num;
25
size_t size;
26
struct virtio_blk_inhdr *in;
27
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
28
VuDev *vu_dev = &req->server->vu_dev;
29
30
/* IO size with 1 extra status byte */
31
- vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
32
+ vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
33
vu_queue_notify(vu_dev, req->vq);
34
35
- if (req->elem) {
36
- free(req->elem);
37
- }
38
-
39
- g_free(req);
40
+ free(req);
41
}
42
43
static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
44
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
45
blk_co_flush(backend);
46
}
47
48
-struct req_data {
49
- VuServer *server;
50
- VuVirtq *vq;
51
- VuVirtqElement *elem;
52
-};
53
-
54
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
55
{
56
- struct req_data *data = opaque;
57
- VuServer *server = data->server;
58
- VuVirtq *vq = data->vq;
59
- VuVirtqElement *elem = data->elem;
60
+ VuBlockReq *req = opaque;
61
+ VuServer *server = req->server;
62
+ VuVirtqElement *elem = &req->elem;
63
uint32_t type;
64
- VuBlockReq *req;
65
66
VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
67
BlockBackend *backend = vdev_blk->backend;
68
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
69
struct iovec *out_iov = elem->out_sg;
70
unsigned in_num = elem->in_num;
71
unsigned out_num = elem->out_num;
72
+
73
/* refer to hw/block/virtio_blk.c */
74
if (elem->out_num < 1 || elem->in_num < 1) {
75
error_report("virtio-blk request missing headers");
76
- free(elem);
77
- return;
78
+ goto err;
21
}
79
}
22
80
23
- if (*pnum > bytes) {
81
- req = g_new0(VuBlockReq, 1);
24
- *pnum = bytes;
82
- req->server = server;
25
- }
83
- req->vq = vq;
26
out_unlock:
84
- req->elem = elem;
27
qemu_mutex_unlock(&iscsilun->mutex);
85
-
28
g_free(iTask.err_str);
86
if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
87
sizeof(req->out)) != sizeof(req->out))) {
88
error_report("virtio-blk request outhdr too short");
89
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
90
91
err:
92
free(elem);
93
- g_free(req);
94
- return;
95
}
96
97
static void vu_block_process_vq(VuDev *vu_dev, int idx)
98
{
99
- VuServer *server;
100
- VuVirtq *vq;
101
- struct req_data *req_data;
102
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
103
+ VuVirtq *vq = vu_get_queue(vu_dev, idx);
104
105
- server = container_of(vu_dev, VuServer, vu_dev);
106
- assert(server);
107
-
108
- vq = vu_get_queue(vu_dev, idx);
109
- assert(vq);
110
- VuVirtqElement *elem;
111
while (1) {
112
- elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
113
- sizeof(VuBlockReq));
114
- if (elem) {
115
- req_data = g_new0(struct req_data, 1);
116
- req_data->server = server;
117
- req_data->vq = vq;
118
- req_data->elem = elem;
119
- Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
120
- req_data);
121
- aio_co_enter(server->ioc->ctx, co);
122
- } else {
123
+ VuBlockReq *req;
124
+
125
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
126
+ if (!req) {
127
break;
128
}
129
+
130
+ req->server = server;
131
+ req->vq = vq;
132
+
133
+ Coroutine *co =
134
+ qemu_coroutine_create(vu_block_virtio_process_req, req);
135
+ qemu_coroutine_enter(co);
136
}
137
}
138
29
--
139
--
30
2.31.1
140
2.26.2
31
141
32
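The vhost-user-blk-server patch above embeds the queue element as the first member of the request, so a single allocation sized for the whole request serves both. A minimal sketch of that layout follows. `Elem`, `Req`, `queue_pop` and `req_new` are hypothetical stand-ins, not the libvhost-user API, and the zero-filling done here is an assumption of the sketch rather than a statement about vu_queue_pop().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue element handed out by a library. */
typedef struct Elem {
    unsigned index;
} Elem;

/* Request with the element embedded as the first member. */
typedef struct Req {
    Elem elem;      /* must stay first so Elem * and Req * coincide */
    int status;
} Req;

/*
 * Hypothetical pop function: allocates sz bytes (at least sizeof(Elem)),
 * zero-fills them and initializes the leading Elem.
 */
static void *queue_pop(unsigned index, size_t sz)
{
    Elem *elem;

    if (sz < sizeof(Elem)) {
        return NULL;
    }
    elem = calloc(1, sz);
    if (elem) {
        elem->index = index;
    }
    return elem;
}

/* One allocation per request, released later with a single free(). */
static Req *req_new(unsigned index)
{
    return queue_pop(index, sizeof(Req));
}
```

With this layout there is no separate element pointer to free or leak, which mirrors the single `free(req)` in the patch's completion path.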
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
The device panic notifier callback is not used. Drop it.
2
2
3
Check subcluster bitmap of the l2 entry for different types of
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
clusters:
4
Message-id: 20200924151549.913737-7-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.h | 3 ---
8
block/export/vhost-user-blk-server.c | 3 +--
9
util/vhost-user-server.c | 6 ------
10
3 files changed, 1 insertion(+), 11 deletions(-)
5
11
6
- for compressed it must be zero
12
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
7
- for allocated check consistency of two parts of the bitmap
13
index XXXXXXX..XXXXXXX 100644
8
- for unallocated all subclusters should be unallocated
14
--- a/util/vhost-user-server.h
9
(or zero-plain)
15
+++ b/util/vhost-user-server.h
16
@@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch {
17
} VuFdWatch;
18
19
typedef struct VuServer VuServer;
20
-typedef void DevicePanicNotifierFn(VuServer *server);
21
22
struct VuServer {
23
QIONetListener *listener;
24
AioContext *ctx;
25
- DevicePanicNotifierFn *device_panic_notifier;
26
int max_queues;
27
const VuDevIface *vu_iface;
28
VuDev vu_dev;
29
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
30
SocketAddress *unix_socket,
31
AioContext *ctx,
32
uint16_t max_queues,
33
- DevicePanicNotifierFn *device_panic_notifier,
34
const VuDevIface *vu_iface,
35
Error **errp);
36
37
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/export/vhost-user-blk-server.c
40
+++ b/block/export/vhost-user-blk-server.c
41
@@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
42
ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
43
44
if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
45
- VHOST_USER_BLK_MAX_QUEUES,
46
- NULL, &vu_block_iface,
47
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
48
errp)) {
49
goto error;
50
}
51
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/util/vhost-user-server.c
54
+++ b/util/vhost-user-server.c
55
@@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf)
56
close_client(server);
57
}
58
59
- if (server->device_panic_notifier) {
60
- server->device_panic_notifier(server);
61
- }
62
-
63
/*
64
* Set the callback function for network listener so another
65
* vhost-user client can connect to this server
66
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
67
SocketAddress *socket_addr,
68
AioContext *ctx,
69
uint16_t max_queues,
70
- DevicePanicNotifierFn *device_panic_notifier,
71
const VuDevIface *vu_iface,
72
Error **errp)
73
{
74
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
75
.vu_iface = vu_iface,
76
.max_queues = max_queues,
77
.ctx = ctx,
78
- .device_panic_notifier = device_panic_notifier,
79
};
80
81
qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
82
--
83
2.26.2
10
84
11
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
12
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
13
Message-Id: <20210914122454.141075-7-vsementsov@virtuozzo.com>
14
Reviewed-by: Eric Blake <eblake@redhat.com>
15
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
16
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
17
---
18
block/qcow2-refcount.c | 28 ++++++++++++++++++++++++++--
19
1 file changed, 26 insertions(+), 2 deletions(-)
20
21
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
22
index XXXXXXX..XXXXXXX 100644
23
--- a/block/qcow2-refcount.c
24
+++ b/block/qcow2-refcount.c
25
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
26
int flags, BdrvCheckMode fix, bool active)
27
{
28
BDRVQcow2State *s = bs->opaque;
29
- uint64_t l2_entry;
30
+ uint64_t l2_entry, l2_bitmap;
31
uint64_t next_contiguous_offset = 0;
32
int i, ret;
33
size_t l2_size_bytes = s->l2_size * l2_entry_size(s);
34
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
35
uint64_t coffset;
36
int csize;
37
l2_entry = get_l2_entry(s, l2_table, i);
38
+ l2_bitmap = get_l2_bitmap(s, l2_table, i);
39
40
switch (qcow2_get_cluster_type(bs, l2_entry)) {
41
case QCOW2_CLUSTER_COMPRESSED:
42
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
43
break;
44
}
45
46
+ if (l2_bitmap) {
47
+ fprintf(stderr, "ERROR compressed cluster %d with non-zero "
48
+ "subcluster allocation bitmap, entry=0x%" PRIx64 "\n",
49
+ i, l2_entry);
50
+ res->corruptions++;
51
+ break;
52
+ }
53
+
54
/* Mark cluster as used */
55
qcow2_parse_compressed_l2_entry(bs, l2_entry, &coffset, &csize);
56
ret = qcow2_inc_refcounts_imrt(
57
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
58
{
59
uint64_t offset = l2_entry & L2E_OFFSET_MASK;
60
61
+ if ((l2_bitmap >> 32) & l2_bitmap) {
62
+ res->corruptions++;
63
+ fprintf(stderr, "ERROR offset=%" PRIx64 ": Allocated "
64
+ "cluster has corrupted subcluster allocation bitmap\n",
65
+ offset);
66
+ }
67
+
68
/* Correct offsets are cluster aligned */
69
if (offset_into_cluster(s, offset)) {
70
bool contains_data;
71
res->corruptions++;
72
73
if (has_subclusters(s)) {
74
- uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
75
contains_data = (l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC);
76
} else {
77
contains_data = !(l2_entry & QCOW_OFLAG_ZERO);
78
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
79
}
80
81
case QCOW2_CLUSTER_ZERO_PLAIN:
82
+ /* Impossible when image has subclusters */
83
+ assert(!l2_bitmap);
84
+ break;
85
+
86
case QCOW2_CLUSTER_UNALLOCATED:
87
+ if (l2_bitmap & QCOW_L2_BITMAP_ALL_ALLOC) {
88
+ res->corruptions++;
89
+ fprintf(stderr, "ERROR: Unallocated "
90
+ "cluster has non-zero subcluster allocation map\n");
91
+ }
92
break;
93
94
default:
95
--
96
2.31.1
97
98
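The subcluster checks above rely on the bitmap layout of extended L2 entries: the low 32 bits mark subclusters as allocated and the high 32 bits mark them as reading zeros, so the same subcluster should not be set in both halves. A minimal sketch of the two tests follows. The helper names are hypothetical, and `ALL_ALLOC_BITS` is assumed to match QEMU's QCOW_L2_BITMAP_ALL_ALLOC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits: one "allocated" flag per subcluster (assumed to match
 * QEMU's QCOW_L2_BITMAP_ALL_ALLOC). */
#define ALL_ALLOC_BITS ((1ULL << 32) - 1)

/* Hypothetical helper: true if a subcluster is both allocated and zero. */
static bool bitmap_alloc_and_zero_conflict(uint64_t l2_bitmap)
{
    return ((l2_bitmap >> 32) & l2_bitmap) != 0;
}

/* Hypothetical helper: true if any subcluster is marked allocated. */
static bool bitmap_has_allocated_subclusters(uint64_t l2_bitmap)
{
    return (l2_bitmap & ALL_ALLOC_BITS) != 0;
}
```

Shifting the zero half down by 32 aligns bit x+32 with bit x, so a single AND finds every subcluster that is flagged in both halves.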
1
From: Stefano Garzarella <sgarzare@redhat.com>
1
fds[] is leaked when qio_channel_readv_full() fails.
2
2
3
In mirror_iteration() we call mirror_wait_on_conflicts() with
3
Use vmsg->fds[] instead of keeping a local fds[] array. Then we can
4
`self` parameter set to NULL.
4
reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the
5
loop to make this safe.
5
6
6
Starting from commit d44dae1a7c we dereference `self` pointer in
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
mirror_wait_on_conflicts() without checking whether it is NULL.
8
Message-id: 20200924151549.913737-8-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
util/vhost-user-server.c | 50 ++++++++++++++++++----------------------
12
1 file changed, 23 insertions(+), 27 deletions(-)
8
13
9
Backtrace:
14
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
10
Program terminated with signal SIGSEGV, Segmentation fault.
11
#0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
12
at ../block/mirror.c:172
13
172     self->waiting_for_op = op;
14
[Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
15
(gdb) bt
16
#0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
17
at ../block/mirror.c:172
18
#1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491
19
#2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917
20
#3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
21
at ../util/coroutine-ucontext.c:173
22
#4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
23
from /usr/lib64/libc.so.6
24
25
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404
26
Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts")
27
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
28
Message-Id: <20210910124533.288318-1-sgarzare@redhat.com>
29
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
30
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
31
---
32
block/mirror.c | 25 ++++++++++++++++---------
33
1 file changed, 16 insertions(+), 9 deletions(-)
34
35
diff --git a/block/mirror.c b/block/mirror.c
36
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
37
--- a/block/mirror.c
16
--- a/util/vhost-user-server.c
38
+++ b/block/mirror.c
17
+++ b/util/vhost-user-server.c
39
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn mirror_wait_on_conflicts(MirrorOp *self,
18
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
40
if (ranges_overlap(self_start_chunk, self_nb_chunks,
19
};
41
op_start_chunk, op_nb_chunks))
20
int rc, read_bytes = 0;
42
{
21
Error *local_err = NULL;
43
- /*
22
- /*
44
- * If the operation is already (indirectly) waiting for us, or
23
- * Store fds/nfds returned from qio_channel_readv_full into
45
- * will wait for us as soon as it wakes up, then just go on
24
- * temporary variables.
46
- * (instead of producing a deadlock in the former case).
25
- *
47
- */
26
- * VhostUserMsg is a packed structure, gcc will complain about passing
48
- if (op->waiting_for_op) {
27
- * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
49
- continue;
28
- * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
50
+ if (self) {
29
- * thus two temporary variables nfds and fds are used here.
51
+ /*
30
- */
52
+ * If the operation is already (indirectly) waiting for us,
31
- size_t nfds = 0, nfds_t = 0;
53
+ * or will wait for us as soon as it wakes up, then just go
32
const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
54
+ * on (instead of producing a deadlock in the former case).
33
- int *fds_t = NULL;
55
+ */
34
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
56
+ if (op->waiting_for_op) {
35
QIOChannel *ioc = server->ioc;
57
+ continue;
36
58
+ }
37
+ vmsg->fd_num = 0;
38
if (!ioc) {
39
error_report_err(local_err);
40
goto fail;
41
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
42
43
assert(qemu_in_coroutine());
44
do {
45
+ size_t nfds = 0;
46
+ int *fds = NULL;
59
+
47
+
60
+ self->waiting_for_op = op;
48
/*
61
}
49
* qio_channel_readv_full may have short reads, keeping calling it
62
50
* until getting VHOST_USER_HDR_SIZE or 0 bytes in total
63
- self->waiting_for_op = op;
51
*/
64
qemu_co_queue_wait(&op->waiting_requests, NULL);
52
- rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
65
- self->waiting_for_op = NULL;
53
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
66
+
54
if (rc < 0) {
67
+ if (self) {
55
if (rc == QIO_CHANNEL_ERR_BLOCK) {
68
+ self->waiting_for_op = NULL;
56
+ assert(local_err == NULL);
69
+ }
57
qio_channel_yield(ioc, G_IO_IN);
70
+
58
continue;
71
break;
59
} else {
60
error_report_err(local_err);
61
- return false;
62
+ goto fail;
72
}
63
}
73
}
64
}
65
- read_bytes += rc;
66
- if (nfds_t > 0) {
67
- if (nfds + nfds_t > max_fds) {
68
+
69
+ if (nfds > 0) {
70
+ if (vmsg->fd_num + nfds > max_fds) {
71
error_report("A maximum of %zu fds are allowed, "
72
"however got %zu fds now",
73
- max_fds, nfds + nfds_t);
74
+ max_fds, vmsg->fd_num + nfds);
75
+ g_free(fds);
76
goto fail;
77
}
78
- memcpy(vmsg->fds + nfds, fds_t,
79
- nfds_t *sizeof(vmsg->fds[0]));
80
- nfds += nfds_t;
81
- g_free(fds_t);
82
+ memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
83
+ vmsg->fd_num += nfds;
84
+ g_free(fds);
85
}
86
- if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
87
- break;
88
+
89
+ if (rc == 0) { /* socket closed */
90
+ goto fail;
91
}
92
- iov.iov_base = (char *)vmsg + read_bytes;
93
- iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
94
- } while (true);
95
96
- vmsg->fd_num = nfds;
97
+ iov.iov_base += rc;
98
+ iov.iov_len -= rc;
99
+ read_bytes += rc;
100
+ } while (read_bytes != VHOST_USER_HDR_SIZE);
101
+
102
/* qio_channel_readv_full will make socket fds blocking, unblock them */
103
vmsg_unblock_fds(vmsg);
104
if (vmsg->size > sizeof(vmsg->payload)) {
74
--
105
--
75
2.31.1
106
2.26.2
76
107
77
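The reworked vu_message_read() loop above keeps reading until the full header has arrived, advances the buffer by the number of bytes returned, and treats a zero-byte read as a closed socket. A minimal sketch of the same short-read pattern follows, using plain POSIX read() instead of the QIOChannel API. The function name `read_exact` is hypothetical, and the fd passing and coroutine yielding done in the patch are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd into buf. Returns true on success and
 * false on error or if the peer closes the stream early.
 */
static bool read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t rc = read(fd, p, len);

        if (rc < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before any data, retry */
            }
            return false;
        }
        if (rc == 0) {
            return false; /* unexpected EOF */
        }
        p += rc;            /* advance past the bytes already read */
        len -= (size_t)rc;  /* and shrink the remaining request */
    }
    return true;
}
```

Returning failure on EOF, rather than breaking out of the loop with a partial buffer, matches the intent of the patch, where a closed socket goes to the common failure path.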
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
Unexpected EOF is an error that must be reported.
2
2
3
No logic change, just prepare for the following commit. While here,
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
also make a small grammar fix in a comment.
4
Message-id: 20200924151549.913737-9-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.c | 6 ++++--
8
1 file changed, 4 insertions(+), 2 deletions(-)
5
9
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
10
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
9
Message-Id: <20210824101517.59802-3-vsementsov@virtuozzo.com>
10
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
11
---
12
block/qcow2-cluster.c | 49 ++++++++++++++++++++++++-------------------
13
1 file changed, 28 insertions(+), 21 deletions(-)
14
15
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
16
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
17
--- a/block/qcow2-cluster.c
12
--- a/util/vhost-user-server.c
18
+++ b/block/qcow2-cluster.c
13
+++ b/util/vhost-user-server.c
19
@@ -XXX,XX +XXX,XX @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
14
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
20
15
};
21
if (end <= old_start || start >= old_end) {
16
if (vmsg->size) {
22
/* No intersection */
17
rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
23
- } else {
18
- if (rc == -1) {
24
- if (start < old_start) {
19
- error_report_err(local_err);
25
- /* Stop at the start of a running allocation */
20
+ if (rc != 1) {
26
- bytes = old_start - start;
21
+ if (local_err) {
27
- } else {
22
+ error_report_err(local_err);
28
- bytes = 0;
23
+ }
29
- }
24
goto fail;
30
+ continue;
31
+ }
32
33
- /* Stop if already an l2meta exists. After yielding, it wouldn't
34
- * be valid any more, so we'd have to clean up the old L2Metas
35
- * and deal with requests depending on them before starting to
36
- * gather new ones. Not worth the trouble. */
37
- if (bytes == 0 && *m) {
38
- *cur_bytes = 0;
39
- return 0;
40
- }
41
+ /* Conflict */
42
43
- if (bytes == 0) {
44
- /* Wait for the dependency to complete. We need to recheck
45
- * the free/allocated clusters when we continue. */
46
- qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
47
- return -EAGAIN;
48
- }
49
+ if (start < old_start) {
50
+ /* Stop at the start of a running allocation */
51
+ bytes = old_start - start;
52
+ } else {
53
+ bytes = 0;
54
+ }
55
+
56
+ /*
57
+ * Stop if an l2meta already exists. After yielding, it wouldn't
58
+ * be valid any more, so we'd have to clean up the old L2Metas
59
+ * and deal with requests depending on them before starting to
60
+ * gather new ones. Not worth the trouble.
61
+ */
62
+ if (bytes == 0 && *m) {
63
+ *cur_bytes = 0;
64
+ return 0;
65
+ }
66
+
67
+ if (bytes == 0) {
68
+ /*
69
+ * Wait for the dependency to complete. We need to recheck
70
+ * the free/allocated clusters when we continue.
71
+ */
72
+ qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
73
+ return -EAGAIN;
74
}
25
}
75
}
26
}
76
77
--
27
--
78
2.31.1
28
2.26.2
79
29
80
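Note on the handle_dependencies() hunk above: the refactoring turns the "no intersection" case into an early `continue` and leaves the conflict handling at the outer indentation level. The range arithmetic it preserves can be sketched in isolation as follows. The function name bytes_before_conflict() and its signature are hypothetical; only the two comparisons and the `old_start - start` computation are taken from the diff.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the overlap test.
 * Given a new request [start, start + bytes) and a running allocation
 * [old_start, old_end), return how many bytes of the request can proceed
 * before it reaches the running allocation. The full length is returned
 * when the two ranges do not intersect.
 */
static uint64_t bytes_before_conflict(uint64_t start, uint64_t bytes,
                                      uint64_t old_start, uint64_t old_end)
{
    uint64_t end = start + bytes;

    if (end <= old_start || start >= old_end) {
        return bytes;               /* No intersection */
    }
    if (start < old_start) {
        return old_start - start;   /* Stop at the start of the allocation */
    }
    return 0;                       /* Request begins inside the allocation */
}
```

A result of 0 corresponds to the case in which the real code either stops gathering (an l2meta already exists) or waits for the dependency and returns -EAGAIN.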
diff view generated by jsdifflib
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
The vu_client_trip() coroutine is leaked during AioContext switching. It
2
is also unsafe to destroy the vu_dev in panic_cb() since its callers
3
still access it in some cases.
2
4
3
Split fix_l2_entry_by_zero() out of check_refcounts_l2() to be
5
Rework the lifecycle to solve these safety issues.
4
reused in a further patch.
5
6
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Message-id: 20200924151549.913737-10-stefanha@redhat.com
8
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Message-Id: <20210914122454.141075-5-vsementsov@virtuozzo.com>
10
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
11
---
10
---
12
block/qcow2-refcount.c | 87 +++++++++++++++++++++++++++++-------------
11
util/vhost-user-server.h | 29 ++--
13
1 file changed, 60 insertions(+), 27 deletions(-)
12
block/export/vhost-user-blk-server.c | 9 +-
13
util/vhost-user-server.c | 245 +++++++++++++++------------
14
3 files changed, 155 insertions(+), 128 deletions(-)
14
15
15
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
16
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
16
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
17
--- a/block/qcow2-refcount.c
18
--- a/util/vhost-user-server.h
18
+++ b/block/qcow2-refcount.c
19
+++ b/util/vhost-user-server.h
19
@@ -XXX,XX +XXX,XX @@ enum {
20
@@ -XXX,XX +XXX,XX @@
20
CHECK_FRAG_INFO = 0x2, /* update BlockFragInfo counters */
21
#include "qapi/error.h"
21
};
22
#include "standard-headers/linux/virtio_blk.h"
23
24
+/* A kick fd that we monitor on behalf of libvhost-user */
25
typedef struct VuFdWatch {
26
VuDev *vu_dev;
27
int fd; /*kick fd*/
28
void *pvt;
29
vu_watch_cb cb;
30
- bool processing;
31
QTAILQ_ENTRY(VuFdWatch) next;
32
} VuFdWatch;
33
34
-typedef struct VuServer VuServer;
35
-
36
-struct VuServer {
37
+/**
38
+ * VuServer:
39
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
40
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
41
+ * callbacks and virtqueue kicks run in the given AioContext.
42
+ */
43
+typedef struct {
44
QIONetListener *listener;
45
+ QEMUBH *restart_listener_bh;
46
AioContext *ctx;
47
int max_queues;
48
const VuDevIface *vu_iface;
49
+
50
+ /* Protected by ctx lock */
51
VuDev vu_dev;
52
QIOChannel *ioc; /* The I/O channel with the client */
53
QIOChannelSocket *sioc; /* The underlying data channel with the client */
54
- /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
55
- QIOChannel *ioc_slave;
56
- QIOChannelSocket *sioc_slave;
57
- Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
58
QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
59
- /* restart coroutine co_trip if AIOContext is changed */
60
- bool aio_context_changed;
61
- bool processing_msg;
62
-};
63
+
64
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
65
+} VuServer;
66
67
bool vhost_user_server_start(VuServer *server,
68
SocketAddress *unix_socket,
69
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
70
71
void vhost_user_server_stop(VuServer *server);
72
73
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
74
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
75
+void vhost_user_server_detach_aio_context(VuServer *server);
76
77
#endif /* VHOST_USER_SERVER_H */
78
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
79
index XXXXXXX..XXXXXXX 100644
80
--- a/block/export/vhost-user-blk-server.c
81
+++ b/block/export/vhost-user-blk-server.c
82
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = {
83
static void blk_aio_attached(AioContext *ctx, void *opaque)
84
{
85
VuBlockDev *vub_dev = opaque;
86
- aio_context_acquire(ctx);
87
- vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
88
- aio_context_release(ctx);
89
+ vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
90
}
91
92
static void blk_aio_detach(void *opaque)
93
{
94
VuBlockDev *vub_dev = opaque;
95
- AioContext *ctx = vub_dev->vu_server.ctx;
96
- aio_context_acquire(ctx);
97
- vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
98
- aio_context_release(ctx);
99
+ vhost_user_server_detach_aio_context(&vub_dev->vu_server);
100
}
101
102
static void
103
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/util/vhost-user-server.c
106
+++ b/util/vhost-user-server.c
107
@@ -XXX,XX +XXX,XX @@
108
*/
109
#include "qemu/osdep.h"
110
#include "qemu/main-loop.h"
111
+#include "block/aio-wait.h"
112
#include "vhost-user-server.h"
22
113
23
+/*
114
+/*
24
+ * Fix L2 entry by making it QCOW2_CLUSTER_ZERO_PLAIN.
115
+ * Theory of operation:
25
+ *
116
+ *
26
+ * This function decrements res->corruptions on success, so the caller is
117
+ * VuServer is started and stopped by vhost_user_server_start() and
27
+ * responsible to increment res->corruptions prior to the call.
118
+ * vhost_user_server_stop() from the main loop thread. Starting the server
28
+ *
119
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
29
+ * On failure in-memory @l2_table may be modified.
120
+ * Only one connection is allowed at a time.
121
+ *
122
+ * The connection is handled by the vu_client_trip() coroutine in the
123
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
124
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
125
+ * protocol messages over the UNIX domain socket.
126
+ *
127
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
128
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
129
+ *
130
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
131
+ * the socket connection. Shutting down the socket connection causes
132
+ * vu_message_read() to fail since no more data can be received from the socket.
133
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
134
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
135
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
136
+ * processing.
137
+ *
138
+ * When vu_client_trip() has finished cleaning up it schedules a BH in the main
139
+ * loop thread to accept the next client connection.
140
+ *
141
+ * When libvhost-user detects an error it calls panic_cb() and sets the
142
+ * dev->broken flag. Both vu_client_trip() and kick fd processing stop when
143
+ * the dev->broken flag is set.
144
+ *
145
+ * It is possible to switch AioContexts using
146
+ * vhost_user_server_detach_aio_context() and
147
+ * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old
148
+ * AioContext and resume monitoring in the new AioContext. The vu_client_trip()
149
+ * coroutine remains in a yielded state during the switch. This is made
150
+ * possible by QIOChannel's support for spurious coroutine re-entry in
151
+ * qio_channel_yield(). The coroutine will restart I/O when re-entered from the
152
+ * new AioContext.
30
+ */
153
+ */
31
+static int fix_l2_entry_by_zero(BlockDriverState *bs, BdrvCheckResult *res,
154
+
32
+ uint64_t l2_offset,
155
static void vmsg_close_fds(VhostUserMsg *vmsg)
33
+ uint64_t *l2_table, int l2_index, bool active,
156
{
34
+ bool *metadata_overlap)
157
int i;
158
@@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg)
159
}
160
}
161
162
-static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
163
- gpointer opaque);
164
-
165
-static void close_client(VuServer *server)
166
-{
167
- /*
168
- * Before closing the client
169
- *
170
- * 1. Let vu_client_trip stop processing new vhost-user msg
171
- *
172
- * 2. remove kick_handler
173
- *
174
- * 3. wait for the kick handler to be finished
175
- *
176
- * 4. wait for the current vhost-user msg to be finished processing
177
- */
178
-
179
- QIOChannelSocket *sioc = server->sioc;
180
- /* When this is set vu_client_trip will stop new processing vhost-user message */
181
- server->sioc = NULL;
182
-
183
- while (server->processing_msg) {
184
- if (server->ioc->read_coroutine) {
185
- server->ioc->read_coroutine = NULL;
186
- qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
187
- NULL, server->ioc);
188
- server->processing_msg = false;
189
- }
190
- }
191
-
192
- vu_deinit(&server->vu_dev);
193
-
194
- /* vu_deinit() should have called remove_watch() */
195
- assert(QTAILQ_EMPTY(&server->vu_fd_watches));
196
-
197
- object_unref(OBJECT(sioc));
198
- object_unref(OBJECT(server->ioc));
199
-}
200
-
201
static void panic_cb(VuDev *vu_dev, const char *buf)
202
{
203
- VuServer *server = container_of(vu_dev, VuServer, vu_dev);
204
-
205
- /* avoid while loop in close_client */
206
- server->processing_msg = false;
207
-
208
- if (buf) {
209
- error_report("vu_panic: %s", buf);
210
- }
211
-
212
- if (server->sioc) {
213
- close_client(server);
214
- }
215
-
216
- /*
217
- * Set the callback function for network listener so another
218
- * vhost-user client can connect to this server
219
- */
220
- qio_net_listener_set_client_func(server->listener,
221
- vu_accept,
222
- server,
223
- NULL);
224
+ error_report("vu_panic: %s", buf);
225
}
226
227
static bool coroutine_fn
228
@@ -XXX,XX +XXX,XX @@ fail:
229
return false;
230
}
231
232
-
233
-static void vu_client_start(VuServer *server);
234
static coroutine_fn void vu_client_trip(void *opaque)
235
{
236
VuServer *server = opaque;
237
+ VuDev *vu_dev = &server->vu_dev;
238
239
- while (!server->aio_context_changed && server->sioc) {
240
- server->processing_msg = true;
241
- vu_dispatch(&server->vu_dev);
242
- server->processing_msg = false;
243
+ while (!vu_dev->broken && vu_dispatch(vu_dev)) {
244
+ /* Keep running */
245
}
246
247
- if (server->aio_context_changed && server->sioc) {
248
- server->aio_context_changed = false;
249
- vu_client_start(server);
250
- }
251
-}
252
+ vu_deinit(vu_dev);
253
+
254
+ /* vu_deinit() should have called remove_watch() */
255
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
256
+
257
+ object_unref(OBJECT(server->sioc));
258
+ server->sioc = NULL;
259
260
-static void vu_client_start(VuServer *server)
261
-{
262
- server->co_trip = qemu_coroutine_create(vu_client_trip, server);
263
- aio_co_enter(server->ctx, server->co_trip);
264
+ object_unref(OBJECT(server->ioc));
265
+ server->ioc = NULL;
266
+
267
+ server->co_trip = NULL;
268
+ if (server->restart_listener_bh) {
269
+ qemu_bh_schedule(server->restart_listener_bh);
270
+ }
271
+ aio_wait_kick();
272
}
273
274
/*
275
@@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server)
276
static void kick_handler(void *opaque)
277
{
278
VuFdWatch *vu_fd_watch = opaque;
279
- vu_fd_watch->processing = true;
280
- vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
281
- vu_fd_watch->processing = false;
282
+ VuDev *vu_dev = vu_fd_watch->vu_dev;
283
+
284
+ vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt);
285
+
286
+ /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */
287
+ if (vu_dev->broken) {
288
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
289
+
290
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
291
+ }
292
}
293
294
-
295
static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
296
{
297
298
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
299
qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
300
server->ioc = QIO_CHANNEL(sioc);
301
object_ref(OBJECT(server->ioc));
302
- qio_channel_attach_aio_context(server->ioc, server->ctx);
303
+
304
+ /* TODO vu_message_write() spins if non-blocking! */
305
qio_channel_set_blocking(server->ioc, false, NULL);
306
- vu_client_start(server);
307
+
308
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
309
+
310
+ aio_context_acquire(server->ctx);
311
+ vhost_user_server_attach_aio_context(server, server->ctx);
312
+ aio_context_release(server->ctx);
313
}
314
315
-
316
void vhost_user_server_stop(VuServer *server)
317
{
318
+ aio_context_acquire(server->ctx);
319
+
320
+ qemu_bh_delete(server->restart_listener_bh);
321
+ server->restart_listener_bh = NULL;
322
+
323
if (server->sioc) {
324
- close_client(server);
325
+ VuFdWatch *vu_fd_watch;
326
+
327
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
328
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
329
+ NULL, NULL, NULL, vu_fd_watch);
330
+ }
331
+
332
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
333
+
334
+ AIO_WAIT_WHILE(server->ctx, server->co_trip);
335
}
336
337
+ aio_context_release(server->ctx);
338
+
339
if (server->listener) {
340
qio_net_listener_disconnect(server->listener);
341
object_unref(OBJECT(server->listener));
342
}
343
+}
344
+
345
+/*
346
+ * Allow the next client to connect to the server. Called from a BH in the main
347
+ * loop.
348
+ */
349
+static void restart_listener_bh(void *opaque)
35
+{
350
+{
36
+ BDRVQcow2State *s = bs->opaque;
351
+ VuServer *server = opaque;
37
+ int ret;
352
38
+ int idx = l2_index * (l2_entry_size(s) / sizeof(uint64_t));
353
+ qio_net_listener_set_client_func(server->listener, vu_accept, server,
39
+ uint64_t l2e_offset = l2_offset + (uint64_t)l2_index * l2_entry_size(s);
354
+ NULL);
40
+ int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
355
}
41
+ uint64_t l2_entry = has_subclusters(s) ? 0 : QCOW_OFLAG_ZERO;
356
42
+
357
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
43
+ set_l2_entry(s, l2_table, l2_index, l2_entry);
358
+/* Called with ctx acquired */
44
+ ret = qcow2_pre_write_overlap_check(bs, ign, l2e_offset, l2_entry_size(s),
359
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
45
+ false);
360
{
46
+ if (metadata_overlap) {
361
- VuFdWatch *vu_fd_watch, *next;
47
+ *metadata_overlap = ret < 0;
362
- void *opaque = NULL;
363
- IOHandler *io_read = NULL;
364
- bool attach;
365
+ VuFdWatch *vu_fd_watch;
366
367
- server->ctx = ctx ? ctx : qemu_get_aio_context();
368
+ server->ctx = ctx;
369
370
if (!server->sioc) {
371
- /* not yet serving any client*/
372
return;
373
}
374
375
- if (ctx) {
376
- qio_channel_attach_aio_context(server->ioc, ctx);
377
- server->aio_context_changed = true;
378
- io_read = kick_handler;
379
- attach = true;
380
- } else {
381
+ qio_channel_attach_aio_context(server->ioc, ctx);
382
+
383
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
384
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
385
+ NULL, vu_fd_watch);
48
+ }
386
+ }
49
+ if (ret < 0) {
387
+
50
+ fprintf(stderr, "ERROR: Overlap check failed\n");
388
+ aio_co_schedule(ctx, server->co_trip);
51
+ goto fail;
52
+ }
53
+
54
+ ret = bdrv_pwrite_sync(bs->file, l2e_offset, &l2_table[idx],
55
+ l2_entry_size(s));
56
+ if (ret < 0) {
57
+ fprintf(stderr, "ERROR: Failed to overwrite L2 "
58
+ "table entry: %s\n", strerror(-ret));
59
+ goto fail;
60
+ }
61
+
62
+ res->corruptions--;
63
+ res->corruptions_fixed++;
64
+ return 0;
65
+
66
+fail:
67
+ res->check_errors++;
68
+ return ret;
69
+}
389
+}
70
+
390
+
71
/*
391
+/* Called with server->ctx acquired */
72
* Increases the refcount in the given refcount table for the all clusters
392
+void vhost_user_server_detach_aio_context(VuServer *server)
73
* referenced in the L2 table. While doing so, performs some checks on L2
393
+{
74
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
394
+ if (server->sioc) {
75
int i, ret;
395
+ VuFdWatch *vu_fd_watch;
76
size_t l2_size_bytes = s->l2_size * l2_entry_size(s);
396
+
77
g_autofree uint64_t *l2_table = g_malloc(l2_size_bytes);
397
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
78
+ bool metadata_overlap;
398
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
79
399
+ NULL, NULL, NULL, vu_fd_watch);
80
/* Read L2 table from disk */
400
+ }
81
ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size_bytes);
401
+
82
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
402
qio_channel_detach_aio_context(server->ioc);
83
fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
403
- /* server->ioc->ctx keeps the old AioConext */
84
offset);
404
- ctx = server->ioc->ctx;
85
if (fix & BDRV_FIX_ERRORS) {
405
- attach = false;
86
- int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
406
}
87
- uint64_t l2e_offset =
407
88
- l2_offset + (uint64_t)i * l2_entry_size(s);
408
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
89
- int ign = active ? QCOW2_OL_ACTIVE_L2 :
409
- if (vu_fd_watch->cb) {
90
- QCOW2_OL_INACTIVE_L2;
410
- opaque = attach ? vu_fd_watch : NULL;
91
-
411
- aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
92
- l2_entry = has_subclusters(s) ? 0 : QCOW_OFLAG_ZERO;
412
- io_read, NULL, NULL,
93
- set_l2_entry(s, l2_table, i, l2_entry);
413
- opaque);
94
- ret = qcow2_pre_write_overlap_check(bs, ign,
414
- }
95
- l2e_offset, l2_entry_size(s), false);
415
- }
96
- if (ret < 0) {
416
+ server->ctx = NULL;
97
- fprintf(stderr, "ERROR: Overlap check failed\n");
417
}
98
- res->check_errors++;
418
99
+ ret = fix_l2_entry_by_zero(bs, res, l2_offset,
419
-
100
+ l2_table, i, active,
420
bool vhost_user_server_start(VuServer *server,
101
+ &metadata_overlap);
421
SocketAddress *socket_addr,
102
+ if (metadata_overlap) {
422
AioContext *ctx,
103
/*
423
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
104
* Something is seriously wrong, so abort checking
424
const VuDevIface *vu_iface,
105
* this L2 table.
425
Error **errp)
106
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
426
{
107
return ret;
427
+ QEMUBH *bh;
108
}
428
QIONetListener *listener = qio_net_listener_new();
109
429
if (qio_net_listener_open_sync(listener, socket_addr, 1,
110
- ret = bdrv_pwrite_sync(bs->file, l2e_offset,
430
errp) < 0) {
111
- &l2_table[idx],
431
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
112
- l2_entry_size(s));
432
return false;
113
- if (ret < 0) {
433
}
114
- fprintf(stderr, "ERROR: Failed to overwrite L2 "
434
115
- "table entry: %s\n", strerror(-ret));
435
+ bh = qemu_bh_new(restart_listener_bh, server);
116
- res->check_errors++;
436
+
117
- /*
437
/* zero out unspecified fields */
118
- * Do not abort, continue checking the rest of this
438
*server = (VuServer) {
119
- * L2 table's entries.
439
.listener = listener,
120
- */
440
+ .restart_listener_bh = bh,
121
- } else {
441
.vu_iface = vu_iface,
122
- res->corruptions--;
442
.max_queues = max_queues,
123
- res->corruptions_fixed++;
443
.ctx = ctx,
124
+ if (ret == 0) {
125
/*
126
* Skip marking the cluster as used
127
* (it is unused now).
128
*/
129
continue;
130
}
131
+
132
+ /*
133
+ * Failed to fix.
134
+ * Do not abort, continue checking the rest of this
135
+ * L2 table's entries.
136
+ */
137
}
138
} else {
139
fprintf(stderr, "ERROR offset=%" PRIx64 ": Data cluster is "
140
--
444
--
141
2.31.1
445
2.26.2
142
446
143
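Note on fix_l2_entry_by_zero() above: its comment states that the function decrements res->corruptions on success, so the caller has to count the corruption before calling it. The bookkeeping can be sketched with a stripped-down result structure. CheckResult, fix_entry() and report_and_fix() are hypothetical stand-ins for BdrvCheckResult and the real functions, and the `write_ret` parameter simply simulates the outcome of the on-disk write.

```c
/* Hypothetical reduced form of the check result counters. */
typedef struct {
    int corruptions;
    int corruptions_fixed;
    int check_errors;
} CheckResult;

/*
 * Mirrors the contract described in the comment: on success one
 * corruption is moved to corruptions_fixed, on failure check_errors
 * is incremented and the error code is passed through.
 */
static int fix_entry(CheckResult *res, int write_ret)
{
    if (write_ret < 0) {
        res->check_errors++;
        return write_ret;
    }
    res->corruptions--;
    res->corruptions_fixed++;
    return 0;
}

/* Caller side: count the corruption first, then attempt the repair. */
static int report_and_fix(CheckResult *res, int write_ret)
{
    res->corruptions++;
    return fix_entry(res, write_ret);
}
```

With this ordering a successful repair leaves corruptions unchanged overall, while a failed repair leaves the corruption counted and additionally records a check error.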
diff view generated by jsdifflib
1
297 so far does not check the named tests, which reside in the tests/
1
Propagate the flush return value since errors are possible.
2
directory (i.e. full path tests/qemu-iotests/tests). Fix it.
3
2
4
Thanks to the previous two commits, all named tests pass its scrutiny,
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
so we do not have to add anything to SKIP_FILES.
4
Message-id: 20200924151549.913737-11-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
block/export/vhost-user-blk-server.c | 11 +++++++----
8
1 file changed, 7 insertions(+), 4 deletions(-)
6
9
7
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
10
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
8
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
11
index XXXXXXX..XXXXXXX 100644
9
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
12
--- a/block/export/vhost-user-blk-server.c
10
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
13
+++ b/block/export/vhost-user-blk-server.c
11
Message-Id: <20210902094017.32902-6-hreitz@redhat.com>
14
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
12
---
15
return -EINVAL;
13
tests/qemu-iotests/297 | 5 +++--
16
}
14
1 file changed, 3 insertions(+), 2 deletions(-)
17
18
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
19
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
20
{
21
VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
22
BlockBackend *backend = vdev_blk->backend;
23
- blk_co_flush(backend);
24
+ return blk_co_flush(backend);
25
}
26
27
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
28
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
29
break;
30
}
31
case VIRTIO_BLK_T_FLUSH:
32
- vu_block_flush(req);
33
- req->in->status = VIRTIO_BLK_S_OK;
34
+ if (vu_block_flush(req) == 0) {
35
+ req->in->status = VIRTIO_BLK_S_OK;
36
+ } else {
37
+ req->in->status = VIRTIO_BLK_S_IOERR;
38
+ }
39
break;
40
case VIRTIO_BLK_T_GET_ID: {
41
size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
42
--
43
2.26.2
15
44
16
diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/297
19
+++ b/tests/qemu-iotests/297
20
@@ -XXX,XX +XXX,XX @@ def is_python_file(filename):
21
22
23
def run_linters():
24
- files = [filename for filename in (set(os.listdir('.')) - set(SKIP_FILES))
25
- if is_python_file(filename)]
26
+ named_tests = [f'tests/{entry}' for entry in os.listdir('tests')]
27
+ check_tests = set(os.listdir('.') + named_tests) - set(SKIP_FILES)
28
+ files = [filename for filename in check_tests if is_python_file(filename)]
29
30
iotests.logger.debug('Files to be checked:')
31
iotests.logger.debug(', '.join(sorted(files)))
32
--
33
2.31.1
34
35
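Note on the vu_block_flush() change above: once the flush result is propagated, the request status depends on it instead of always being VIRTIO_BLK_S_OK. A minimal sketch of that mapping follows. The constant names BLK_S_OK and BLK_S_IOERR and the helper flush_status() are hypothetical; the values 0 and 1 are the ones the virtio-blk specification assigns to VIRTIO_BLK_S_OK and VIRTIO_BLK_S_IOERR.

```c
#include <stdint.h>

/* Local copies of the virtio-blk status values, named for illustration. */
enum {
    BLK_S_OK = 0,
    BLK_S_IOERR = 1,
};

/*
 * Translate a flush return value (0 on success, negative errno on
 * failure) into the status byte reported back to the guest.
 */
static uint8_t flush_status(int flush_ret)
{
    return flush_ret == 0 ? BLK_S_OK : BLK_S_IOERR;
}
```

Any non-zero return is reported as an I/O error, matching the if/else added to the VIRTIO_BLK_T_FLUSH case.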
diff view generated by jsdifflib
1
From: Max Reitz <mreitz@redhat.com>
1
Use the new QAPI block exports API instead of defining our own QOM
2
2
objects.
3
gluster's block-status implementation is basically a copy of that in
3
4
block/file-posix.c, there is only one thing missing, and that is
4
This is a large change because the lifecycle of VuBlockDev needs to
5
aligning trailing data extents to the request alignment (as added by
5
follow BlockExportDriver. QOM properties are replaced by QAPI options
6
commit 9c3db310ff0).
6
objects.
7
7
8
Note that 9c3db310ff0 mentions that "there seems to be no other block
8
VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
9
driver that sets request_alignment and [...]", but while block/gluster.c
9
Several fields can be dropped since BlockExport already has equivalents.
10
does indeed not set request_alignment, block/io.c's
10
11
bdrv_refresh_limits() will still default to an alignment of 512 because
11
The file names and meson build integration will be adjusted in a future
12
block/gluster.c does not provide a byte-aligned read function.
12
patch. libvhost-user should probably be built as a static library that
13
Therefore, unaligned tails can conceivably occur, and so we should apply
13
is linked into QEMU instead of as a .c file that results in duplicate
14
the change from 9c3db310ff0 to gluster's block-status implementation.
14
compilation.
15
15
16
Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
16
The new command-line syntax is:
17
Signed-off-by: Max Reitz <mreitz@redhat.com>
17
18
Message-Id: <20210805143603.59503-1-mreitz@redhat.com>
18
$ qemu-storage-daemon \
19
Reviewed-by: Eric Blake <eblake@redhat.com>
19
--blockdev file,node-name=drive0,filename=test.img \
20
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
20
--export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock
21
22
Note that unix-socket is optional because we may wish to accept chardevs
23
too in the future.
24
25
Markus noted that supported address families are not explicit in the
26
QAPI schema. It is unlikely that support for more address families will
27
be added since file descriptor passing is required and few address
28
families support it. If a new address family needs to be added, then the
29
QAPI 'features' syntax can be used to advertise them.
30
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
Acked-by: Markus Armbruster <armbru@redhat.com>
33
Message-id: 20200924151549.913737-12-stefanha@redhat.com
34
[Skip test on big-endian host architectures because this device doesn't
35
support them yet (as already mentioned in a code comment).
36
--Stefan]
37
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
---
38
---
22
block/gluster.c | 16 ++++++++++++++++
39
qapi/block-export.json | 21 +-
23
1 file changed, 16 insertions(+)
40
block/export/vhost-user-blk-server.h | 23 +-
24
41
block/export/export.c | 6 +
25
diff --git a/block/gluster.c b/block/gluster.c
42
block/export/vhost-user-blk-server.c | 452 +++++++--------------------
43
util/vhost-user-server.c | 10 +-
44
block/export/meson.build | 1 +
45
block/meson.build | 1 -
46
7 files changed, 156 insertions(+), 358 deletions(-)
47
48
diff --git a/qapi/block-export.json b/qapi/block-export.json
26
index XXXXXXX..XXXXXXX 100644
49
index XXXXXXX..XXXXXXX 100644
27
--- a/block/gluster.c
50
--- a/qapi/block-export.json
28
+++ b/block/gluster.c
51
+++ b/qapi/block-export.json
29
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
52
@@ -XXX,XX +XXX,XX @@
30
off_t data = 0, hole = 0;
53
'data': { '*name': 'str', '*description': 'str',
31
int ret = -EINVAL;
54
'*bitmap': 'str' } }
32
55
33
+ assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
56
+##
57
+# @BlockExportOptionsVhostUserBlk:
58
+#
59
+# A vhost-user-blk block export.
60
+#
61
+# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
62
+# SocketAddress types are supported. Passed fds must be UNIX domain
63
+# sockets.
64
+# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
65
+#
66
+# Since: 5.2
67
+##
68
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
69
+ 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
34
+
70
+
35
if (!s->fd) {
71
##
36
return ret;
72
# @NbdServerAddOptions:
73
#
74
@@ -XXX,XX +XXX,XX @@
75
# An enumeration of block export types
76
#
77
# @nbd: NBD export
78
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
79
#
80
# Since: 4.2
81
##
82
{ 'enum': 'BlockExportType',
83
- 'data': [ 'nbd' ] }
84
+ 'data': [ 'nbd', 'vhost-user-blk' ] }
85
86
##
87
# @BlockExportOptions:
88
@@ -XXX,XX +XXX,XX @@
89
'*writethrough': 'bool' },
90
'discriminator': 'type',
91
'data': {
92
- 'nbd': 'BlockExportOptionsNbd'
93
+ 'nbd': 'BlockExportOptionsNbd',
94
+ 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
95
} }
96
97
##
98
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
99
index XXXXXXX..XXXXXXX 100644
100
--- a/block/export/vhost-user-blk-server.h
101
+++ b/block/export/vhost-user-blk-server.h
102
@@ -XXX,XX +XXX,XX @@
103
104
#ifndef VHOST_USER_BLK_SERVER_H
105
#define VHOST_USER_BLK_SERVER_H
106
-#include "util/vhost-user-server.h"
107
108
-typedef struct VuBlockDev VuBlockDev;
109
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
110
-#define VHOST_USER_BLK_SERVER(obj) \
111
- OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
112
+#include "block/export.h"
113
114
-/* vhost user block device */
115
-struct VuBlockDev {
116
- Object parent_obj;
117
- char *node_name;
118
- SocketAddress *addr;
119
- AioContext *ctx;
120
- VuServer vu_server;
121
- bool running;
122
- uint32_t blk_size;
123
- BlockBackend *backend;
124
- QIOChannelSocket *sioc;
125
- QTAILQ_ENTRY(VuBlockDev) next;
126
- struct virtio_blk_config blkcfg;
127
- bool writable;
128
-};
129
+/* For block/export/export.c */
130
+extern const BlockExportDriver blk_exp_vhost_user_blk;
131
132
#endif /* VHOST_USER_BLK_SERVER_H */
133
diff --git a/block/export/export.c b/block/export/export.c
134
index XXXXXXX..XXXXXXX 100644
135
--- a/block/export/export.c
136
+++ b/block/export/export.c
137
@@ -XXX,XX +XXX,XX @@
138
#include "sysemu/block-backend.h"
139
#include "block/export.h"
140
#include "block/nbd.h"
141
+#if CONFIG_LINUX
142
+#include "block/export/vhost-user-blk-server.h"
143
+#endif
144
#include "qapi/error.h"
145
#include "qapi/qapi-commands-block-export.h"
146
#include "qapi/qapi-events-block-export.h"
147
@@ -XXX,XX +XXX,XX @@
148
149
static const BlockExportDriver *blk_exp_drivers[] = {
150
&blk_exp_nbd,
151
+#if CONFIG_LINUX
152
+ &blk_exp_vhost_user_blk,
153
+#endif
154
};
155
156
/* Only accessed from the main thread */
157
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
158
index XXXXXXX..XXXXXXX 100644
159
--- a/block/export/vhost-user-blk-server.c
160
+++ b/block/export/vhost-user-blk-server.c
161
@@ -XXX,XX +XXX,XX @@
162
*/
163
#include "qemu/osdep.h"
164
#include "block/block.h"
165
+#include "contrib/libvhost-user/libvhost-user.h"
166
+#include "standard-headers/linux/virtio_blk.h"
167
+#include "util/vhost-user-server.h"
168
#include "vhost-user-blk-server.h"
169
#include "qapi/error.h"
170
#include "qom/object_interfaces.h"
171
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
172
unsigned char status;
173
};
174
175
-typedef struct VuBlockReq {
176
+typedef struct VuBlkReq {
177
VuVirtqElement elem;
178
int64_t sector_num;
179
size_t size;
180
@@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq {
181
struct virtio_blk_outhdr out;
182
VuServer *server;
183
struct VuVirtq *vq;
184
-} VuBlockReq;
185
+} VuBlkReq;
186
187
-static void vu_block_req_complete(VuBlockReq *req)
188
+/* vhost user block device */
189
+typedef struct {
190
+ BlockExport export;
191
+ VuServer vu_server;
192
+ uint32_t blk_size;
193
+ QIOChannelSocket *sioc;
194
+ struct virtio_blk_config blkcfg;
195
+ bool writable;
196
+} VuBlkExport;
197
+
198
+static void vu_blk_req_complete(VuBlkReq *req)
199
{
200
VuDev *vu_dev = &req->server->vu_dev;
201
202
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
203
free(req);
204
}
205
206
-static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
207
-{
208
- return container_of(server, VuBlockDev, vu_server);
209
-}
210
-
211
static int coroutine_fn
212
-vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
213
- uint32_t iovcnt, uint32_t type)
214
+vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov,
215
+ uint32_t iovcnt, uint32_t type)
216
{
217
struct virtio_blk_discard_write_zeroes desc;
218
ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
219
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
220
return -EINVAL;
37
}
221
}
38
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
222
39
/* On a data extent, compute bytes to the end of the extent,
223
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
40
* possibly including a partial sector at EOF. */
224
uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
41
*pnum = MIN(bytes, hole - offset);
225
le32_to_cpu(desc.num_sectors) << 9 };
42
+
226
if (type == VIRTIO_BLK_T_DISCARD) {
43
+ /*
227
- if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
44
+ * We are not allowed to return partial sectors, though, so
228
+ if (blk_co_pdiscard(blk, range[0], range[1]) == 0) {
45
+ * round up if necessary.
229
return 0;
46
+ */
230
}
47
+ if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
231
} else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
48
+ int64_t file_length = qemu_gluster_getlength(bs);
232
- if (blk_co_pwrite_zeroes(vdev_blk->backend,
49
+ if (file_length > 0) {
233
- range[0], range[1], 0) == 0) {
50
+ /* Ignore errors, this is just a safeguard */
234
+ if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) {
51
+ assert(hole == file_length);
235
return 0;
52
+ }
236
}
53
+ *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
237
}
238
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
239
return -EINVAL;
240
}
241
242
-static int coroutine_fn vu_block_flush(VuBlockReq *req)
243
+static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
244
{
245
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
246
- BlockBackend *backend = vdev_blk->backend;
247
- return blk_co_flush(backend);
248
-}
249
-
250
-static void coroutine_fn vu_block_virtio_process_req(void *opaque)
251
-{
252
- VuBlockReq *req = opaque;
253
+ VuBlkReq *req = opaque;
254
VuServer *server = req->server;
255
VuVirtqElement *elem = &req->elem;
256
uint32_t type;
257
258
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
259
- BlockBackend *backend = vdev_blk->backend;
260
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
261
+ BlockBackend *blk = vexp->export.blk;
262
263
struct iovec *in_iov = elem->in_sg;
264
struct iovec *out_iov = elem->out_sg;
265
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
266
bool is_write = type & VIRTIO_BLK_T_OUT;
267
req->sector_num = le64_to_cpu(req->out.sector);
268
269
- int64_t offset = req->sector_num * vdev_blk->blk_size;
270
+ if (is_write && !vexp->writable) {
271
+ req->in->status = VIRTIO_BLK_S_IOERR;
272
+ break;
54
+ }
273
+ }
55
+
274
+
56
ret = BDRV_BLOCK_DATA;
275
+ int64_t offset = req->sector_num * vexp->blk_size;
57
} else {
276
QEMUIOVector qiov;
58
/* On a hole, compute bytes to the beginning of the next extent. */
277
if (is_write) {
278
qemu_iovec_init_external(&qiov, out_iov, out_num);
279
- ret = blk_co_pwritev(backend, offset, qiov.size,
280
- &qiov, 0);
281
+ ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
282
} else {
283
qemu_iovec_init_external(&qiov, in_iov, in_num);
284
- ret = blk_co_preadv(backend, offset, qiov.size,
285
- &qiov, 0);
286
+ ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
287
}
288
if (ret >= 0) {
289
req->in->status = VIRTIO_BLK_S_OK;
290
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
291
break;
292
}
293
case VIRTIO_BLK_T_FLUSH:
294
- if (vu_block_flush(req) == 0) {
295
+ if (blk_co_flush(blk) == 0) {
296
req->in->status = VIRTIO_BLK_S_OK;
297
} else {
298
req->in->status = VIRTIO_BLK_S_IOERR;
299
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
300
case VIRTIO_BLK_T_DISCARD:
301
case VIRTIO_BLK_T_WRITE_ZEROES: {
302
int rc;
303
- rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
304
- out_num, type);
305
+
306
+ if (!vexp->writable) {
307
+ req->in->status = VIRTIO_BLK_S_IOERR;
308
+ break;
309
+ }
310
+
311
+ rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type);
312
if (rc == 0) {
313
req->in->status = VIRTIO_BLK_S_OK;
314
} else {
315
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
316
break;
317
}
318
319
- vu_block_req_complete(req);
320
+ vu_blk_req_complete(req);
321
return;
322
323
err:
324
- free(elem);
325
+ free(req);
326
}
327
328
-static void vu_block_process_vq(VuDev *vu_dev, int idx)
329
+static void vu_blk_process_vq(VuDev *vu_dev, int idx)
330
{
331
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
332
VuVirtq *vq = vu_get_queue(vu_dev, idx);
333
334
while (1) {
335
- VuBlockReq *req;
336
+ VuBlkReq *req;
337
338
- req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
339
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
340
if (!req) {
341
break;
342
}
343
@@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx)
344
req->vq = vq;
345
346
Coroutine *co =
347
- qemu_coroutine_create(vu_block_virtio_process_req, req);
348
+ qemu_coroutine_create(vu_blk_virtio_process_req, req);
349
qemu_coroutine_enter(co);
350
}
351
}
352
353
-static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
354
+static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started)
355
{
356
VuVirtq *vq;
357
358
assert(vu_dev);
359
360
vq = vu_get_queue(vu_dev, idx);
361
- vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
362
+ vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL);
363
}
364
365
-static uint64_t vu_block_get_features(VuDev *dev)
366
+static uint64_t vu_blk_get_features(VuDev *dev)
367
{
368
uint64_t features;
369
VuServer *server = container_of(dev, VuServer, vu_dev);
370
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
371
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
372
features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
373
1ull << VIRTIO_BLK_F_SEG_MAX |
374
1ull << VIRTIO_BLK_F_TOPOLOGY |
375
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev)
376
1ull << VIRTIO_RING_F_EVENT_IDX |
377
1ull << VHOST_USER_F_PROTOCOL_FEATURES;
378
379
- if (!vdev_blk->writable) {
380
+ if (!vexp->writable) {
381
features |= 1ull << VIRTIO_BLK_F_RO;
382
}
383
384
return features;
385
}
386
387
-static uint64_t vu_block_get_protocol_features(VuDev *dev)
388
+static uint64_t vu_blk_get_protocol_features(VuDev *dev)
389
{
390
return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
391
1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
392
}
393
394
static int
395
-vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
396
+vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
397
{
398
+ /* TODO blkcfg must be little-endian for VIRTIO 1.0 */
399
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
400
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
401
- memcpy(config, &vdev_blk->blkcfg, len);
402
-
403
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
404
+ memcpy(config, &vexp->blkcfg, len);
405
return 0;
406
}
407
408
static int
409
-vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
410
+vu_blk_set_config(VuDev *vu_dev, const uint8_t *data,
411
uint32_t offset, uint32_t size, uint32_t flags)
412
{
413
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
414
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
415
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
416
uint8_t wce;
417
418
/* don't support live migration */
419
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
420
}
421
422
wce = *data;
423
- vdev_blk->blkcfg.wce = wce;
424
- blk_set_enable_write_cache(vdev_blk->backend, wce);
425
+ vexp->blkcfg.wce = wce;
426
+ blk_set_enable_write_cache(vexp->export.blk, wce);
427
return 0;
428
}
429
430
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
431
* of vu_process_message.
432
*
433
*/
434
-static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
435
+static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
436
{
437
if (vmsg->request == VHOST_USER_NONE) {
438
dev->panic(dev, "disconnect");
439
@@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
440
return false;
441
}
442
443
-static const VuDevIface vu_block_iface = {
444
- .get_features = vu_block_get_features,
445
- .queue_set_started = vu_block_queue_set_started,
446
- .get_protocol_features = vu_block_get_protocol_features,
447
- .get_config = vu_block_get_config,
448
- .set_config = vu_block_set_config,
449
- .process_msg = vu_block_process_msg,
450
+static const VuDevIface vu_blk_iface = {
451
+ .get_features = vu_blk_get_features,
452
+ .queue_set_started = vu_blk_queue_set_started,
453
+ .get_protocol_features = vu_blk_get_protocol_features,
454
+ .get_config = vu_blk_get_config,
455
+ .set_config = vu_blk_set_config,
456
+ .process_msg = vu_blk_process_msg,
457
};
458
459
static void blk_aio_attached(AioContext *ctx, void *opaque)
460
{
461
- VuBlockDev *vub_dev = opaque;
462
- vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
463
+ VuBlkExport *vexp = opaque;
464
+ vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
465
}
466
467
static void blk_aio_detach(void *opaque)
468
{
469
- VuBlockDev *vub_dev = opaque;
470
- vhost_user_server_detach_aio_context(&vub_dev->vu_server);
471
+ VuBlkExport *vexp = opaque;
472
+ vhost_user_server_detach_aio_context(&vexp->vu_server);
473
}
474
475
static void
476
-vu_block_initialize_config(BlockDriverState *bs,
477
+vu_blk_initialize_config(BlockDriverState *bs,
478
struct virtio_blk_config *config, uint32_t blk_size)
479
{
480
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
481
@@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs,
482
config->max_write_zeroes_seg = 1;
483
}
484
485
-static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
486
+static void vu_blk_exp_request_shutdown(BlockExport *exp)
487
{
488
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
489
490
- BlockBackend *blk;
491
- Error *local_error = NULL;
492
- const char *node_name = vu_block_device->node_name;
493
- bool writable = vu_block_device->writable;
494
- uint64_t perm = BLK_PERM_CONSISTENT_READ;
495
- int ret;
496
-
497
- AioContext *ctx;
498
-
499
- BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
500
-
501
- if (!bs) {
502
- error_propagate(errp, local_error);
503
- return NULL;
504
- }
505
-
506
- if (bdrv_is_read_only(bs)) {
507
- writable = false;
508
- }
509
-
510
- if (writable) {
511
- perm |= BLK_PERM_WRITE;
512
- }
513
-
514
- ctx = bdrv_get_aio_context(bs);
515
- aio_context_acquire(ctx);
516
- bdrv_invalidate_cache(bs, NULL);
517
- aio_context_release(ctx);
518
-
519
- /*
520
- * Don't allow resize while the vhost user server is running,
521
- * otherwise we don't care what happens with the node.
522
- */
523
- blk = blk_new(bdrv_get_aio_context(bs), perm,
524
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
525
- BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
526
- ret = blk_insert_bs(blk, bs, errp);
527
-
528
- if (ret < 0) {
529
- goto fail;
530
- }
531
-
532
- blk_set_enable_write_cache(blk, false);
533
-
534
- blk_set_allow_aio_context_change(blk, true);
535
-
536
- vu_block_device->blkcfg.wce = 0;
537
- vu_block_device->backend = blk;
538
- if (!vu_block_device->blk_size) {
539
- vu_block_device->blk_size = BDRV_SECTOR_SIZE;
540
- }
541
- vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
542
- blk_set_guest_block_size(blk, vu_block_device->blk_size);
543
- vu_block_initialize_config(bs, &vu_block_device->blkcfg,
544
- vu_block_device->blk_size);
545
- return vu_block_device;
546
-
547
-fail:
548
- blk_unref(blk);
549
- return NULL;
550
-}
551
-
552
-static void vu_block_deinit(VuBlockDev *vu_block_device)
553
-{
554
- if (vu_block_device->backend) {
555
- blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
556
- blk_aio_detach, vu_block_device);
557
- }
558
-
559
- blk_unref(vu_block_device->backend);
560
-}
561
-
562
-static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
563
-{
564
- vhost_user_server_stop(&vu_block_device->vu_server);
565
- vu_block_deinit(vu_block_device);
566
-}
567
-
568
-static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
569
- Error **errp)
570
-{
571
- AioContext *ctx;
572
- SocketAddress *addr = vu_block_device->addr;
573
-
574
- if (!vu_block_init(vu_block_device, errp)) {
575
- return;
576
- }
577
-
578
- ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
579
-
580
- if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
581
- VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
582
- errp)) {
583
- goto error;
584
- }
585
-
586
- blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
587
- blk_aio_detach, vu_block_device);
588
- vu_block_device->running = true;
589
- return;
590
-
591
- error:
592
- vu_block_deinit(vu_block_device);
593
-}
594
-
595
-static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
596
-{
597
- if (vus->running) {
598
- error_setg(errp, "The property can't be modified "
599
- "while the server is running");
600
- return false;
601
- }
602
- return true;
603
-}
604
-
605
-static void vu_set_node_name(Object *obj, const char *value, Error **errp)
606
-{
607
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
608
-
609
- if (!vu_prop_modifiable(vus, errp)) {
610
- return;
611
- }
612
-
613
- if (vus->node_name) {
614
- g_free(vus->node_name);
615
- }
616
-
617
- vus->node_name = g_strdup(value);
618
-}
619
-
620
-static char *vu_get_node_name(Object *obj, Error **errp)
621
-{
622
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
623
- return g_strdup(vus->node_name);
624
-}
625
-
626
-static void free_socket_addr(SocketAddress *addr)
627
-{
628
- g_free(addr->u.q_unix.path);
629
- g_free(addr);
630
-}
631
-
632
-static void vu_set_unix_socket(Object *obj, const char *value,
633
- Error **errp)
634
-{
635
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
636
-
637
- if (!vu_prop_modifiable(vus, errp)) {
638
- return;
639
- }
640
-
641
- if (vus->addr) {
642
- free_socket_addr(vus->addr);
643
- }
644
-
645
- SocketAddress *addr = g_new0(SocketAddress, 1);
646
- addr->type = SOCKET_ADDRESS_TYPE_UNIX;
647
- addr->u.q_unix.path = g_strdup(value);
648
- vus->addr = addr;
649
+ vhost_user_server_stop(&vexp->vu_server);
650
}
651
652
-static char *vu_get_unix_socket(Object *obj, Error **errp)
653
+static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
654
+ Error **errp)
655
{
656
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
657
- return g_strdup(vus->addr->u.q_unix.path);
658
-}
659
-
660
-static bool vu_get_block_writable(Object *obj, Error **errp)
661
-{
662
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
663
- return vus->writable;
664
-}
665
-
666
-static void vu_set_block_writable(Object *obj, bool value, Error **errp)
667
-{
668
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
669
-
670
- if (!vu_prop_modifiable(vus, errp)) {
671
- return;
672
- }
673
-
674
- vus->writable = value;
675
-}
676
-
677
-static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
678
- void *opaque, Error **errp)
679
-{
680
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
681
- uint32_t value = vus->blk_size;
682
-
683
- visit_type_uint32(v, name, &value, errp);
684
-}
685
-
686
-static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
687
- void *opaque, Error **errp)
688
-{
689
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
690
-
691
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
692
+ BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
693
Error *local_err = NULL;
694
- uint32_t value;
695
+ uint64_t logical_block_size;
696
697
- if (!vu_prop_modifiable(vus, errp)) {
698
- return;
699
- }
700
+ vexp->writable = opts->writable;
701
+ vexp->blkcfg.wce = 0;
702
703
- visit_type_uint32(v, name, &value, &local_err);
704
- if (local_err) {
705
- goto out;
706
+ if (vu_opts->has_logical_block_size) {
707
+ logical_block_size = vu_opts->logical_block_size;
708
+ } else {
709
+ logical_block_size = BDRV_SECTOR_SIZE;
710
}
711
-
712
- check_block_size(object_get_typename(obj), name, value, &local_err);
713
+ check_block_size(exp->id, "logical-block-size", logical_block_size,
714
+ &local_err);
715
if (local_err) {
716
- goto out;
717
+ error_propagate(errp, local_err);
718
+ return -EINVAL;
719
+ }
720
+ vexp->blk_size = logical_block_size;
721
+ blk_set_guest_block_size(exp->blk, logical_block_size);
722
+ vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
723
+ logical_block_size);
724
+
725
+ blk_set_allow_aio_context_change(exp->blk, true);
726
+ blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
727
+ vexp);
728
+
729
+ if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
730
+ VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
731
+ errp)) {
732
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
733
+ blk_aio_detach, vexp);
734
+ return -EADDRNOTAVAIL;
735
}
736
737
- vus->blk_size = value;
738
-
739
-out:
740
- error_propagate(errp, local_err);
741
-}
742
-
743
-static void vhost_user_blk_server_instance_finalize(Object *obj)
744
-{
745
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
746
-
747
- vhost_user_blk_server_stop(vub);
748
-
749
- /*
750
- * Unlike object_property_add_str, object_class_property_add_str
751
- * doesn't have a release method. Thus manual memory freeing is
752
- * needed.
753
- */
754
- free_socket_addr(vub->addr);
755
- g_free(vub->node_name);
756
-}
757
-
758
-static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
759
-{
760
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
761
-
762
- vhost_user_blk_server_start(vub, errp);
763
+ return 0;
764
}
765
766
-static void vhost_user_blk_server_class_init(ObjectClass *klass,
767
- void *class_data)
768
+static void vu_blk_exp_delete(BlockExport *exp)
769
{
770
- UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
771
- ucc->complete = vhost_user_blk_server_complete;
772
-
773
- object_class_property_add_bool(klass, "writable",
774
- vu_get_block_writable,
775
- vu_set_block_writable);
776
-
777
- object_class_property_add_str(klass, "node-name",
778
- vu_get_node_name,
779
- vu_set_node_name);
780
-
781
- object_class_property_add_str(klass, "unix-socket",
782
- vu_get_unix_socket,
783
- vu_set_unix_socket);
784
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
785
786
- object_class_property_add(klass, "logical-block-size", "uint32",
787
- vu_get_blk_size, vu_set_blk_size,
788
- NULL, NULL);
789
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
790
+ vexp);
791
}
792
793
-static const TypeInfo vhost_user_blk_server_info = {
794
- .name = TYPE_VHOST_USER_BLK_SERVER,
795
- .parent = TYPE_OBJECT,
796
- .instance_size = sizeof(VuBlockDev),
797
- .instance_finalize = vhost_user_blk_server_instance_finalize,
798
- .class_init = vhost_user_blk_server_class_init,
799
- .interfaces = (InterfaceInfo[]) {
800
- {TYPE_USER_CREATABLE},
801
- {}
802
- },
803
+const BlockExportDriver blk_exp_vhost_user_blk = {
804
+ .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK,
805
+ .instance_size = sizeof(VuBlkExport),
806
+ .create = vu_blk_exp_create,
807
+ .delete = vu_blk_exp_delete,
808
+ .request_shutdown = vu_blk_exp_request_shutdown,
809
};
810
-
811
-static void vhost_user_blk_server_register_types(void)
812
-{
813
- type_register_static(&vhost_user_blk_server_info);
814
-}
815
-
816
-type_init(vhost_user_blk_server_register_types)
817
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
818
index XXXXXXX..XXXXXXX 100644
819
--- a/util/vhost-user-server.c
820
+++ b/util/vhost-user-server.c
821
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
822
Error **errp)
823
{
824
QEMUBH *bh;
825
- QIONetListener *listener = qio_net_listener_new();
826
+ QIONetListener *listener;
827
+
828
+ if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX &&
829
+ socket_addr->type != SOCKET_ADDRESS_TYPE_FD) {
830
+ error_setg(errp, "Only socket address types 'unix' and 'fd' are supported");
831
+ return false;
832
+ }
833
+
834
+ listener = qio_net_listener_new();
835
if (qio_net_listener_open_sync(listener, socket_addr, 1,
836
errp) < 0) {
837
object_unref(OBJECT(listener));
838
diff --git a/block/export/meson.build b/block/export/meson.build
839
index XXXXXXX..XXXXXXX 100644
840
--- a/block/export/meson.build
841
+++ b/block/export/meson.build
842
@@ -1 +1,2 @@
843
block_ss.add(files('export.c'))
844
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
845
diff --git a/block/meson.build b/block/meson.build
846
index XXXXXXX..XXXXXXX 100644
847
--- a/block/meson.build
848
+++ b/block/meson.build
849
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
850
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
851
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
852
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
853
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
854
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
855
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
856
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
59
--
857
--
60
2.31.1
858
2.26.2
61
859
62
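The vhost-user-blk conversion above drops get_vu_block_device_by_server() and instead recovers the export from its embedded VuServer with container_of(), then refuses writes on a read-only export. A minimal standalone sketch of that pattern follows. All Toy*/toy_* names are hypothetical stand-ins, not QEMU API, and the macro is a simplified version of container_of().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for VuServer. */
typedef struct {
    int dummy;
} ToyServer;

/* Hypothetical stand-in for VuBlkExport: the server is embedded by value. */
typedef struct {
    int id;
    ToyServer server;
    bool writable;
} ToyExport;

/* Illustrative status codes, mirroring the OK/IOERR split used in the patch. */
enum { TOY_S_OK = 0, TOY_S_IOERR = 1 };

/*
 * Only the server pointer is passed in, as in the request handler above.
 * The export is recovered from it, and writes to a read-only export fail.
 */
static int toy_handle_request(ToyServer *server, bool is_write)
{
    ToyExport *exp = toy_container_of(server, ToyExport, server);

    if (is_write && !exp->writable) {
        return TOY_S_IOERR;
    }
    return TOY_S_OK;
}
```

Because the server is embedded rather than pointed to, no lookup table or back-pointer is needed: the address arithmetic alone identifies the owning export.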
Deleted patch
1
.bdrv_co_block_status() implementations are free to return a *pnum that
2
exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp
3
*pnum as necessary.
4
1
5
On the other hand, if drivers' implementations return values for *pnum
6
that are as large as possible, our recently introduced block-status
7
cache will become more effective.
8
9
So, make a note in block_int.h that @bytes is no upper limit for *pnum.
10
11
Suggested-by: Eric Blake <eblake@redhat.com>
12
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
13
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
14
Message-Id: <20210812084148.14458-4-hreitz@redhat.com>
15
Reviewed-by: Eric Blake <eblake@redhat.com>
16
---
17
include/block/block_int.h | 9 +++++++++
18
1 file changed, 9 insertions(+)
19
20
diff --git a/include/block/block_int.h b/include/block/block_int.h
21
index XXXXXXX..XXXXXXX 100644
22
--- a/include/block/block_int.h
23
+++ b/include/block/block_int.h
24
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
25
* clamped to bdrv_getlength() and aligned to request_alignment,
26
* as well as non-NULL pnum, map, and file; in turn, the driver
27
* must return an error or set pnum to an aligned non-zero value.
28
+ *
29
+ * Note that @bytes is just a hint on how big of a region the
30
+ * caller wants to inspect. It is not a limit on *pnum.
31
+ * Implementations are free to return larger values of *pnum if
32
+ * doing so does not incur a performance penalty.
33
+ *
34
+ * block/io.c's bdrv_co_block_status() will utilize an unclamped
35
+ * *pnum value for the block-status cache on protocol nodes, prior
36
+ * to clamping *pnum for return to its caller.
37
*/
38
int coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bs,
39
bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
40
--
41
2.31.1
42
43
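The block_int.h note above says that @bytes is only a hint and that the generic layer uses the unclamped *pnum for its cache before clamping the value it returns. A minimal standalone sketch of that division of labour follows. The Toy*/toy_* names are hypothetical; this is not the code in block/io.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one contiguous data extent. */
typedef struct {
    int64_t extent_start;
    int64_t extent_end;
} ToyExtent;

static int64_t toy_min(int64_t a, int64_t b)
{
    return a < b ? a : b;
}

/* Driver side: report everything up to the extent end, ignoring the hint. */
static int toy_driver_block_status(const ToyExtent *ext, int64_t offset,
                                   int64_t bytes, int64_t *pnum)
{
    (void)bytes; /* only a hint, not an upper limit */
    if (offset < ext->extent_start || offset >= ext->extent_end) {
        return -1;
    }
    *pnum = ext->extent_end - offset;
    return 0;
}

/*
 * Generic side: remember the unclamped end of the region (standing in for
 * the block-status cache), then clamp *pnum for the caller.
 */
static int toy_generic_block_status(const ToyExtent *ext, int64_t offset,
                                    int64_t bytes, int64_t *pnum,
                                    int64_t *cached_end)
{
    int64_t driver_pnum = 0;
    int ret = toy_driver_block_status(ext, offset, bytes, &driver_pnum);

    if (ret < 0) {
        return ret;
    }
    *cached_end = offset + driver_pnum;     /* uses the large value */
    *pnum = toy_min(driver_pnum, bytes);    /* caller sees at most @bytes */
    return 0;
}
```

With a 1 MiB extent and a 512-byte query at offset 4096, the caller gets 512 back while the recorded region still reaches the end of the extent, which is why a driver-side MIN(bytes, ...) would only shrink what can be cached.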
Deleted patch
1
bdrv_co_block_status() does it for us, we do not need to do it here.
2
1
3
The advantage of not capping *pnum is that bdrv_co_block_status() can
4
cache larger data regions than requested by its caller.
5
6
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
10
Message-Id: <20210812084148.14458-5-hreitz@redhat.com>
11
---
12
block/file-posix.c | 7 ++++---
13
1 file changed, 4 insertions(+), 3 deletions(-)
14
15
diff --git a/block/file-posix.c b/block/file-posix.c
16
index XXXXXXX..XXXXXXX 100644
17
--- a/block/file-posix.c
18
+++ b/block/file-posix.c
19
@@ -XXX,XX +XXX,XX @@ static int find_allocation(BlockDriverState *bs, off_t start,
20
* the specified offset) that are known to be in the same
21
* allocated/unallocated state.
22
*
23
- * 'bytes' is the max value 'pnum' should be set to.
24
+ * 'bytes' is a soft cap for 'pnum'. If the information is free, 'pnum' may
25
+ * well exceed it.
26
*/
27
static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
28
bool want_zero,
29
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
30
} else if (data == offset) {
31
/* On a data extent, compute bytes to the end of the extent,
32
* possibly including a partial sector at EOF. */
33
- *pnum = MIN(bytes, hole - offset);
34
+ *pnum = hole - offset;
35
36
/*
37
* We are not allowed to return partial sectors, though, so
38
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
39
} else {
40
/* On a hole, compute bytes to the beginning of the next extent. */
41
assert(hole == offset);
42
- *pnum = MIN(bytes, data - offset);
43
+ *pnum = data - offset;
44
ret = BDRV_BLOCK_ZERO;
45
}
46
*map = offset;
47
--
48
2.31.1
49
50
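With the MIN() removed, the data-extent branch in raw_co_block_status() reports hole - offset and then, per the context comment, rounds up when the extent ends in a partial sector. A minimal standalone sketch of that computation follows. The toy_* helpers are hypothetical and stand in for QEMU's alignment macros; the alignment is assumed to be positive.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align > 0 is assumed). */
static int64_t toy_round_up(int64_t n, int64_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Bytes from offset to the end of the data extent (which ends at 'hole'),
 * rounded up so that no partial sector is reported.
 */
static int64_t toy_data_extent_pnum(int64_t offset, int64_t hole,
                                    int64_t request_alignment)
{
    int64_t pnum = hole - offset;

    if (pnum % request_alignment != 0) {
        pnum = toy_round_up(pnum, request_alignment);
    }
    return pnum;
}
```

For a 1000-byte file queried at offset 0 with 512-byte alignment, this yields 1024 rather than 1000. An extent that is already aligned is returned unchanged.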
Deleted patch
1
bdrv_co_block_status() does it for us, we do not need to do it here.
2
1
3
The advantage of not capping *pnum is that bdrv_co_block_status() can
4
cache larger data regions than requested by its caller.
5
6
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
10
Message-Id: <20210812084148.14458-6-hreitz@redhat.com>
11
---
12
block/gluster.c | 7 ++++---
13
1 file changed, 4 insertions(+), 3 deletions(-)
14
15
diff --git a/block/gluster.c b/block/gluster.c
16
index XXXXXXX..XXXXXXX 100644
17
--- a/block/gluster.c
18
+++ b/block/gluster.c
19
@@ -XXX,XX +XXX,XX @@ exit:
20
* the specified offset) that are known to be in the same
21
* allocated/unallocated state.
22
*
23
- * 'bytes' is the max value 'pnum' should be set to.
24
+ * 'bytes' is a soft cap for 'pnum'. If the information is free, 'pnum' may
25
+ * well exceed it.
26
*
27
* (Based on raw_co_block_status() from file-posix.c.)
28
*/
29
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
30
} else if (data == offset) {
31
/* On a data extent, compute bytes to the end of the extent,
32
* possibly including a partial sector at EOF. */
33
- *pnum = MIN(bytes, hole - offset);
34
+ *pnum = hole - offset;
35
36
/*
37
* We are not allowed to return partial sectors, though, so
38
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
39
} else {
40
/* On a hole, compute bytes to the beginning of the next extent. */
41
assert(hole == offset);
42
- *pnum = MIN(bytes, data - offset);
43
+ *pnum = data - offset;
44
ret = BDRV_BLOCK_ZERO;
45
}
46
47
--
48
2.31.1
49
50
Deleted patch
1
As of recently, pylint complains when `open()` calls are missing an
2
`encoding=` argument. Everything we have should be UTF-8 (and in fact,
3
everything should be UTF-8, period (exceptions apply)), so use that.
4
1
5
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
6
Message-Id: <20210824153540.177128-2-hreitz@redhat.com>
7
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
8
Reviewed-by: John Snow <jsnow@redhat.com>
9
---
10
tests/qemu-iotests/297 | 2 +-
11
tests/qemu-iotests/iotests.py | 8 +++++---
12
2 files changed, 6 insertions(+), 4 deletions(-)
13
14
diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
15
index XXXXXXX..XXXXXXX 100755
16
--- a/tests/qemu-iotests/297
17
+++ b/tests/qemu-iotests/297
18
@@ -XXX,XX +XXX,XX @@ def is_python_file(filename):
19
if filename.endswith('.py'):
20
return True
21
22
- with open(filename) as f:
23
+ with open(filename, encoding='utf-8') as f:
24
try:
25
first_line = f.readline()
26
return re.match('^#!.*python', first_line) is not None
27
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
28
index XXXXXXX..XXXXXXX 100644
29
--- a/tests/qemu-iotests/iotests.py
30
+++ b/tests/qemu-iotests/iotests.py
31
@@ -XXX,XX +XXX,XX @@ def _post_shutdown(self) -> None:
32
return
33
valgrind_filename = f"{test_dir}/{self._popen.pid}.valgrind"
34
if self.exitcode() == 99:
35
- with open(valgrind_filename) as f:
36
+ with open(valgrind_filename, encoding='utf-8') as f:
37
print(f.read())
38
else:
39
os.remove(valgrind_filename)
40
@@ -XXX,XX +XXX,XX @@ def notrun(reason):
41
# Each test in qemu-iotests has a number ("seq")
42
seq = os.path.basename(sys.argv[0])
43
44
- with open('%s/%s.notrun' % (output_dir, seq), 'w') as outfile:
45
+ with open('%s/%s.notrun' % (output_dir, seq), 'w', encoding='utf-8') \
46
+ as outfile:
47
outfile.write(reason + '\n')
48
logger.warning("%s not run: %s", seq, reason)
49
sys.exit(0)
50
@@ -XXX,XX +XXX,XX @@ def case_notrun(reason):
51
# Each test in qemu-iotests has a number ("seq")
52
seq = os.path.basename(sys.argv[0])
53
54
- with open('%s/%s.casenotrun' % (output_dir, seq), 'a') as outfile:
55
+ with open('%s/%s.casenotrun' % (output_dir, seq), 'a', encoding='utf-8') \
56
+ as outfile:
57
outfile.write(' [case not run] ' + reason + '\n')
58
59
def _verify_image_format(supported_fmts: Sequence[str] = (),
60
--
61
2.31.1
62
63
1
There are a couple of things pylint takes issue with:
1
Headers used by other subsystems are located in include/. Also add the
2
- The "time" import is unused
2
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.
3
- The import order (iotests should come last)
4
- get_bitmap_hash() doesn't use @self and so should be a function
5
- Semicolons at the end of some lines
6
- Parentheses after "if"
7
- Some lines are too long (80 characters instead of 79)
8
- inject_test_case()'s @name parameter shadows a top-level @name
9
variable
10
- "lambda self: mc(self)" were equivalent to just "mc", but in
11
inject_test_case(), it is not equivalent, so add a comment and disable
12
the warning locally
13
- Always put two empty lines after a function
14
- f'exec: cat > /dev/null' does not need to be an f-string
15
3
16
Fix them.
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20200924151549.913737-13-stefanha@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
8
MAINTAINERS | 4 +++-
9
{util => include/qemu}/vhost-user-server.h | 0
10
block/export/vhost-user-blk-server.c | 2 +-
11
util/vhost-user-server.c | 2 +-
12
4 files changed, 5 insertions(+), 3 deletions(-)
13
rename {util => include/qemu}/vhost-user-server.h (100%)
17
14
18
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
15
diff --git a/MAINTAINERS b/MAINTAINERS
19
Message-Id: <20210902094017.32902-4-hreitz@redhat.com>
16
index XXXXXXX..XXXXXXX 100644
20
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
17
--- a/MAINTAINERS
21
---
18
+++ b/MAINTAINERS
22
tests/qemu-iotests/tests/migrate-bitmaps-test | 43 +++++++++++--------
19
@@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server
23
1 file changed, 25 insertions(+), 18 deletions(-)
20
M: Coiby Xu <Coiby.Xu@gmail.com>
21
S: Maintained
22
F: block/export/vhost-user-blk-server.c
23
-F: util/vhost-user-server.c
24
+F: block/export/vhost-user-blk-server.h
25
+F: include/qemu/vhost-user-server.h
26
F: tests/qtest/libqos/vhost-user-blk.c
27
+F: util/vhost-user-server.c
28
29
Replication
30
M: Wen Congyang <wencongyang2@huawei.com>
31
diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h
32
similarity index 100%
33
rename from util/vhost-user-server.h
34
rename to include/qemu/vhost-user-server.h
35
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
36
index XXXXXXX..XXXXXXX 100644
37
--- a/block/export/vhost-user-blk-server.c
38
+++ b/block/export/vhost-user-blk-server.c
39
@@ -XXX,XX +XXX,XX @@
40
#include "block/block.h"
41
#include "contrib/libvhost-user/libvhost-user.h"
42
#include "standard-headers/linux/virtio_blk.h"
43
-#include "util/vhost-user-server.h"
44
+#include "qemu/vhost-user-server.h"
45
#include "vhost-user-blk-server.h"
46
#include "qapi/error.h"
47
#include "qom/object_interfaces.h"
48
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
49
index XXXXXXX..XXXXXXX 100644
50
--- a/util/vhost-user-server.c
51
+++ b/util/vhost-user-server.c
52
@@ -XXX,XX +XXX,XX @@
53
*/
54
#include "qemu/osdep.h"
55
#include "qemu/main-loop.h"
56
+#include "qemu/vhost-user-server.h"
57
#include "block/aio-wait.h"
58
-#include "vhost-user-server.h"
59
60
/*
61
* Theory of operation:
62
--
63
2.26.2
24
64
25
diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-test b/tests/qemu-iotests/tests/migrate-bitmaps-test
26
index XXXXXXX..XXXXXXX 100755
27
--- a/tests/qemu-iotests/tests/migrate-bitmaps-test
28
+++ b/tests/qemu-iotests/tests/migrate-bitmaps-test
29
@@ -XXX,XX +XXX,XX @@
30
#
31
32
import os
33
-import iotests
34
-import time
35
import itertools
36
import operator
37
import re
38
+import iotests
39
from iotests import qemu_img, qemu_img_create, Timeout
40
41
42
@@ -XXX,XX +XXX,XX @@ mig_cmd = 'exec: cat > ' + mig_file
43
incoming_cmd = 'exec: cat ' + mig_file
44
45
46
+def get_bitmap_hash(vm):
47
+ result = vm.qmp('x-debug-block-dirty-bitmap-sha256',
48
+ node='drive0', name='bitmap0')
49
+ return result['return']['sha256']
50
+
51
+
52
class TestDirtyBitmapMigration(iotests.QMPTestCase):
53
def tearDown(self):
54
self.vm_a.shutdown()
55
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
56
params['persistent'] = True
57
58
result = vm.qmp('block-dirty-bitmap-add', **params)
59
- self.assert_qmp(result, 'return', {});
60
-
61
- def get_bitmap_hash(self, vm):
62
- result = vm.qmp('x-debug-block-dirty-bitmap-sha256',
63
- node='drive0', name='bitmap0')
64
- return result['return']['sha256']
65
+ self.assert_qmp(result, 'return', {})
66
67
def check_bitmap(self, vm, sha256):
68
result = vm.qmp('x-debug-block-dirty-bitmap-sha256',
69
node='drive0', name='bitmap0')
70
if sha256:
71
- self.assert_qmp(result, 'return/sha256', sha256);
72
+ self.assert_qmp(result, 'return/sha256', sha256)
73
else:
74
self.assert_qmp(result, 'error/desc',
75
- "Dirty bitmap 'bitmap0' not found");
76
+ "Dirty bitmap 'bitmap0' not found")
77
78
def do_test_migration_resume_source(self, persistent, migrate_bitmaps):
79
granularity = 512
80
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
81
self.add_bitmap(self.vm_a, granularity, persistent)
82
for r in regions:
83
self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % r)
84
- sha256 = self.get_bitmap_hash(self.vm_a)
85
+ sha256 = get_bitmap_hash(self.vm_a)
86
87
result = self.vm_a.qmp('migrate', uri=mig_cmd)
88
while True:
89
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
90
break
91
while True:
92
result = self.vm_a.qmp('query-status')
93
- if (result['return']['status'] == 'postmigrate'):
94
+ if result['return']['status'] == 'postmigrate':
95
break
96
97
# test that bitmap is still here
98
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
99
self.add_bitmap(self.vm_a, granularity, persistent)
100
for r in regions:
101
self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % r)
102
- sha256 = self.get_bitmap_hash(self.vm_a)
103
+ sha256 = get_bitmap_hash(self.vm_a)
104
105
if pre_shutdown:
106
self.vm_a.shutdown()
107
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
108
self.check_bitmap(self.vm_b, sha256 if persistent else False)
109
110
111
-def inject_test_case(klass, name, method, *args, **kwargs):
112
+def inject_test_case(klass, suffix, method, *args, **kwargs):
113
mc = operator.methodcaller(method, *args, **kwargs)
114
- setattr(klass, 'test_' + method + name, lambda self: mc(self))
115
+ # We want to add a function attribute to `klass`, so that it is
116
+ # correctly converted to a method on instantiation. The
117
+ # methodcaller object `mc` is a callable, not a function, so we
118
+ # need the lambda to turn it into a function.
119
+ # pylint: disable=unnecessary-lambda
120
+ setattr(klass, 'test_' + method + suffix, lambda self: mc(self))
121
+
122
123
for cmb in list(itertools.product((True, False), repeat=5)):
124
name = ('_' if cmb[0] else '_not_') + 'persistent_'
125
name += ('_' if cmb[1] else '_not_') + 'migbitmap_'
126
name += '_online' if cmb[2] else '_offline'
127
name += '_shared' if cmb[3] else '_nonshared'
128
- if (cmb[4]):
129
+ if cmb[4]:
130
name += '__pre_shutdown'
131
132
inject_test_case(TestDirtyBitmapMigration, name, 'do_test_migration',
133
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapBackingMigration(iotests.QMPTestCase):
134
self.assert_qmp(result, 'return', {})
135
136
# Check that the bitmaps are there
137
- for node in self.vm.qmp('query-named-block-nodes', flat=True)['return']:
138
+ nodes = self.vm.qmp('query-named-block-nodes', flat=True)['return']
139
+ for node in nodes:
140
if 'node0' in node['node-name']:
141
self.assert_qmp(node, 'dirty-bitmaps[0]/name', 'bmap0')
142
143
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapBackingMigration(iotests.QMPTestCase):
144
"""
145
Continue the source after migration.
146
"""
147
- result = self.vm.qmp('migrate', uri=f'exec: cat > /dev/null')
148
+ result = self.vm.qmp('migrate', uri='exec: cat > /dev/null')
149
self.assert_qmp(result, 'return', {})
150
151
with Timeout(10, 'Migration timeout'):
152
--
153
2.31.1
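
The unnecessary-lambda item in the commit message above can be checked
in isolation. The sketch below uses a hypothetical Demo class (not part
of the test) to show that a plain function stored on a class is bound to
the instance on attribute access, while a methodcaller object is not:

```python
import operator


class Demo:
    """Hypothetical stand-in for a test case class."""

    def greet(self, name):
        return f'hello {name}'


mc = operator.methodcaller('greet', 'world')

# A lambda is a function, so it becomes a bound method on instances
# and receives the instance as `self`.
Demo.via_lambda = lambda self: mc(self)  # pylint: disable=unnecessary-lambda

# A methodcaller is a callable but not a descriptor: instance attribute
# access returns it unchanged, so no `self` is passed when it is called.
Demo.via_mc = mc

demo = Demo()
result_lambda = demo.via_lambda()

try:
    demo.via_mc()
    direct_error = None
except TypeError as exc:
    direct_error = exc
```

Here via_lambda() returns the greeting, whereas via_mc() raises a
TypeError because the methodcaller is invoked without an object.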
154
155
1
pylint complains that discards1_sha256 and all_discards_sha256 are first
1
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
2
set in non-__init__ methods.
2
the static library once and then reuse it throughout QEMU.
3
3
4
These variables are not really class-variables anyway, so let them
4
Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
5
instead be returned by start_postcopy(), thus silencing pylint.
5
vhost-user tools (vhost-user-gpu, etc) do.
6
6
7
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
8
Message-id: 20200924151549.913737-14-stefanha@redhat.com
9
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
10
Message-Id: <20210902094017.32902-3-hreitz@redhat.com>
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
---
12
.../tests/migrate-bitmaps-postcopy-test | 13 +++++++------
13
block/export/export.c | 8 ++++----
13
1 file changed, 7 insertions(+), 6 deletions(-)
14
block/export/meson.build | 2 +-
15
contrib/libvhost-user/meson.build | 1 +
16
meson.build | 6 +++++-
17
util/meson.build | 4 +++-
18
5 files changed, 14 insertions(+), 7 deletions(-)
14
19
15
diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
20
diff --git a/block/export/export.c b/block/export/export.c
16
index XXXXXXX..XXXXXXX 100755
21
index XXXXXXX..XXXXXXX 100644
17
--- a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
22
--- a/block/export/export.c
18
+++ b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
23
+++ b/block/export/export.c
19
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
24
@@ -XXX,XX +XXX,XX @@
20
25
#include "sysemu/block-backend.h"
21
result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
26
#include "block/export.h"
22
node='drive0', name='bitmap0')
27
#include "block/nbd.h"
23
- self.discards1_sha256 = result['return']['sha256']
28
-#if CONFIG_LINUX
24
+ discards1_sha256 = result['return']['sha256']
29
-#include "block/export/vhost-user-blk-server.h"
25
30
-#endif
26
# Check, that updating the bitmap by discards works
31
#include "qapi/error.h"
27
- assert self.discards1_sha256 != empty_sha256
32
#include "qapi/qapi-commands-block-export.h"
28
+ assert discards1_sha256 != empty_sha256
33
#include "qapi/qapi-events-block-export.h"
29
34
#include "qemu/id.h"
30
# We want to calculate resulting sha256. Do it in bitmap0, so, disable
35
+#ifdef CONFIG_VHOST_USER
31
# other bitmaps
36
+#include "vhost-user-blk-server.h"
32
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
37
+#endif
33
38
34
result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
39
static const BlockExportDriver *blk_exp_drivers[] = {
35
node='drive0', name='bitmap0')
40
&blk_exp_nbd,
36
- self.all_discards_sha256 = result['return']['sha256']
41
-#if CONFIG_LINUX
37
+ all_discards_sha256 = result['return']['sha256']
42
+#ifdef CONFIG_VHOST_USER
38
43
&blk_exp_vhost_user_blk,
39
# Now, enable some bitmaps, to be updated during migration
44
#endif
40
for i in range(2, nb_bitmaps, 2):
45
};
41
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
46
diff --git a/block/export/meson.build b/block/export/meson.build
42
47
index XXXXXXX..XXXXXXX 100644
43
event_resume = self.vm_b.event_wait('RESUME')
48
--- a/block/export/meson.build
44
self.vm_b_events.append(event_resume)
49
+++ b/block/export/meson.build
45
- return event_resume
50
@@ -XXX,XX +XXX,XX @@
46
+ return (event_resume, discards1_sha256, all_discards_sha256)
51
block_ss.add(files('export.c'))
47
52
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
48
def test_postcopy_success(self):
53
+block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
49
- event_resume = self.start_postcopy()
54
diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build
50
+ event_resume, discards1_sha256, all_discards_sha256 = \
55
index XXXXXXX..XXXXXXX 100644
51
+ self.start_postcopy()
56
--- a/contrib/libvhost-user/meson.build
52
57
+++ b/contrib/libvhost-user/meson.build
53
# enabled bitmaps should be updated
58
@@ -XXX,XX +XXX,XX @@
54
apply_discards(self.vm_b, discards2)
59
libvhost_user = static_library('vhost-user',
55
@@ -XXX,XX +XXX,XX @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
60
files('libvhost-user.c', 'libvhost-user-glib.c'),
56
for i in range(0, nb_bitmaps, 5):
61
build_by_default: false)
57
result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
62
+vhost_user = declare_dependency(link_with: libvhost_user)
58
node='drive0', name='bitmap{}'.format(i))
63
diff --git a/meson.build b/meson.build
59
- sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
64
index XXXXXXX..XXXXXXX 100644
60
+ sha = discards1_sha256 if i % 2 else all_discards_sha256
65
--- a/meson.build
61
self.assert_qmp(result, 'return/sha256', sha)
66
+++ b/meson.build
62
67
@@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [
63
def test_early_shutdown_destination(self):
68
'util',
69
]
70
71
+vhost_user = not_found
72
+if 'CONFIG_VHOST_USER' in config_host
73
+ subdir('contrib/libvhost-user')
74
+endif
75
+
76
subdir('qapi')
77
subdir('qobject')
78
subdir('stubs')
79
@@ -XXX,XX +XXX,XX @@ if have_tools
80
install: true)
81
82
if 'CONFIG_VHOST_USER' in config_host
83
- subdir('contrib/libvhost-user')
84
subdir('contrib/vhost-user-blk')
85
subdir('contrib/vhost-user-gpu')
86
subdir('contrib/vhost-user-input')
87
diff --git a/util/meson.build b/util/meson.build
88
index XXXXXXX..XXXXXXX 100644
89
--- a/util/meson.build
90
+++ b/util/meson.build
91
@@ -XXX,XX +XXX,XX @@ if have_block
92
util_ss.add(files('main-loop.c'))
93
util_ss.add(files('nvdimm-utils.c'))
94
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
95
- util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
96
+ util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
97
+ files('vhost-user-server.c'), vhost_user
98
+ ])
99
util_ss.add(files('block-helpers.c'))
100
util_ss.add(files('qemu-coroutine-sleep.c'))
101
util_ss.add(files('qemu-co-shared-resource.c'))
64
--
102
--
65
2.31.1
103
2.26.2
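
The start_postcopy() change above follows a general pattern: values that
only one caller needs are returned instead of being stored on self
outside __init__ (which pylint reports as attribute-defined-outside-init).
A minimal sketch with hypothetical class and value names:

```python
class WithAttributes:
    """Pattern pylint flags: attribute first set outside __init__."""

    def start(self):
        # pylint would report attribute-defined-outside-init here
        self.digest = 'abc'
        return 'resumed'

    def check(self):
        event = self.start()
        return event, self.digest


class WithReturnValues:
    """Same behaviour, but the extra value travels via the return value."""

    def start(self):
        digest = 'abc'
        return ('resumed', digest)

    def check(self):
        event, digest = self.start()
        return event, digest
```

Both variants produce the same result; the second keeps the value local
to the call chain that uses it.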
66
104
67
1
pylint proposes using `[]` instead of `list()` and `{}` instead of
1
Introduce libblockdev.fa to avoid compiling blockdev_ss twice.
2
`dict()`, because it is faster. That seems simple enough, so heed its
3
advice.
4
2
5
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
6
Message-Id: <20210824153540.177128-3-hreitz@redhat.com>
4
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
7
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200929125516.186715-3-stefanha@redhat.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
8
---
9
tests/qemu-iotests/iotests.py | 4 ++--
9
meson.build | 12 ++++++++++--
10
1 file changed, 2 insertions(+), 2 deletions(-)
10
storage-daemon/meson.build | 3 +--
11
2 files changed, 11 insertions(+), 4 deletions(-)
11
12
12
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
13
diff --git a/meson.build b/meson.build
13
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
14
--- a/tests/qemu-iotests/iotests.py
15
--- a/meson.build
15
+++ b/tests/qemu-iotests/iotests.py
16
+++ b/meson.build
16
@@ -XXX,XX +XXX,XX @@ def hmp_qemu_io(self, drive: str, cmd: str,
17
@@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files(
17
18
# os-win32.c does not
18
def flatten_qmp_object(self, obj, output=None, basestr=''):
19
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
19
if output is None:
20
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
20
- output = dict()
21
-softmmu_ss.add_all(blockdev_ss)
21
+ output = {}
22
22
if isinstance(obj, list):
23
common_ss.add(files('cpus-common.c'))
23
for i, item in enumerate(obj):
24
24
self.flatten_qmp_object(item, output, basestr + str(i) + '.')
25
@@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock],
25
@@ -XXX,XX +XXX,XX @@ def flatten_qmp_object(self, obj, output=None, basestr=''):
26
link_args: '@block.syms',
26
27
dependencies: [crypto, io])
27
def qmp_to_opts(self, obj):
28
28
obj = self.flatten_qmp_object(obj)
29
+blockdev_ss = blockdev_ss.apply(config_host, strict: false)
29
- output_list = list()
30
+libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
30
+ output_list = []
31
+ dependencies: blockdev_ss.dependencies(),
31
for key in obj:
32
+ name_suffix: 'fa',
32
output_list += [key + '=' + obj[key]]
33
+ build_by_default: false)
33
return ','.join(output_list)
34
+
35
+blockdev = declare_dependency(link_whole: [libblockdev],
36
+ dependencies: [block])
37
+
38
qmp_ss = qmp_ss.apply(config_host, strict: false)
39
libqmp = static_library('qmp', qmp_ss.sources() + genh,
40
dependencies: qmp_ss.dependencies(),
41
@@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods
42
install_dir: config_host['qemu_moddir'])
43
endforeach
44
45
-softmmu_ss.add(authz, block, chardev, crypto, io, qmp)
46
+softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp)
47
common_ss.add(qom, qemuutil)
48
49
common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss])
50
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
51
index XXXXXXX..XXXXXXX 100644
52
--- a/storage-daemon/meson.build
53
+++ b/storage-daemon/meson.build
54
@@ -XXX,XX +XXX,XX @@
55
qsd_ss = ss.source_set()
56
qsd_ss.add(files('qemu-storage-daemon.c'))
57
-qsd_ss.add(block, chardev, qmp, qom, qemuutil)
58
-qsd_ss.add_all(blockdev_ss)
59
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)
60
61
subdir('qapi')
62
34
--
63
--
35
2.31.1
64
2.26.2
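
One way to see why pylint calls the literals faster, at least on
CPython, is to compare the bytecode of both spellings. This is a sketch
only; the exact opcode names vary between Python versions, so it checks
just for a build instruction versus a call instruction:

```python
import dis


def opnames(expression):
    """Return the bytecode instruction names for a single expression."""
    code = compile(expression, '<demo>', 'eval')
    return [ins.opname for ins in dis.get_instructions(code)]


literal_dict = opnames('{}')       # builds the dict directly
call_dict = opnames('dict()')      # looks up the name, then calls it
literal_list = opnames('[]')
call_list = opnames('list()')

dict_needs_call = any(op.startswith('CALL') for op in call_dict)
literal_needs_call = any(op.startswith('CALL') for op in literal_dict)
```

The literal forms compile to a single build instruction, while dict()
and list() need a name lookup plus a call.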
36
65
37
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
2
They are not used by other programs and are not otherwise needed in
3
libblock.
2
4
3
Add a simple test which tries to run migration during backup.
5
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
4
bdrv_inactivate_all() should fail. But due to a bug (see next commit with
6
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
5
fix) it doesn't: nodes are inactivated and the continued backup crashes
7
(libblockdev), a stub function is required.
6
on the assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
7
bdrv_co_write_req_prepare().
8
8
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Make qemu-nbd.c use signal handling utility functions instead of
10
Message-Id: <20210911120027.8063-2-vsementsov@virtuozzo.com>
10
duplicating the code. This helps because os-posix.c is in libblockdev
11
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
11
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
12
Once we use the signal handling utility functions we also end up
13
providing the necessary symbol.
14
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
17
Reviewed-by: Eric Blake <eblake@redhat.com>
18
Message-id: 20200929125516.186715-4-stefanha@redhat.com
19
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
20
--Stefan]
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
22
---
13
.../qemu-iotests/tests/migrate-during-backup | 97 +++++++++++++++++++
23
qemu-nbd.c | 21 ++++++++-------------
14
.../tests/migrate-during-backup.out | 5 +
24
stubs/blk-exp-close-all.c | 7 +++++++
15
2 files changed, 102 insertions(+)
25
block/export/meson.build | 4 ++--
16
create mode 100755 tests/qemu-iotests/tests/migrate-during-backup
26
meson.build | 4 ++--
17
create mode 100644 tests/qemu-iotests/tests/migrate-during-backup.out
27
nbd/meson.build | 2 ++
28
stubs/meson.build | 1 +
29
6 files changed, 22 insertions(+), 17 deletions(-)
30
create mode 100644 stubs/blk-exp-close-all.c
18
31
19
diff --git a/tests/qemu-iotests/tests/migrate-during-backup b/tests/qemu-iotests/tests/migrate-during-backup
32
diff --git a/qemu-nbd.c b/qemu-nbd.c
20
new file mode 100755
33
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX
34
--- a/qemu-nbd.c
22
--- /dev/null
35
+++ b/qemu-nbd.c
23
+++ b/tests/qemu-iotests/tests/migrate-during-backup
24
@@ -XXX,XX +XXX,XX @@
36
@@ -XXX,XX +XXX,XX @@
25
+#!/usr/bin/env python3
37
#include "qapi/error.h"
26
+# group: migration disabled
38
#include "qemu/cutils.h"
27
+#
39
#include "sysemu/block-backend.h"
28
+# Copyright (c) 2021 Virtuozzo International GmbH
40
+#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */
29
+#
41
#include "block/block_int.h"
30
+# This program is free software; you can redistribute it and/or modify
42
#include "block/nbd.h"
31
+# it under the terms of the GNU General Public License as published by
43
#include "qemu/main-loop.h"
32
+# the Free Software Foundation; either version 2 of the License, or
44
@@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n"
33
+# (at your option) any later version.
45
}
34
+#
46
35
+# This program is distributed in the hope that it will be useful,
47
#ifdef CONFIG_POSIX
36
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
48
-static void termsig_handler(int signum)
37
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
49
+/*
38
+# GNU General Public License for more details.
50
+ * The client thread uses SIGTERM to interrupt the server. A signal
39
+#
51
+ * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
40
+# You should have received a copy of the GNU General Public License
52
+ */
41
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
53
+void qemu_system_killed(int signum, pid_t pid)
42
+#
54
{
43
+
55
qatomic_cmpxchg(&state, RUNNING, TERMINATE);
44
+import os
56
qemu_notify_event();
45
+import iotests
57
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
46
+from iotests import qemu_img_create, qemu_io
58
BlockExportOptions *export_opts;
47
+
59
48
+
60
#ifdef CONFIG_POSIX
49
+disk_a = os.path.join(iotests.test_dir, 'disk_a')
61
- /*
50
+disk_b = os.path.join(iotests.test_dir, 'disk_b')
62
- * Exit gracefully on various signals, which includes SIGTERM used
51
+size = '1M'
63
- * by 'qemu-nbd -v -c'.
52
+mig_file = os.path.join(iotests.test_dir, 'mig_file')
64
- */
53
+mig_cmd = 'exec: cat > ' + mig_file
65
- struct sigaction sa_sigterm;
54
+
66
- memset(&sa_sigterm, 0, sizeof(sa_sigterm));
55
+
67
- sa_sigterm.sa_handler = termsig_handler;
56
+class TestMigrateDuringBackup(iotests.QMPTestCase):
68
- sigaction(SIGTERM, &sa_sigterm, NULL);
57
+ def tearDown(self):
69
- sigaction(SIGINT, &sa_sigterm, NULL);
58
+ self.vm.shutdown()
70
- sigaction(SIGHUP, &sa_sigterm, NULL);
59
+ os.remove(disk_a)
71
-
60
+ os.remove(disk_b)
72
- signal(SIGPIPE, SIG_IGN);
61
+ os.remove(mig_file)
73
+ os_setup_early_signal_handling();
62
+
74
+ os_setup_signal_handling();
63
+ def setUp(self):
75
#endif
64
+ qemu_img_create('-f', iotests.imgfmt, disk_a, size)
76
65
+ qemu_img_create('-f', iotests.imgfmt, disk_b, size)
77
socket_init();
66
+ qemu_io('-c', f'write 0 {size}', disk_a)
78
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
67
+
68
+ self.vm = iotests.VM().add_drive(disk_a)
69
+ self.vm.launch()
70
+ result = self.vm.qmp('blockdev-add', {
71
+ 'node-name': 'target',
72
+ 'driver': iotests.imgfmt,
73
+ 'file': {
74
+ 'driver': 'file',
75
+ 'filename': disk_b
76
+ }
77
+ })
78
+ self.assert_qmp(result, 'return', {})
79
+
80
+ def test_migrate(self):
81
+ result = self.vm.qmp('blockdev-backup', device='drive0',
82
+ target='target', sync='full',
83
+ speed=1, x_perf={
84
+ 'max-workers': 1,
85
+ 'max-chunk': 64 * 1024
86
+ })
87
+ self.assert_qmp(result, 'return', {})
88
+
89
+ result = self.vm.qmp('job-pause', id='drive0')
90
+ self.assert_qmp(result, 'return', {})
91
+
92
+ result = self.vm.qmp('migrate-set-capabilities',
93
+ capabilities=[{'capability': 'events',
94
+ 'state': True}])
95
+ self.assert_qmp(result, 'return', {})
96
+ result = self.vm.qmp('migrate', uri=mig_cmd)
97
+ self.assert_qmp(result, 'return', {})
98
+
99
+ e = self.vm.events_wait((('MIGRATION',
100
+ {'data': {'status': 'completed'}}),
101
+ ('MIGRATION',
102
+ {'data': {'status': 'failed'}})))
103
+
104
+ # Don't assert that e is 'failed' now: this way we'll miss
105
+ # possible crash when backup continues :)
106
+
107
+ result = self.vm.qmp('block-job-set-speed', device='drive0',
108
+ speed=0)
109
+ self.assert_qmp(result, 'return', {})
110
+ result = self.vm.qmp('job-resume', id='drive0')
111
+ self.assert_qmp(result, 'return', {})
112
+
113
+ # For future: if something changes so that both migration
114
+ # and backup pass, let's not miss that moment, as it may
115
+ # be a bug as well as improvement.
116
+ self.assert_qmp(e, 'data/status', 'failed')
117
+
118
+
119
+if __name__ == '__main__':
120
+ iotests.main(supported_fmts=['qcow2'],
121
+ supported_protocols=['file'])
122
diff --git a/tests/qemu-iotests/tests/migrate-during-backup.out b/tests/qemu-iotests/tests/migrate-during-backup.out
123
new file mode 100644
79
new file mode 100644
124
index XXXXXXX..XXXXXXX
80
index XXXXXXX..XXXXXXX
125
--- /dev/null
81
--- /dev/null
126
+++ b/tests/qemu-iotests/tests/migrate-during-backup.out
82
+++ b/stubs/blk-exp-close-all.c
127
@@ -XXX,XX +XXX,XX @@
83
@@ -XXX,XX +XXX,XX @@
128
+.
84
+#include "qemu/osdep.h"
129
+----------------------------------------------------------------------
85
+#include "block/export.h"
130
+Ran 1 tests
131
+
86
+
132
+OK
87
+/* Only used in programs that support block exports (libblockdev.fa) */
88
+void blk_exp_close_all(void)
89
+{
90
+}
91
diff --git a/block/export/meson.build b/block/export/meson.build
92
index XXXXXXX..XXXXXXX 100644
93
--- a/block/export/meson.build
94
+++ b/block/export/meson.build
95
@@ -XXX,XX +XXX,XX @@
96
-block_ss.add(files('export.c'))
97
-block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
98
+blockdev_ss.add(files('export.c'))
99
+blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
100
diff --git a/meson.build b/meson.build
101
index XXXXXXX..XXXXXXX 100644
102
--- a/meson.build
103
+++ b/meson.build
104
@@ -XXX,XX +XXX,XX @@ subdir('dump')
105
106
block_ss.add(files(
107
'block.c',
108
- 'blockdev-nbd.c',
109
'blockjob.c',
110
'job.c',
111
'qemu-io-cmds.c',
112
@@ -XXX,XX +XXX,XX @@ subdir('block')
113
114
blockdev_ss.add(files(
115
'blockdev.c',
116
+ 'blockdev-nbd.c',
117
'iothread.c',
118
'job-qmp.c',
119
))
120
@@ -XXX,XX +XXX,XX @@ if have_tools
121
qemu_io = executable('qemu-io', files('qemu-io.c'),
122
dependencies: [block, qemuutil], install: true)
123
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
124
- dependencies: [block, qemuutil], install: true)
125
+ dependencies: [blockdev, qemuutil], install: true)
126
127
subdir('storage-daemon')
128
subdir('contrib/rdmacm-mux')
129
diff --git a/nbd/meson.build b/nbd/meson.build
130
index XXXXXXX..XXXXXXX 100644
131
--- a/nbd/meson.build
132
+++ b/nbd/meson.build
133
@@ -XXX,XX +XXX,XX @@
134
block_ss.add(files(
135
'client.c',
136
'common.c',
137
+))
138
+blockdev_ss.add(files(
139
'server.c',
140
))
141
diff --git a/stubs/meson.build b/stubs/meson.build
142
index XXXXXXX..XXXXXXX 100644
143
--- a/stubs/meson.build
144
+++ b/stubs/meson.build
145
@@ -XXX,XX +XXX,XX @@
146
stub_ss.add(files('arch_type.c'))
147
stub_ss.add(files('bdrv-next-monitor-owned.c'))
148
stub_ss.add(files('blk-commit-all.c'))
149
+stub_ss.add(files('blk-exp-close-all.c'))
150
stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
151
stub_ss.add(files('change-state-handler.c'))
152
stub_ss.add(files('cmos.c'))
133
--
153
--
134
2.31.1
154
2.26.2
135
155
136
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
Make it possible to specify the iothread where the export will run. By
2
default the block node can be moved to other AioContexts later and the
3
export will follow. The fixed-iothread option forces strict behavior
4
that prevents changing AioContext while the export is active. See the
5
QAPI docs for details.
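
As a rough illustration of the two new options, the sketch below builds
a QMP message that requests an export pinned to an iothread. The option
names come from the BlockExportOptions schema in this patch; the object
names (export0, node0, iothread0) are hypothetical, and sending this via
the block-export-add command is an assumption, not something shown here:

```python
import json

# Option names follow BlockExportOptions; the values are made up.
export_args = {
    'type': 'nbd',
    'id': 'export0',
    'node-name': 'node0',
    # Run the export in this iothread instead of the node's current one.
    'iothread': 'iothread0',
    # Fail export creation if the node cannot be moved there, and keep
    # the node in that thread while the export is active.
    'fixed-iothread': True,
}

# Assumed command name for creating the export.
command = {'execute': 'block-export-add', 'arguments': export_args}
wire = json.dumps(command)
```

Leaving out 'fixed-iothread' (default false) keeps the permissive
behaviour described above, where the export follows the node if it is
later moved to another AioContext.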
2
6
3
- use g_autofree for l1_table
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
- better name for size in bytes variable
8
Message-id: 20200929125516.186715-5-stefanha@redhat.com
5
- reduce code blocks nesting
9
[Fix stray '#' character in block-export.json and add missing "(since:
6
- whitespaces, braces, newlines
10
5.2)" as suggested by Eric Blake.
11
--Stefan]
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
14
qapi/block-export.json | 11 ++++++++++
15
block/export/export.c | 31 +++++++++++++++++++++++++++-
16
block/export/vhost-user-blk-server.c | 5 ++++-
17
nbd/server.c | 2 --
18
4 files changed, 45 insertions(+), 4 deletions(-)
7
19
8
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
20
diff --git a/qapi/block-export.json b/qapi/block-export.json
9
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
10
Message-Id: <20210914122454.141075-9-vsementsov@virtuozzo.com>
11
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
12
---
13
block/qcow2-refcount.c | 98 +++++++++++++++++++++---------------------
14
1 file changed, 50 insertions(+), 48 deletions(-)
15
16
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
17
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
18
--- a/block/qcow2-refcount.c
22
--- a/qapi/block-export.json
19
+++ b/block/qcow2-refcount.c
23
+++ b/qapi/block-export.json
20
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l1(BlockDriverState *bs,
24
@@ -XXX,XX +XXX,XX @@
21
int flags, BdrvCheckMode fix, bool active)
25
# export before completion is signalled. (since: 5.2;
26
# default: false)
27
#
28
+# @iothread: The name of the iothread object where the export will run. The
29
+# default is to use the thread currently associated with the
30
+# block node. (since: 5.2)
31
+#
32
+# @fixed-iothread: True prevents the block node from being moved to another
33
+# thread while the export is active. If true and @iothread is
34
+# given, export creation fails if the block node cannot be
35
+# moved to the iothread. The default is false. (since: 5.2)
36
+#
37
# Since: 4.2
38
##
39
{ 'union': 'BlockExportOptions',
40
'base': { 'type': 'BlockExportType',
41
'id': 'str',
42
+     '*fixed-iothread': 'bool',
43
+     '*iothread': 'str',
44
'node-name': 'str',
45
'*writable': 'bool',
46
'*writethrough': 'bool' },
47
diff --git a/block/export/export.c b/block/export/export.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/block/export/export.c
50
+++ b/block/export/export.c
51
@@ -XXX,XX +XXX,XX @@
52
53
#include "block/block.h"
54
#include "sysemu/block-backend.h"
55
+#include "sysemu/iothread.h"
56
#include "block/export.h"
57
#include "block/nbd.h"
58
#include "qapi/error.h"
59
@@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
60
61
BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
22
{
62
{
23
BDRVQcow2State *s = bs->opaque;
63
+ bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
24
- uint64_t *l1_table = NULL, l2_offset, l1_size2;
64
const BlockExportDriver *drv;
25
+ size_t l1_size_bytes = l1_size * L1E_SIZE;
65
BlockExport *exp = NULL;
26
+ g_autofree uint64_t *l1_table = NULL;
66
BlockDriverState *bs;
27
+ uint64_t l2_offset;
67
- BlockBackend *blk;
28
int i, ret;
68
+ BlockBackend *blk = NULL;
29
69
AioContext *ctx;
30
- l1_size2 = l1_size * L1E_SIZE;
70
uint64_t perm;
31
+ if (!l1_size) {
71
int ret;
32
+ return 0;
72
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
33
+ }
73
ctx = bdrv_get_aio_context(bs);
34
74
aio_context_acquire(ctx);
35
/* Mark L1 table as used */
75
36
ret = qcow2_inc_refcounts_imrt(bs, res, refcount_table, refcount_table_size,
76
+ if (export->has_iothread) {
37
- l1_table_offset, l1_size2);
77
+ IOThread *iothread;
38
+ l1_table_offset, l1_size_bytes);
78
+ AioContext *new_ctx;
39
if (ret < 0) {
79
+
40
- goto fail;
80
+ iothread = iothread_by_id(export->iothread);
41
+ return ret;
81
+ if (!iothread) {
82
+ error_setg(errp, "iothread \"%s\" not found", export->iothread);
83
+ goto fail;
84
+ }
85
+
86
+ new_ctx = iothread_get_aio_context(iothread);
87
+
88
+ ret = bdrv_try_set_aio_context(bs, new_ctx, errp);
89
+ if (ret == 0) {
90
+ aio_context_release(ctx);
91
+ aio_context_acquire(new_ctx);
92
+ ctx = new_ctx;
93
+ } else if (fixed_iothread) {
94
+ goto fail;
95
+ }
42
+ }
96
+ }
43
+
97
+
44
+ l1_table = g_try_malloc(l1_size_bytes);
98
/*
45
+ if (l1_table == NULL) {
99
* Block exports are used for non-shared storage migration. Make sure
46
+ res->check_errors++;
100
* that BDRV_O_INACTIVE is cleared and the image is ready for write
47
+ return -ENOMEM;
101
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
48
}
102
}
49
103
50
/* Read L1 table entries from disk */
104
blk = blk_new(ctx, perm, BLK_PERM_ALL);
51
- if (l1_size2 > 0) {
105
+
52
- l1_table = g_try_malloc(l1_size2);
106
+ if (!fixed_iothread) {
53
- if (l1_table == NULL) {
107
+ blk_set_allow_aio_context_change(blk, true);
54
- ret = -ENOMEM;
55
- res->check_errors++;
56
- goto fail;
57
- }
58
- ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size2);
59
- if (ret < 0) {
60
- fprintf(stderr, "ERROR: I/O error in check_refcounts_l1\n");
61
- res->check_errors++;
62
- goto fail;
63
- }
64
- for(i = 0;i < l1_size; i++)
65
- be64_to_cpus(&l1_table[i]);
66
+ ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size_bytes);
67
+ if (ret < 0) {
68
+ fprintf(stderr, "ERROR: I/O error in check_refcounts_l1\n");
69
+ res->check_errors++;
70
+ return ret;
71
+ }
108
+ }
72
+
109
+
73
+ for (i = 0; i < l1_size; i++) {
110
ret = blk_insert_bs(blk, bs, errp);
74
+ be64_to_cpus(&l1_table[i]);
111
if (ret < 0) {
112
goto fail;
113
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/block/export/vhost-user-blk-server.c
116
+++ b/block/export/vhost-user-blk-server.c
117
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = {
118
static void blk_aio_attached(AioContext *ctx, void *opaque)
119
{
120
VuBlkExport *vexp = opaque;
121
+
122
+ vexp->export.ctx = ctx;
123
vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
124
}
125
126
static void blk_aio_detach(void *opaque)
127
{
128
VuBlkExport *vexp = opaque;
129
+
130
vhost_user_server_detach_aio_context(&vexp->vu_server);
131
+ vexp->export.ctx = NULL;
132
}
133
134
static void
135
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
136
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
137
logical_block_size);
138
139
- blk_set_allow_aio_context_change(exp->blk, true);
140
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
141
vexp);
142
143
diff --git a/nbd/server.c b/nbd/server.c
144
index XXXXXXX..XXXXXXX 100644
145
--- a/nbd/server.c
146
+++ b/nbd/server.c
147
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
148
return ret;
75
}
149
}
76
150
77
/* Do the actual checks */
151
- blk_set_allow_aio_context_change(blk, true);
78
- for(i = 0; i < l1_size; i++) {
152
-
79
- l2_offset = l1_table[i];
153
QTAILQ_INIT(&exp->clients);
80
- if (l2_offset) {
154
exp->name = g_strdup(arg->name);
81
- /* Mark L2 table as used */
155
exp->description = g_strdup(arg->description);
82
- l2_offset &= L1E_OFFSET_MASK;
83
- ret = qcow2_inc_refcounts_imrt(bs, res,
84
- refcount_table, refcount_table_size,
85
- l2_offset, s->cluster_size);
86
- if (ret < 0) {
87
- goto fail;
88
- }
89
+ for (i = 0; i < l1_size; i++) {
90
+ if (!l1_table[i]) {
91
+ continue;
92
+ }
93
94
- /* L2 tables are cluster aligned */
95
- if (offset_into_cluster(s, l2_offset)) {
96
- fprintf(stderr, "ERROR l2_offset=%" PRIx64 ": Table is not "
97
- "cluster aligned; L1 entry corrupted\n", l2_offset);
98
- res->corruptions++;
99
- }
100
+ l2_offset = l1_table[i] & L1E_OFFSET_MASK;
101
102
- /* Process and check L2 entries */
103
- ret = check_refcounts_l2(bs, res, refcount_table,
104
- refcount_table_size, l2_offset, flags,
105
- fix, active);
106
- if (ret < 0) {
107
- goto fail;
108
- }
109
+ /* Mark L2 table as used */
110
+ ret = qcow2_inc_refcounts_imrt(bs, res,
111
+ refcount_table, refcount_table_size,
112
+ l2_offset, s->cluster_size);
113
+ if (ret < 0) {
114
+ return ret;
115
+ }
116
+
117
+ /* L2 tables are cluster aligned */
118
+ if (offset_into_cluster(s, l2_offset)) {
119
+ fprintf(stderr, "ERROR l2_offset=%" PRIx64 ": Table is not "
120
+ "cluster aligned; L1 entry corrupted\n", l2_offset);
121
+ res->corruptions++;
122
+ }
123
+
124
+ /* Process and check L2 entries */
125
+ ret = check_refcounts_l2(bs, res, refcount_table,
126
+ refcount_table_size, l2_offset, flags,
127
+ fix, active);
128
+ if (ret < 0) {
129
+ return ret;
130
}
131
}
132
- g_free(l1_table);
133
- return 0;
134
135
-fail:
136
- g_free(l1_table);
137
- return ret;
138
+ return 0;
139
}
140
141
/*
142
--
156
--
143
2.31.1
157
2.26.2
144
158
145
diff view generated by jsdifflib
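The L1 table check in the qcow2 patch above reads the table from disk, converts each entry from big-endian, skips empty entries, masks out the L2 table offset and verifies that it is cluster aligned. The following is a minimal standalone sketch of that per-entry step, not the QEMU code itself. The helper names load_be64() and decode_l1_entry() are hypothetical, and the mask value is an assumption taken from the qcow2 on-disk format (offset in bits 9 to 55 of an L1 entry).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the qcow2 format: the L2 table offset lives in bits 9..55. */
#define L1E_OFFSET_MASK 0x00fffffffffffe00ULL

/* Read one big-endian 64-bit value, independent of host byte order. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Hypothetical helper: decode the L1 entry at 'index' from a raw on-disk
 * table. Returns false for an unused (zero) entry, which a check loop
 * would skip. Otherwise stores the L2 table offset and whether that
 * offset is cluster aligned.
 */
static bool decode_l1_entry(const uint8_t *raw_l1, size_t index,
                            unsigned cluster_bits,
                            uint64_t *l2_offset, bool *aligned)
{
    uint64_t entry = load_be64(raw_l1 + index * sizeof(uint64_t));

    assert(cluster_bits >= 9 && cluster_bits < 64);

    if (!entry) {
        return false;
    }

    *l2_offset = entry & L1E_OFFSET_MASK;
    *aligned = (*l2_offset & ((1ULL << cluster_bits) - 1)) == 0;
    return true;
}
```

A misaligned offset is reported as a corruption in the patch, but the check still continues with that entry, which is why the sketch returns the alignment as a separate flag instead of failing.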
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
Allow the number of queues to be configured using --export
2
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
3
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
4
its own value if the vhost-user-blk backend offers fewer queues than
5
QEMU.
2
6
3
We'll reuse the function to fix a wrong L2 entry bitmap. Support it now.
7
The vhost-user-blk-server.c code is already capable of multi-queue. All
8
virtqueue processing runs in the same AioContext. No new locking is
9
needed.
4
10
5
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
11
Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
6
Reviewed-by: Eric Blake <eblake@redhat.com>
12
Note that the feature bit only announces the presence of the num_queues
7
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
13
configuration space field. It does not promise that there is more than 1
8
Message-Id: <20210914122454.141075-6-vsementsov@virtuozzo.com>
14
virtqueue, so we can set it unconditionally.
9
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
15
16
I tested multi-queue by running a random read fio test with numjobs=4 on
17
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
18
file showed activity on all 4 virtio-blk MSI-X vectors. The /sys/block/vda/mq/
19
directory shows that Linux blk-mq has 4 queues configured.
20
21
An automated test is included in the next commit.
22
23
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
24
Acked-by: Markus Armbruster <armbru@redhat.com>
25
Message-id: 20201001144604.559733-2-stefanha@redhat.com
26
[Fixed accidental tab characters as suggested by Markus Armbruster
27
--Stefan]
28
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
29
---
11
block/qcow2-refcount.c | 18 +++++++++++++++---
30
qapi/block-export.json | 10 +++++++---
12
1 file changed, 15 insertions(+), 3 deletions(-)
31
block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------
32
2 files changed, 25 insertions(+), 9 deletions(-)
13
33
14
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
34
diff --git a/qapi/block-export.json b/qapi/block-export.json
15
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
16
--- a/block/qcow2-refcount.c
36
--- a/qapi/block-export.json
17
+++ b/block/qcow2-refcount.c
37
+++ b/qapi/block-export.json
18
@@ -XXX,XX +XXX,XX @@ enum {
38
@@ -XXX,XX +XXX,XX @@
39
# SocketAddress types are supported. Passed fds must be UNIX domain
40
# sockets.
41
# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
42
+# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults
43
+# to 1.
44
#
45
# Since: 5.2
46
##
47
{ 'struct': 'BlockExportOptionsVhostUserBlk',
48
- 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
49
+ 'data': { 'addr': 'SocketAddress',
50
+     '*logical-block-size': 'size',
51
+ '*num-queues': 'uint16'} }
52
53
##
54
# @NbdServerAddOptions:
55
@@ -XXX,XX +XXX,XX @@
56
{ 'union': 'BlockExportOptions',
57
'base': { 'type': 'BlockExportType',
58
'id': 'str',
59
-     '*fixed-iothread': 'bool',
60
-     '*iothread': 'str',
61
+ '*fixed-iothread': 'bool',
62
+ '*iothread': 'str',
63
'node-name': 'str',
64
'*writable': 'bool',
65
'*writethrough': 'bool' },
66
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/block/export/vhost-user-blk-server.c
69
+++ b/block/export/vhost-user-blk-server.c
70
@@ -XXX,XX +XXX,XX @@
71
#include "util/block-helpers.h"
72
73
enum {
74
- VHOST_USER_BLK_MAX_QUEUES = 1,
75
+ VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
19
};
76
};
20
77
struct virtio_blk_inhdr {
21
/*
78
unsigned char status;
22
- * Fix L2 entry by making it QCOW2_CLUSTER_ZERO_PLAIN.
79
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev)
23
+ * Fix L2 entry by making it QCOW2_CLUSTER_ZERO_PLAIN (or making all its present
80
1ull << VIRTIO_BLK_F_DISCARD |
24
+ * subclusters QCOW2_SUBCLUSTER_ZERO_PLAIN).
81
1ull << VIRTIO_BLK_F_WRITE_ZEROES |
25
*
82
1ull << VIRTIO_BLK_F_CONFIG_WCE |
26
* This function decrements res->corruptions on success, so the caller is
83
+ 1ull << VIRTIO_BLK_F_MQ |
27
* responsible to increment res->corruptions prior to the call.
84
1ull << VIRTIO_F_VERSION_1 |
28
@@ -XXX,XX +XXX,XX @@ static int fix_l2_entry_by_zero(BlockDriverState *bs, BdrvCheckResult *res,
85
1ull << VIRTIO_RING_F_INDIRECT_DESC |
29
int idx = l2_index * (l2_entry_size(s) / sizeof(uint64_t));
86
1ull << VIRTIO_RING_F_EVENT_IDX |
30
uint64_t l2e_offset = l2_offset + (uint64_t)l2_index * l2_entry_size(s);
87
@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)
31
int ign = active ? QCOW2_OL_ACTIVE_L2 : QCOW2_OL_INACTIVE_L2;
88
32
- uint64_t l2_entry = has_subclusters(s) ? 0 : QCOW_OFLAG_ZERO;
89
static void
33
90
vu_blk_initialize_config(BlockDriverState *bs,
34
- set_l2_entry(s, l2_table, l2_index, l2_entry);
91
- struct virtio_blk_config *config, uint32_t blk_size)
35
+ if (has_subclusters(s)) {
92
+ struct virtio_blk_config *config,
36
+ uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, l2_index);
93
+ uint32_t blk_size,
94
+ uint16_t num_queues)
95
{
96
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
97
config->blk_size = blk_size;
98
@@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs,
99
config->seg_max = 128 - 2;
100
config->min_io_size = 1;
101
config->opt_io_size = 1;
102
- config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
103
+ config->num_queues = num_queues;
104
config->max_discard_sectors = 32768;
105
config->max_discard_seg = 1;
106
config->discard_sector_alignment = config->blk_size >> 9;
107
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
108
BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
109
Error *local_err = NULL;
110
uint64_t logical_block_size;
111
+ uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;
112
113
vexp->writable = opts->writable;
114
vexp->blkcfg.wce = 0;
115
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
116
}
117
vexp->blk_size = logical_block_size;
118
blk_set_guest_block_size(exp->blk, logical_block_size);
37
+
119
+
38
+ /* Allocated subclusters become zero */
120
+ if (vu_opts->has_num_queues) {
39
+ l2_bitmap |= l2_bitmap << 32;
121
+ num_queues = vu_opts->num_queues;
40
+ l2_bitmap &= QCOW_L2_BITMAP_ALL_ZEROES;
122
+ }
41
+
123
+ if (num_queues == 0) {
42
+ set_l2_bitmap(s, l2_table, l2_index, l2_bitmap);
124
+ error_setg(errp, "num-queues must be greater than 0");
43
+ set_l2_entry(s, l2_table, l2_index, 0);
125
+ return -EINVAL;
44
+ } else {
45
+ set_l2_entry(s, l2_table, l2_index, QCOW_OFLAG_ZERO);
46
+ }
126
+ }
47
+
127
+
48
ret = qcow2_pre_write_overlap_check(bs, ign, l2e_offset, l2_entry_size(s),
128
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
49
false);
129
- logical_block_size);
50
if (metadata_overlap) {
130
+ logical_block_size, num_queues);
131
132
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
133
vexp);
134
135
if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
136
- VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
137
- errp)) {
138
+ num_queues, &vu_blk_iface, errp)) {
139
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
140
blk_aio_detach, vexp);
141
return -EADDRNOTAVAIL;
51
--
142
--
52
2.31.1
143
2.26.2
53
144
54
diff view generated by jsdifflib
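The subcluster branch of fix_l2_entry_by_zero() in the qcow2 patch above turns every present subcluster into a zero subcluster with two bit operations. The sketch below isolates that bitmap arithmetic. It assumes the extended L2 bitmap layout in which the low 32 bits mark allocated subclusters and the high 32 bits mark subclusters that read as zeroes. The macro and function names are hypothetical stand-ins for the QEMU ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 32 bits = "allocated", high 32 bits = "reads as zero". */
#define L2_BITMAP_ALL_ALLOC  0x00000000ffffffffULL
#define L2_BITMAP_ALL_ZEROES (L2_BITMAP_ALL_ALLOC << 32)

/*
 * Hypothetical helper mirroring the two bitmap operations in the patch:
 * every subcluster that was allocated or already zero ends up zero, and
 * no subcluster stays allocated.
 */
static uint64_t zero_present_subclusters(uint64_t l2_bitmap)
{
    /* Copy each allocation bit onto the matching zero bit. */
    l2_bitmap |= l2_bitmap << 32;
    /* Drop the allocation bits, keep only the zero bits. */
    l2_bitmap &= L2_BITMAP_ALL_ZEROES;

    assert((l2_bitmap & L2_BITMAP_ALL_ALLOC) == 0);
    return l2_bitmap;
}
```

For example, a bitmap with subclusters 0 and 2 allocated and subcluster 1 already zero (0x0000000200000005) becomes 0x0000000700000000: all three present subclusters read as zero and none refers to a host cluster.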
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
Add helper to parse compressed l2_entry and use it everywhere instead
3
bdrv_co_block_status_above has several design problems with handling
4
of open-coding.
4
short backing files:
5
5
6
Note that in most places we move to precise coffset/csize instead of
6
1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
7
sector-aligned values. Still, it should work well enough for updating
7
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
8
refcounts.
8
which produces these after-EOF zeros is inside requested backing
9
sequence.
10
11
2. With want_zero=false, it may return pnum=0 prior to actual EOF,
12
because of EOF of short backing file.
13
14
Fix these things, making logic about short backing files clearer.
15
16
With fixed bdrv_block_status_above we also have to improve is_zero in
17
qcow2 code, otherwise iotest 154 will fail, because with this patch we
18
stop merging zeros of different types (produced by fully unallocated
19
in the whole backing chain regions vs produced by short backing files).
20
21
Note also that this patch leaves for another day the general problem
22
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
23
vs go-to-backing.
9
24
10
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
25
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
26
Reviewed-by: Alberto Garcia <berto@igalia.com>
11
Reviewed-by: Eric Blake <eblake@redhat.com>
27
Reviewed-by: Eric Blake <eblake@redhat.com>
12
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
28
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
13
Message-Id: <20210914122454.141075-4-vsementsov@virtuozzo.com>
29
[Fix s/comes/come/ as suggested by Eric Blake
14
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
30
--Stefan]
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
32
---
16
block/qcow2.h | 3 ++-
33
block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++-----------
17
block/qcow2-cluster.c | 15 +++++++++++++++
34
block/qcow2.c | 16 ++++++++++--
18
block/qcow2-refcount.c | 36 +++++++++++++++++-------------------
35
2 files changed, 68 insertions(+), 16 deletions(-)
19
block/qcow2.c | 9 ++-------
20
4 files changed, 36 insertions(+), 27 deletions(-)
21
36
22
diff --git a/block/qcow2.h b/block/qcow2.h
37
diff --git a/block/io.c b/block/io.c
23
index XXXXXXX..XXXXXXX 100644
38
index XXXXXXX..XXXXXXX 100644
24
--- a/block/qcow2.h
39
--- a/block/io.c
25
+++ b/block/qcow2.h
40
+++ b/block/io.c
26
@@ -XXX,XX +XXX,XX @@
41
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
27
42
int64_t *map,
28
/* Defined in the qcow2 spec (compressed cluster descriptor) */
43
BlockDriverState **file)
29
#define QCOW2_COMPRESSED_SECTOR_SIZE 512U
44
{
30
-#define QCOW2_COMPRESSED_SECTOR_MASK (~(QCOW2_COMPRESSED_SECTOR_SIZE - 1ULL))
45
+ int ret;
31
46
BlockDriverState *p;
32
/* Must be at least 2 to cover COW */
47
- int ret = 0;
33
#define MIN_L2_CACHE_SIZE 2 /* cache entries */
48
- bool first = true;
34
@@ -XXX,XX +XXX,XX @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
49
+ int64_t eof = 0;
35
uint64_t offset,
50
36
int compressed_size,
51
assert(bs != base);
37
uint64_t *host_offset);
52
- for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
38
+void qcow2_parse_compressed_l2_entry(BlockDriverState *bs, uint64_t l2_entry,
53
+
39
+ uint64_t *coffset, int *csize);
54
+ ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
40
55
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
41
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
56
+ return ret;
42
void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m);
57
+ }
43
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
58
+
44
index XXXXXXX..XXXXXXX 100644
59
+ if (ret & BDRV_BLOCK_EOF) {
45
--- a/block/qcow2-cluster.c
60
+ eof = offset + *pnum;
46
+++ b/block/qcow2-cluster.c
61
+ }
47
@@ -XXX,XX +XXX,XX @@ fail:
62
+
48
g_free(l1_table);
63
+ assert(*pnum <= bytes);
64
+ bytes = *pnum;
65
+
66
+ for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ p = bdrv_filter_or_cow_bs(p))
68
+ {
69
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
70
file);
71
if (ret < 0) {
72
- break;
73
+ return ret;
74
}
75
- if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
76
+ if (*pnum == 0) {
77
/*
78
- * Reading beyond the end of the file continues to read
79
- * zeroes, but we can only widen the result to the
80
- * unallocated length we learned from an earlier
81
- * iteration.
82
+ * The top layer deferred to this layer, and because this layer is
83
+ * short, any zeroes that we synthesize beyond EOF behave as if they
84
+ * were allocated at this layer.
85
+ *
86
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
87
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
88
+ * below.
89
*/
90
+ assert(ret & BDRV_BLOCK_EOF);
91
*pnum = bytes;
92
+ if (file) {
93
+ *file = p;
94
+ }
95
+ ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
96
+ break;
97
}
98
- if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
99
+ if (ret & BDRV_BLOCK_ALLOCATED) {
100
+ /*
101
+ * We've found the node and the status, we must break.
102
+ *
103
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
104
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
105
+ * below.
106
+ */
107
+ ret &= ~BDRV_BLOCK_EOF;
108
break;
109
}
110
- /* [offset, pnum] unallocated on this layer, which could be only
111
- * the first part of [offset, bytes]. */
112
- bytes = MIN(bytes, *pnum);
113
- first = false;
114
+
115
+ /*
116
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
117
+ * let's continue the diving.
118
+ */
119
+ assert(*pnum <= bytes);
120
+ bytes = *pnum;
121
+ }
122
+
123
+ if (offset + *pnum == eof) {
124
+ ret |= BDRV_BLOCK_EOF;
125
}
126
+
49
return ret;
127
return ret;
50
}
128
}
51
+
129
52
+void qcow2_parse_compressed_l2_entry(BlockDriverState *bs, uint64_t l2_entry,
53
+ uint64_t *coffset, int *csize)
54
+{
55
+ BDRVQcow2State *s = bs->opaque;
56
+ int nb_csectors;
57
+
58
+ assert(qcow2_get_cluster_type(bs, l2_entry) == QCOW2_CLUSTER_COMPRESSED);
59
+
60
+ *coffset = l2_entry & s->cluster_offset_mask;
61
+
62
+ nb_csectors = ((l2_entry >> s->csize_shift) & s->csize_mask) + 1;
63
+ *csize = nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE -
64
+ (*coffset & (QCOW2_COMPRESSED_SECTOR_SIZE - 1));
65
+}
66
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/block/qcow2-refcount.c
69
+++ b/block/qcow2-refcount.c
70
@@ -XXX,XX +XXX,XX @@ void qcow2_free_any_cluster(BlockDriverState *bs, uint64_t l2_entry,
71
switch (ctype) {
72
case QCOW2_CLUSTER_COMPRESSED:
73
{
74
- int64_t offset = (l2_entry & s->cluster_offset_mask)
75
- & QCOW2_COMPRESSED_SECTOR_MASK;
76
- int size = QCOW2_COMPRESSED_SECTOR_SIZE *
77
- (((l2_entry >> s->csize_shift) & s->csize_mask) + 1);
78
- qcow2_free_clusters(bs, offset, size, type);
79
+ uint64_t coffset;
80
+ int csize;
81
+
82
+ qcow2_parse_compressed_l2_entry(bs, l2_entry, &coffset, &csize);
83
+ qcow2_free_clusters(bs, coffset, csize, type);
84
}
85
break;
86
case QCOW2_CLUSTER_NORMAL:
87
@@ -XXX,XX +XXX,XX @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
88
bool l1_allocated = false;
89
int64_t old_entry, old_l2_offset;
90
unsigned slice, slice_size2, n_slices;
91
- int i, j, l1_modified = 0, nb_csectors;
92
+ int i, j, l1_modified = 0;
93
int ret;
94
95
assert(addend >= -1 && addend <= 1);
96
@@ -XXX,XX +XXX,XX @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
97
98
switch (qcow2_get_cluster_type(bs, entry)) {
99
case QCOW2_CLUSTER_COMPRESSED:
100
- nb_csectors = ((entry >> s->csize_shift) &
101
- s->csize_mask) + 1;
102
if (addend != 0) {
103
- uint64_t coffset = (entry & s->cluster_offset_mask)
104
- & QCOW2_COMPRESSED_SECTOR_MASK;
105
+ uint64_t coffset;
106
+ int csize;
107
+
108
+ qcow2_parse_compressed_l2_entry(bs, entry,
109
+ &coffset, &csize);
110
ret = update_refcount(
111
- bs, coffset,
112
- nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE,
113
+ bs, coffset, csize,
114
abs(addend), addend < 0,
115
QCOW2_DISCARD_SNAPSHOT);
116
if (ret < 0) {
117
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
118
BDRVQcow2State *s = bs->opaque;
119
uint64_t l2_entry;
120
uint64_t next_contiguous_offset = 0;
121
- int i, nb_csectors, ret;
122
+ int i, ret;
123
size_t l2_size_bytes = s->l2_size * l2_entry_size(s);
124
g_autofree uint64_t *l2_table = g_malloc(l2_size_bytes);
125
126
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
127
128
/* Do the actual checks */
129
for (i = 0; i < s->l2_size; i++) {
130
+ uint64_t coffset;
131
+ int csize;
132
l2_entry = get_l2_entry(s, l2_table, i);
133
134
switch (qcow2_get_cluster_type(bs, l2_entry)) {
135
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
136
}
137
138
/* Mark cluster as used */
139
- nb_csectors = ((l2_entry >> s->csize_shift) &
140
- s->csize_mask) + 1;
141
- l2_entry &= s->cluster_offset_mask;
142
+ qcow2_parse_compressed_l2_entry(bs, l2_entry, &coffset, &csize);
143
ret = qcow2_inc_refcounts_imrt(
144
- bs, res, refcount_table, refcount_table_size,
145
- l2_entry & QCOW2_COMPRESSED_SECTOR_MASK,
146
- nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE);
147
+ bs, res, refcount_table, refcount_table_size, coffset, csize);
148
if (ret < 0) {
149
return ret;
150
}
151
diff --git a/block/qcow2.c b/block/qcow2.c
130
diff --git a/block/qcow2.c b/block/qcow2.c
152
index XXXXXXX..XXXXXXX 100644
131
index XXXXXXX..XXXXXXX 100644
153
--- a/block/qcow2.c
132
--- a/block/qcow2.c
154
+++ b/block/qcow2.c
133
+++ b/block/qcow2.c
155
@@ -XXX,XX +XXX,XX @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
134
@@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
156
size_t qiov_offset)
135
if (!bytes) {
157
{
136
return true;
158
BDRVQcow2State *s = bs->opaque;
137
}
159
- int ret = 0, csize, nb_csectors;
138
- res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
160
+ int ret = 0, csize;
139
- return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
161
uint64_t coffset;
140
+
162
uint8_t *buf, *out_buf;
141
+ /*
163
int offset_in_cluster = offset_into_cluster(s, offset);
142
+ * bdrv_block_status_above doesn't merge different types of zeros, for
164
143
+ * example, zeros which come from the region which is unallocated in
165
- assert(qcow2_get_cluster_type(bs, l2_entry) == QCOW2_CLUSTER_COMPRESSED);
144
+ * the whole backing chain, and zeros which come because of a short
166
-
145
+ * backing file. So, we need a loop.
167
- coffset = l2_entry & s->cluster_offset_mask;
146
+ */
168
- nb_csectors = ((l2_entry >> s->csize_shift) & s->csize_mask) + 1;
147
+ do {
169
- csize = nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE -
148
+ res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
170
- (coffset & ~QCOW2_COMPRESSED_SECTOR_MASK);
149
+ offset += nr;
171
+ qcow2_parse_compressed_l2_entry(bs, l2_entry, &coffset, &csize);
150
+ bytes -= nr;
172
151
+ } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);
173
buf = g_try_malloc(csize);
152
+
174
if (!buf) {
153
+ return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0;
154
}
155
156
static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
175
--
157
--
176
2.31.1
158
2.26.2
177
159
178
diff view generated by jsdifflib
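The helper qcow2_parse_compressed_l2_entry() introduced above yields a precise byte offset and byte size instead of sector-aligned values. The following standalone sketch reproduces the same arithmetic outside QEMU. CompressedLayout and both function names are hypothetical, and the way the shift and masks are derived from cluster_bits is an assumption based on the compressed cluster descriptor in the qcow2 format (x = 62 - (cluster_bits - 8)).

```c
#include <assert.h>
#include <stdint.h>

#define COMPRESSED_SECTOR_SIZE 512U

/* Hypothetical stand-in for the three BDRVQcow2State fields used here. */
typedef struct {
    uint64_t cluster_offset_mask;
    unsigned csize_shift;
    uint64_t csize_mask;
} CompressedLayout;

/* Assumed derivation of the descriptor layout from the cluster size. */
static CompressedLayout layout_for_cluster_bits(unsigned cluster_bits)
{
    CompressedLayout l;

    assert(cluster_bits >= 9 && cluster_bits <= 21);
    l.csize_shift = 62 - (cluster_bits - 8);
    l.csize_mask = (1ULL << (cluster_bits - 8)) - 1;
    l.cluster_offset_mask = (1ULL << l.csize_shift) - 1;
    return l;
}

/*
 * Same arithmetic as the helper in the patch: the size field counts
 * additional 512-byte sectors, and the compressed data starts at a byte
 * offset that need not be sector aligned, so the bytes before coffset in
 * the first sector are subtracted.
 */
static void parse_compressed_entry(const CompressedLayout *l,
                                   uint64_t l2_entry,
                                   uint64_t *coffset, int *csize)
{
    int nb_csectors;

    *coffset = l2_entry & l->cluster_offset_mask;
    nb_csectors = (int)((l2_entry >> l->csize_shift) & l->csize_mask) + 1;
    *csize = nb_csectors * (int)COMPRESSED_SECTOR_SIZE -
             (int)(*coffset & (COMPRESSED_SECTOR_SIZE - 1));
}
```

With 64 KiB clusters, an entry whose offset is 0x10064 and whose size field is 2 covers three sectors minus the 100 bytes that precede the data in the first sector, that is 1436 bytes.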
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
There is no conflict and no dependency if we have parallel writes to
3
In order to reuse bdrv_common_block_status_above in
4
different subclusters of one cluster when the cluster itself is already
4
bdrv_is_allocated_above, let's support include_base parameter.
5
allocated. So, relax extra dependency.
6
7
Measure performance:
8
First, prepare build/qemu-img-old and build/qemu-img-new images.
9
10
cd scripts/simplebench
11
./img_bench_templater.py
12
13
Paste the following into the stdin of the running script:
14
15
qemu_img=../../build/qemu-img-{old|new}
16
$qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
17
$qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
18
-w -t none -n /ssd/x.qcow2
19
20
The result:
21
22
All results are in seconds
23
24
------------------  ---------  ---------
                    old        new
-s 2K               6.7 ± 15%  6.2 ± 12%
                                 -7%
-s 2K -o 512        13 ± 3%    11 ± 5%
                                 -16%
-s $((1024*2+512))  9.5 ± 4%   8.4
                                 -12%
------------------  ---------  ---------
33
34
So small writes are more independent now, and that helps to keep a deeper
35
I/O queue, which improves performance.
36
37
The output of iotest 271 becomes racy for three allocations in one cluster.
38
Second and third writes may finish in different order. Second and
39
third requests don't depend on each other any more. Still they both
40
depend on first request anyway. Filter out second and third write
41
offsets to cover both possible outputs.
42
5
43
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
44
Message-Id: <20210824101517.59802-4-vsementsov@virtuozzo.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
45
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Eric Blake <eblake@redhat.com>
46
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
9
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
47
[hreitz: s/ an / and /]
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
48
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
49
---
11
---
50
block/qcow2-cluster.c | 11 +++++++++++
12
block/coroutines.h | 2 ++
51
tests/qemu-iotests/271 | 5 ++++-
13
block/io.c | 21 ++++++++++++++-------
52
tests/qemu-iotests/271.out | 4 ++--
14
2 files changed, 16 insertions(+), 7 deletions(-)
53
3 files changed, 17 insertions(+), 3 deletions(-)
54
15
55
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
16
diff --git a/block/coroutines.h b/block/coroutines.h
56
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
57
--- a/block/qcow2-cluster.c
18
--- a/block/coroutines.h
58
+++ b/block/qcow2-cluster.c
19
+++ b/block/coroutines.h
59
@@ -XXX,XX +XXX,XX @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
20
@@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
60
continue;
21
int coroutine_fn
22
bdrv_co_common_block_status_above(BlockDriverState *bs,
23
BlockDriverState *base,
24
+ bool include_base,
25
bool want_zero,
26
int64_t offset,
27
int64_t bytes,
28
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
29
int generated_co_wrapper
30
bdrv_common_block_status_above(BlockDriverState *bs,
31
BlockDriverState *base,
32
+ bool include_base,
33
bool want_zero,
34
int64_t offset,
35
int64_t bytes,
36
diff --git a/block/io.c b/block/io.c
37
index XXXXXXX..XXXXXXX 100644
38
--- a/block/io.c
39
+++ b/block/io.c
40
@@ -XXX,XX +XXX,XX @@ early_out:
41
int coroutine_fn
42
bdrv_co_common_block_status_above(BlockDriverState *bs,
43
BlockDriverState *base,
44
+ bool include_base,
45
bool want_zero,
46
int64_t offset,
47
int64_t bytes,
48
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
49
BlockDriverState *p;
50
int64_t eof = 0;
51
52
- assert(bs != base);
53
+ assert(include_base || bs != base);
54
+ assert(!include_base || base); /* Can't include NULL base */
55
56
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
57
- if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
58
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
59
return ret;
60
}
61
62
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
63
assert(*pnum <= bytes);
64
bytes = *pnum;
65
66
- for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
68
p = bdrv_filter_or_cow_bs(p))
69
{
70
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
71
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
72
break;
61
}
73
}
62
74
63
+ if (old_alloc->keep_old_clusters &&
75
+ if (p == base) {
64
+ (end <= l2meta_cow_start(old_alloc) ||
76
+ assert(include_base);
65
+ start >= l2meta_cow_end(old_alloc)))
77
+ break;
66
+ {
67
+ /*
68
+ * Clusters intersect but COW areas don't. And cluster itself is
69
+ * already allocated. So, there is no actual conflict.
70
+ */
71
+ continue;
72
+ }
78
+ }
73
+
79
+
74
/* Conflict */
80
/*
75
81
* OK, [offset, offset + *pnum) region is unallocated on this layer,
76
if (start < old_start) {
82
* let's continue the diving.
77
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
83
@@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
78
index XXXXXXX..XXXXXXX 100755
84
int64_t offset, int64_t bytes, int64_t *pnum,
79
--- a/tests/qemu-iotests/271
85
int64_t *map, BlockDriverState **file)
80
+++ b/tests/qemu-iotests/271
86
{
81
@@ -XXX,XX +XXX,XX @@ EOF
87
- return bdrv_common_block_status_above(bs, base, true, offset, bytes,
88
+ return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
89
pnum, map, file);
82
}
90
}
83
91
84
_make_test_img -o extended_l2=on 1M
92
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
85
-_concurrent_io | $QEMU_IO | _filter_qemu_io
93
int ret;
86
+# Second and third writes in _concurrent_io() are independent and may finish in
94
int64_t dummy;
87
+# different order. So, filter offset out to match both possible variants.
95
88
+_concurrent_io | $QEMU_IO | _filter_qemu_io | \
96
- ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
89
+ $SED -e 's/\(20480\|40960\)/OFFSET/'
97
- offset, bytes, pnum ? pnum : &dummy,
90
_concurrent_verify | $QEMU_IO | _filter_qemu_io
98
- NULL, NULL);
91
99
+ ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
92
# success, all done
100
+ bytes, pnum ? pnum : &dummy, NULL,
93
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
101
+ NULL);
94
index XXXXXXX..XXXXXXX 100644
102
if (ret < 0) {
95
--- a/tests/qemu-iotests/271.out
103
return ret;
96
+++ b/tests/qemu-iotests/271.out
104
}
97
@@ -XXX,XX +XXX,XX @@ blkdebug: Suspended request 'A'
98
blkdebug: Resuming request 'A'
99
wrote 2048/2048 bytes at offset 30720
100
2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
101
-wrote 2048/2048 bytes at offset 20480
102
+wrote 2048/2048 bytes at offset OFFSET
103
2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
104
-wrote 2048/2048 bytes at offset 40960
105
+wrote 2048/2048 bytes at offset OFFSET
106
2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
107
*** done
108
--
105
--
109
2.31.1
106
2.26.2
110
107
111
diff view generated by jsdifflib
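The handle_dependencies() change above relaxes one case: when the clusters of an in-flight allocation are already allocated and its COW area does not intersect the new request, the new request does not have to wait. The following is a simplified standalone sketch of that decision. InFlightAlloc and must_wait_for() are hypothetical names, and the struct flattens what QEMU derives from QCowL2Meta via l2meta_cow_start() and l2meta_cow_end().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an in-flight allocation. */
typedef struct {
    uint64_t cluster_start;  /* guest range covered by its clusters */
    uint64_t cluster_end;
    uint64_t cow_start;      /* guest range it actually touches */
    uint64_t cow_end;
    bool keep_old_clusters;  /* the clusters were already allocated */
} InFlightAlloc;

/* Returns true if a new write to [start, end) must wait for 'old'. */
static bool must_wait_for(const InFlightAlloc *old,
                          uint64_t start, uint64_t end)
{
    assert(start < end);

    if (end <= old->cluster_start || start >= old->cluster_end) {
        /* Different clusters: no intersection at all. */
        return false;
    }

    if (old->keep_old_clusters &&
        (end <= old->cow_start || start >= old->cow_end)) {
        /*
         * Same cluster, but the touched areas do not overlap and the
         * cluster itself is already allocated: no actual conflict.
         */
        return false;
    }

    return true;
}
```

This also shows why the second and third writes in iotest 271 still depend on the first one: the first request allocates the cluster, so keep_old_clusters is false for it and the relaxed path does not apply.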
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
We must not inactivate a child when its parent has write permissions on
3
We are going to reuse bdrv_common_block_status_above in
4
it.
4
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
5
include_base == false and still bs == base (e.g. from img_rebase()).
5
6
6
Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this
7
So, support this corner case.
7
handler and it is used to flush caches, not for permission
8
manipulations.
9
10
So, let's simply check cumulative parent permissions before
11
inactivating the node.
12
13
This commit fixes a crash when we do migration during backup: prior to
14
the commit, nothing prevents inactivation of all nodes at migration finish,
15
and the following backup write to the target crashes on the assertion
16
"assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
17
bdrv_co_write_req_prepare().
18
19
After the commit, we rely on the fact that the copy-before-write filter
20
keeps write permission on the target node to be able to write to it. So
21
inactivation fails and migration fails as expected.
22
23
The corresponding test now passes, so enable it.
24
8
25
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
26
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
10
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
27
Message-Id: <20210911120027.8063-3-vsementsov@virtuozzo.com>
11
Reviewed-by: Eric Blake <eblake@redhat.com>
28
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
12
Reviewed-by: Alberto Garcia <berto@igalia.com>
13
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
29
---
15
---
30
block.c | 8 ++++++++
16
block/io.c | 6 +++++-
31
tests/qemu-iotests/tests/migrate-during-backup | 2 +-
17
1 file changed, 5 insertions(+), 1 deletion(-)
32
2 files changed, 9 insertions(+), 1 deletion(-)
33
18
34
diff --git a/block.c b/block.c
19
diff --git a/block/io.c b/block/io.c
35
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
36
--- a/block.c
21
--- a/block/io.c
37
+++ b/block.c
22
+++ b/block/io.c
38
@@ -XXX,XX +XXX,XX @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
23
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
39
{
24
BlockDriverState *p;
40
BdrvChild *child, *parent;
25
int64_t eof = 0;
41
int ret;
26
42
+ uint64_t cumulative_perms, cumulative_shared_perms;
27
- assert(include_base || bs != base);
43
28
assert(!include_base || base); /* Can't include NULL base */
44
if (!bs->drv) {
29
45
return -ENOMEDIUM;
30
+ if (!include_base && bs == base) {
46
@@ -XXX,XX +XXX,XX @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
31
+ *pnum = bytes;
47
}
32
+ return 0;
48
}
49
50
+ bdrv_get_cumulative_perm(bs, &cumulative_perms,
51
+ &cumulative_shared_perms);
52
+ if (cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
53
+ /* Our inactive parents still need write access. Inactivation failed. */
54
+ return -EPERM;
55
+ }
33
+ }
56
+
34
+
57
bs->open_flags |= BDRV_O_INACTIVE;
35
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
58
36
if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
59
/*
37
return ret;
60
diff --git a/tests/qemu-iotests/tests/migrate-during-backup b/tests/qemu-iotests/tests/migrate-during-backup
61
index XXXXXXX..XXXXXXX 100755
62
--- a/tests/qemu-iotests/tests/migrate-during-backup
63
+++ b/tests/qemu-iotests/tests/migrate-during-backup
64
@@ -XXX,XX +XXX,XX @@
65
#!/usr/bin/env python3
66
-# group: migration disabled
67
+# group: migration
68
#
69
# Copyright (c) 2021 Virtuozzo International GmbH
70
#
71
--
38
--
72
2.31.1
39
2.26.2
73
40
74
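The permission check added to bdrv_inactivate_recurse() above can be modelled in a few lines. This is a toy Python sketch under stated assumptions, not QEMU code: the Node class, cumulative_perm() and inactivate() are hypothetical stand-ins, and the PERM_* values are illustrative bit flags rather than a claim about the real BLK_PERM_* constants. It only demonstrates the rule from the commit message: a node whose parents cumulatively hold WRITE or WRITE_UNCHANGED refuses inactivation with -EPERM.

```python
import errno
from dataclasses import dataclass, field

# Illustrative permission bits (assumed values for this sketch).
PERM_CONSISTENT_READ = 0x01
PERM_WRITE = 0x02
PERM_WRITE_UNCHANGED = 0x04


@dataclass
class Node:
    """Toy block node: only tracks what each parent requested."""
    name: str
    parent_perms: list = field(default_factory=list)
    inactive: bool = False


def cumulative_perm(node: Node) -> int:
    """OR together the permissions held by all parents of the node."""
    perm = 0
    for p in node.parent_perms:
        perm |= p
    return perm


def inactivate(node: Node) -> int:
    """Return 0 on success, -EPERM if a parent still needs write access."""
    if cumulative_perm(node) & (PERM_WRITE | PERM_WRITE_UNCHANGED):
        return -errno.EPERM
    node.inactive = True
    return 0


if __name__ == "__main__":
    # A backup target: the copy-before-write filter keeps WRITE on it.
    target = Node("target", [PERM_CONSISTENT_READ, PERM_WRITE])
    print(inactivate(target), target.inactive)

    # A node that is only read can be inactivated.
    readonly = Node("readonly", [PERM_CONSISTENT_READ])
    print(inactivate(readonly), readonly.inactive)
```

In this model the failure happens before the inactive flag is set, which matches the ordering in the diff, where the check precedes "bs->open_flags |= BDRV_O_INACTIVE".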
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
- don't use same name for size in bytes and in entries
3
bdrv_is_allocated_above wrongly handles short backing files: it reports
4
- use g_autofree for l2_table
4
after-EOF space as UNALLOCATED which is wrong, as on read the data is
5
- add whitespace
5
generated on the level of the short backing file (if all overlays have
6
- fix block comment style
6
unallocated areas at that place).
7
8
Reusing bdrv_common_block_status_above fixes the issue and unifies code
9
path.
7
10
8
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
11
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Reviewed-by: Eric Blake <eblake@redhat.com>
12
Reviewed-by: Eric Blake <eblake@redhat.com>
10
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
13
Reviewed-by: Alberto Garcia <berto@igalia.com>
11
Message-Id: <20210914122454.141075-2-vsementsov@virtuozzo.com>
14
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
12
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
15
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
18
---
14
block/qcow2-refcount.c | 47 +++++++++++++++++++++---------------------
19
block/io.c | 43 +++++--------------------------------------
15
1 file changed, 24 insertions(+), 23 deletions(-)
20
1 file changed, 5 insertions(+), 38 deletions(-)
16
21
17
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
22
diff --git a/block/io.c b/block/io.c
18
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
19
--- a/block/qcow2-refcount.c
24
--- a/block/io.c
20
+++ b/block/qcow2-refcount.c
25
+++ b/block/io.c
21
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
26
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
22
int flags, BdrvCheckMode fix, bool active)
27
* at 'offset + *pnum' may return the same allocation status (in other
28
* words, the result is not necessarily the maximum possible range);
29
* but 'pnum' will only be 0 when end of file is reached.
30
- *
31
*/
32
int bdrv_is_allocated_above(BlockDriverState *top,
33
BlockDriverState *base,
34
bool include_base, int64_t offset,
35
int64_t bytes, int64_t *pnum)
23
{
36
{
24
BDRVQcow2State *s = bs->opaque;
37
- BlockDriverState *intermediate;
25
- uint64_t *l2_table, l2_entry;
38
- int ret;
26
+ uint64_t l2_entry;
39
- int64_t n = bytes;
27
uint64_t next_contiguous_offset = 0;
28
- int i, l2_size, nb_csectors, ret;
29
+ int i, nb_csectors, ret;
30
+ size_t l2_size_bytes = s->l2_size * l2_entry_size(s);
31
+ g_autofree uint64_t *l2_table = g_malloc(l2_size_bytes);
32
33
/* Read L2 table from disk */
34
- l2_size = s->l2_size * l2_entry_size(s);
35
- l2_table = g_malloc(l2_size);
36
-
40
-
37
- ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
41
- assert(base || !include_base);
38
+ ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size_bytes);
42
-
39
if (ret < 0) {
43
- intermediate = top;
40
fprintf(stderr, "ERROR: I/O error in check_refcounts_l2\n");
44
- while (include_base || intermediate != base) {
41
res->check_errors++;
45
- int64_t pnum_inter;
42
- goto fail;
46
- int64_t size_inter;
47
-
48
- assert(intermediate);
49
- ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
50
- if (ret < 0) {
51
- return ret;
52
- }
53
- if (ret) {
54
- *pnum = pnum_inter;
55
- return 1;
56
- }
57
-
58
- size_inter = bdrv_getlength(intermediate);
59
- if (size_inter < 0) {
60
- return size_inter;
61
- }
62
- if (n > pnum_inter &&
63
- (intermediate == top || offset + pnum_inter < size_inter)) {
64
- n = pnum_inter;
65
- }
66
-
67
- if (intermediate == base) {
68
- break;
69
- }
70
-
71
- intermediate = bdrv_filter_or_cow_bs(intermediate);
72
+ int ret = bdrv_common_block_status_above(top, base, include_base, false,
73
+ offset, bytes, pnum, NULL, NULL);
74
+ if (ret < 0) {
43
+ return ret;
75
+ return ret;
44
}
76
}
45
77
46
/* Do the actual checks */
78
- *pnum = n;
47
- for(i = 0; i < s->l2_size; i++) {
79
- return 0;
48
+ for (i = 0; i < s->l2_size; i++) {
80
+ return !!(ret & BDRV_BLOCK_ALLOCATED);
49
l2_entry = get_l2_entry(s, l2_table, i);
50
51
switch (qcow2_get_cluster_type(bs, l2_entry)) {
52
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
53
l2_entry & QCOW2_COMPRESSED_SECTOR_MASK,
54
nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE);
55
if (ret < 0) {
56
- goto fail;
57
+ return ret;
58
}
59
60
if (flags & CHECK_FRAG_INFO) {
61
res->bfi.allocated_clusters++;
62
res->bfi.compressed_clusters++;
63
64
- /* Compressed clusters are fragmented by nature. Since they
65
+ /*
66
+ * Compressed clusters are fragmented by nature. Since they
67
* take up sub-sector space but we only have sector granularity
68
* I/O we need to re-read the same sectors even for adjacent
69
* compressed clusters.
70
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
71
if (ret < 0) {
72
fprintf(stderr, "ERROR: Overlap check failed\n");
73
res->check_errors++;
74
- /* Something is seriously wrong, so abort checking
75
- * this L2 table */
76
- goto fail;
77
+ /*
78
+ * Something is seriously wrong, so abort checking
79
+ * this L2 table.
80
+ */
81
+ return ret;
82
}
83
84
ret = bdrv_pwrite_sync(bs->file, l2e_offset,
85
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
86
fprintf(stderr, "ERROR: Failed to overwrite L2 "
87
"table entry: %s\n", strerror(-ret));
88
res->check_errors++;
89
- /* Do not abort, continue checking the rest of this
90
- * L2 table's entries */
91
+ /*
92
+ * Do not abort, continue checking the rest of this
93
+ * L2 table's entries.
94
+ */
95
} else {
96
res->corruptions--;
97
res->corruptions_fixed++;
98
- /* Skip marking the cluster as used
99
- * (it is unused now) */
100
+ /*
101
+ * Skip marking the cluster as used
102
+ * (it is unused now).
103
+ */
104
continue;
105
}
106
}
107
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
108
refcount_table_size,
109
offset, s->cluster_size);
110
if (ret < 0) {
111
- goto fail;
112
+ return ret;
113
}
114
}
115
break;
116
@@ -XXX,XX +XXX,XX @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
117
}
118
}
119
120
- g_free(l2_table);
121
return 0;
122
-
123
-fail:
124
- g_free(l2_table);
125
- return ret;
126
}
81
}
127
82
128
/*
83
int coroutine_fn
129
--
84
--
130
2.31.1
85
2.26.2
131
86
132
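The short-backing-file problem described in the bdrv_is_allocated_above commit message above can be sketched with a toy model. The Python below is an illustration under assumptions, not the QEMU implementation: Layer and is_allocated_above() are hypothetical, offsets are counted in whole blocks, and the chain passed in contains only the layers that are searched (the excluded base is simply left out). With the fix enabled, space past the end of a shorter backing layer counts as allocated, because a read there yields zeros produced at that level rather than data from deeper layers.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Toy image layer: size in blocks plus the set of allocated blocks."""
    name: str
    size: int
    allocated: set = field(default_factory=set)


def is_allocated_above(chain, offset, short_backing_fix=True):
    """Report whether 'offset' is allocated in any layer of 'chain'.

    chain[0] is the top layer; later entries are its backing layers.
    """
    for depth, layer in enumerate(chain):
        if offset < layer.size:
            if offset in layer.allocated:
                return True
        elif depth > 0 and short_backing_fix:
            # Past EOF of a backing layer: the guest sees zeros generated
            # here, so deeper layers must not shine through.
            return True
    return False


if __name__ == "__main__":
    # Shape loosely modelled on iotest 274: top is 4 blocks, mid is 2.
    top = Layer("top", size=4)
    mid = Layer("mid", size=2)
    chain = [top, mid]  # base is excluded from the search

    print(is_allocated_above(chain, 3, short_backing_fix=False))
    print(is_allocated_above(chain, 3, short_backing_fix=True))
```

In the unfixed mode the block past the end of mid is reported as unallocated, so a commit or rebase in this model would skip it and stale data from the base could reappear.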
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
Let's pass the whole L2 entry and not bother with
3
These cases are fixed by previous patches around block_status and
4
L2E_COMPRESSED_OFFSET_SIZE_MASK.
4
is_allocated.
5
6
It also helps further refactoring that adds a generic
7
qcow2_parse_compressed_l2_entry() helper.
8
5
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
10
Reviewed-by: Eric Blake <eblake@redhat.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
11
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
12
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
9
Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com
13
Message-Id: <20210914122454.141075-3-vsementsov@virtuozzo.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
15
---
11
---
16
block/qcow2.h | 1 -
12
tests/qemu-iotests/274 | 20 +++++++++++
17
block/qcow2-cluster.c | 5 ++---
13
tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++
18
block/qcow2.c | 12 +++++++-----
14
2 files changed, 88 insertions(+)
19
3 files changed, 9 insertions(+), 9 deletions(-)
20
15
21
diff --git a/block/qcow2.h b/block/qcow2.h
16
diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/274
19
+++ b/tests/qemu-iotests/274
20
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \
21
iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
22
iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
23
24
+ iotests.log('=== Testing qemu-img commit (top -> base) ===')
25
+
26
+ create_chain()
27
+ iotests.qemu_img_log('commit', '-b', base, top)
28
+ iotests.img_info_log(base)
29
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
30
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
31
+
32
+ iotests.log('=== Testing QMP active commit (top -> base) ===')
33
+
34
+ create_chain()
35
+ with create_vm() as vm:
36
+ vm.launch()
37
+ vm.qmp_log('block-commit', device='top', base_node='base',
38
+ job_id='job0', auto_dismiss=False)
39
+ vm.run_job('job0', wait=5)
40
+
41
+ iotests.img_info_log(mid)
42
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
43
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
44
45
iotests.log('== Resize tests ==')
46
47
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
22
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
23
--- a/block/qcow2.h
49
--- a/tests/qemu-iotests/274.out
24
+++ b/block/qcow2.h
50
+++ b/tests/qemu-iotests/274.out
25
@@ -XXX,XX +XXX,XX @@ typedef enum QCow2MetadataOverlap {
51
@@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0
26
52
read 1048576/1048576 bytes at offset 1048576
27
#define L1E_OFFSET_MASK 0x00fffffffffffe00ULL
53
1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
28
#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
54
29
-#define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL
55
+=== Testing qemu-img commit (top -> base) ===
30
56
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
31
#define REFT_OFFSET_MASK 0xfffffffffffffe00ULL
32
33
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
34
index XXXXXXX..XXXXXXX 100644
35
--- a/block/qcow2-cluster.c
36
+++ b/block/qcow2-cluster.c
37
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
38
* offset needs to be aligned to a cluster boundary.
39
*
40
* If the cluster is unallocated then *host_offset will be 0.
41
- * If the cluster is compressed then *host_offset will contain the
42
- * complete compressed cluster descriptor.
43
+ * If the cluster is compressed then *host_offset will contain the l2 entry.
44
*
45
* On entry, *bytes is the maximum number of contiguous bytes starting at
46
* offset that we are interested in.
47
@@ -XXX,XX +XXX,XX @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
48
ret = -EIO;
49
goto fail;
50
}
51
- *host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
52
+ *host_offset = l2_entry;
53
break;
54
case QCOW2_SUBCLUSTER_ZERO_PLAIN:
55
case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
56
diff --git a/block/qcow2.c b/block/qcow2.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/block/qcow2.c
59
+++ b/block/qcow2.c
60
@@ -XXX,XX +XXX,XX @@ typedef struct {
61
62
static int coroutine_fn
63
qcow2_co_preadv_compressed(BlockDriverState *bs,
64
- uint64_t cluster_descriptor,
65
+ uint64_t l2_entry,
66
uint64_t offset,
67
uint64_t bytes,
68
QEMUIOVector *qiov,
69
@@ -XXX,XX +XXX,XX @@ typedef struct Qcow2AioTask {
70
71
BlockDriverState *bs;
72
QCow2SubclusterType subcluster_type; /* only for read */
73
- uint64_t host_offset; /* or full descriptor in compressed clusters */
74
+ uint64_t host_offset; /* or l2_entry for compressed read */
75
uint64_t offset;
76
uint64_t bytes;
77
QEMUIOVector *qiov;
78
@@ -XXX,XX +XXX,XX @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
79
80
static int coroutine_fn
81
qcow2_co_preadv_compressed(BlockDriverState *bs,
82
- uint64_t cluster_descriptor,
83
+ uint64_t l2_entry,
84
uint64_t offset,
85
uint64_t bytes,
86
QEMUIOVector *qiov,
87
@@ -XXX,XX +XXX,XX @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
88
uint8_t *buf, *out_buf;
89
int offset_in_cluster = offset_into_cluster(s, offset);
90
91
- coffset = cluster_descriptor & s->cluster_offset_mask;
92
- nb_csectors = ((cluster_descriptor >> s->csize_shift) & s->csize_mask) + 1;
93
+ assert(qcow2_get_cluster_type(bs, l2_entry) == QCOW2_CLUSTER_COMPRESSED);
94
+
57
+
95
+ coffset = l2_entry & s->cluster_offset_mask;
58
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
96
+ nb_csectors = ((l2_entry >> s->csize_shift) & s->csize_mask) + 1;
59
+
97
csize = nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE -
60
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
98
(coffset & ~QCOW2_COMPRESSED_SECTOR_MASK);
61
+
99
62
+wrote 2097152/2097152 bytes at offset 0
63
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
64
+
65
+Image committed.
66
+
67
+image: TEST_IMG
68
+file format: IMGFMT
69
+virtual size: 2 MiB (2097152 bytes)
70
+cluster_size: 65536
71
+Format specific information:
72
+ compat: 1.1
73
+ compression type: zlib
74
+ lazy refcounts: false
75
+ refcount bits: 16
76
+ corrupt: false
77
+ extended l2: false
78
+
79
+read 1048576/1048576 bytes at offset 0
80
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
81
+
82
+read 1048576/1048576 bytes at offset 1048576
83
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
84
+
85
+=== Testing QMP active commit (top -> base) ===
86
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
87
+
88
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
89
+
90
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
91
+
92
+wrote 2097152/2097152 bytes at offset 0
93
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
94
+
95
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
96
+{"return": {}}
97
+{"execute": "job-complete", "arguments": {"id": "job0"}}
98
+{"return": {}}
99
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
100
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
101
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
102
+{"return": {}}
103
+image: TEST_IMG
104
+file format: IMGFMT
105
+virtual size: 1 MiB (1048576 bytes)
106
+cluster_size: 65536
107
+backing file: TEST_DIR/PID-base
108
+backing file format: IMGFMT
109
+Format specific information:
110
+ compat: 1.1
111
+ compression type: zlib
112
+ lazy refcounts: false
113
+ refcount bits: 16
114
+ corrupt: false
115
+ extended l2: false
116
+
117
+read 1048576/1048576 bytes at offset 0
118
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
119
+
120
+read 1048576/1048576 bytes at offset 1048576
121
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
122
+
123
== Resize tests ==
124
=== preallocation=off ===
125
Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16
100
--
126
--
101
2.31.1
127
2.26.2
102
128
103
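The decoding that qcow2_co_preadv_compressed() performs on a whole L2 entry, as shown in the qcow2 patch above, can be reproduced as a small Python sketch. The three expressions for coffset, nb_csectors and csize follow the diff; the derivation of csize_shift, csize_mask and cluster_offset_mask from cluster_bits is an assumption of this sketch (based on the qcow2 compressed cluster descriptor layout), and the function name parse_compressed_l2_entry is hypothetical. The point it illustrates is that the compressed flag bit does not need to be masked off beforehand, since the per-field masks already exclude it.

```python
COMPRESSED_SECTOR_SIZE = 512


def parse_compressed_l2_entry(l2_entry: int, cluster_bits: int = 16):
    """Return (coffset, csize) for a compressed-cluster L2 entry.

    The mask/shift derivation below is an assumption of this sketch.
    """
    csize_shift = 62 - (cluster_bits - 8)
    csize_mask = (1 << (cluster_bits - 8)) - 1
    cluster_offset_mask = (1 << csize_shift) - 1

    # Host offset of the compressed data (not sector aligned in general).
    coffset = l2_entry & cluster_offset_mask
    # Number of 512-byte sectors touched by the compressed data.
    nb_csectors = ((l2_entry >> csize_shift) & csize_mask) + 1
    # Upper bound on the compressed size: whole sectors minus the part of
    # the first sector that lies before coffset.
    csize = nb_csectors * COMPRESSED_SECTOR_SIZE - (
        coffset & (COMPRESSED_SECTOR_SIZE - 1)
    )
    return coffset, csize


if __name__ == "__main__":
    # Compressed flag (bit 62) set, 3 additional sectors, offset 0x10064.
    entry = (1 << 62) | (3 << 54) | 0x10064
    coffset, csize = parse_compressed_l2_entry(entry)
    print(hex(coffset), csize)
```

Passing the entry with or without the flag bit gives the same result in this model, which is why dropping L2E_COMPRESSED_OFFSET_SIZE_MASK does not change the decoded values.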