The following changes since commit ac793156f650ae2d77834932d72224175ee69086:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c:

  iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100)

----------------------------------------------------------------
Pull request

v2:
 * Fix format string issues on 32-bit hosts [Peter]
 * Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric]
 * Fix missing eventfd.h header on macOS [Peter]
 * Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter]

This pull request contains the vhost-user-blk server by Coiby Xu along with my
additions, block/nvme.c alignment and hardware error statistics by Philippe
Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir
Sementsov-Ogievskiy.

----------------------------------------------------------------

Coiby Xu (6):
      libvhost-user: Allow vu_message_read to be replaced
      libvhost-user: remove watch for kick_fd when de-initialize vu-dev
      util/vhost-user-server: generic vhost user server
      block: move logical block size check function to a common utility
        function
      block/export: vhost-user block device backend server
      MAINTAINERS: Add vhost-user block device backend server maintainer

Philippe Mathieu-Daudé (1):
      block/nvme: Add driver statistics for access alignment and hw errors

Stefan Hajnoczi (16):
      util/vhost-user-server: s/fileds/fields/ typo fix
      util/vhost-user-server: drop unnecessary QOM cast
      util/vhost-user-server: drop unnecessary watch deletion
      block/export: consolidate request structs into VuBlockReq
      util/vhost-user-server: drop unused DevicePanicNotifier
      util/vhost-user-server: fix memory leak in vu_message_read()
      util/vhost-user-server: check EOF when reading payload
      util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
      block/export: report flush errors
      block/export: convert vhost-user-blk server to block export API
      util/vhost-user-server: move header to include/
      util/vhost-user-server: use static library in meson.build
      qemu-storage-daemon: avoid compiling blockdev_ss twice
      block: move block exports to libblockdev
      block/export: add iothread and fixed-iothread options
      block/export: add vhost-user-blk multi-queue support

Vladimir Sementsov-Ogievskiy (5):
      block/io: fix bdrv_co_block_status_above
      block/io: bdrv_common_block_status_above: support include_base
      block/io: bdrv_common_block_status_above: support bs == base
      block/io: fix bdrv_is_allocated_above
      iotests: add commit top->base cases to 274

 MAINTAINERS                                |   9 +
 qapi/block-core.json                       |  24 +-
 qapi/block-export.json                     |  36 +-
 block/coroutines.h                         |   2 +
 block/export/vhost-user-blk-server.h       |  19 +
 contrib/libvhost-user/libvhost-user.h      |  21 +
 include/qemu/vhost-user-server.h           |  65 +++
 util/block-helpers.h                       |  19 +
 block/export/export.c                      |  37 +-
 block/export/vhost-user-blk-server.c       | 431 ++++++++++++++++++++
 block/io.c                                 | 132 +++---
 block/nvme.c                               |  27 ++
 block/qcow2.c                              |  16 +-
 contrib/libvhost-user/libvhost-user-glib.c |   2 +-
 contrib/libvhost-user/libvhost-user.c      |  15 +-
 hw/core/qdev-properties-system.c           |  31 +-
 nbd/server.c                               |   2 -
 qemu-nbd.c                                 |  21 +-
 softmmu/vl.c                               |   4 +
 stubs/blk-exp-close-all.c                  |   7 +
 tests/vhost-user-bridge.c                  |   2 +
 tools/virtiofsd/fuse_virtio.c              |   4 +-
 util/block-helpers.c                       |  46 +++
 util/vhost-user-server.c                   | 446 +++++++++++++++++++++
 block/export/meson.build                   |   3 +-
 contrib/libvhost-user/meson.build          |   1 +
 meson.build                                |  22 +-
 nbd/meson.build                            |   2 +
 storage-daemon/meson.build                 |   3 +-
 stubs/meson.build                          |   1 +
 tests/qemu-iotests/274                     |  20 +
 tests/qemu-iotests/274.out                 |  68 ++++
 util/meson.build                           |   4 +
 33 files changed, 1420 insertions(+), 122 deletions(-)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 include/qemu/vhost-user-server.h
 create mode 100644 util/block-helpers.h
 create mode 100644 block/export/vhost-user-blk-server.c
 create mode 100644 stubs/blk-exp-close-all.c
 create mode 100644 util/block-helpers.c
 create mode 100644 util/vhost-user-server.c

--
2.26.2

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 24 +++++++++++++++++++++++-
 block/nvme.c         | 27 +++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
56
index XXXXXXX..XXXXXXX 100644
57
--- a/qapi/block-core.json
58
+++ b/qapi/block-core.json
59
@@ -XXX,XX +XXX,XX @@
60
'discard-nb-failed': 'uint64',
61
'discard-bytes-ok': 'uint64' } }
62
63
+##
64
+# @BlockStatsSpecificNvme:
65
+#
66
+# NVMe driver statistics
67
+#
68
+# @completion-errors: The number of completion errors.
69
+#
70
+# @aligned-accesses: The number of aligned accesses performed by
71
+# the driver.
72
+#
73
+# @unaligned-accesses: The number of unaligned accesses performed by
74
+# the driver.
75
+#
76
+# Since: 5.2
77
+##
78
+{ 'struct': 'BlockStatsSpecificNvme',
79
+ 'data': {
80
+ 'completion-errors': 'uint64',
81
+ 'aligned-accesses': 'uint64',
82
+ 'unaligned-accesses': 'uint64' } }
83
+
84
##
85
# @BlockStatsSpecific:
86
#
87
@@ -XXX,XX +XXX,XX @@
88
'discriminator': 'driver',
89
'data': {
90
'file': 'BlockStatsSpecificFile',
91
- 'host_device': 'BlockStatsSpecificFile' } }
92
+ 'host_device': 'BlockStatsSpecificFile',
93
+ 'nvme': 'BlockStatsSpecificNvme' } }
94
95
##
96
# @BlockStats:
97
diff --git a/block/nvme.c b/block/nvme.c
98
index XXXXXXX..XXXXXXX 100644
99
--- a/block/nvme.c
100
+++ b/block/nvme.c
101
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {
102
103
/* PCI address (required for nvme_refresh_filename()) */
104
char *device;
105
+
106
+ struct {
107
+ uint64_t completion_errors;
108
+ uint64_t aligned_accesses;
109
+ uint64_t unaligned_accesses;
110
+ } stats;
111
};
112
113
#define NVME_BLOCK_OPT_DEVICE "device"
114
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
115
break;
116
}
117
ret = nvme_translate_error(c);
118
+ if (ret) {
119
+ s->stats.completion_errors++;
120
+ }
121
q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
122
if (!q->cq.head) {
123
q->cq_phase = !q->cq_phase;
124
@@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
125
assert(QEMU_IS_ALIGNED(bytes, s->page_size));
126
assert(bytes <= s->max_transfer);
127
if (nvme_qiov_aligned(bs, qiov)) {
128
+ s->stats.aligned_accesses++;
129
return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
130
}
131
+ s->stats.unaligned_accesses++;
132
trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
133
buf = qemu_try_memalign(s->page_size, bytes);
134
135
@@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host)
136
qemu_vfio_dma_unmap(s->vfio, host);
137
}
138
139
+static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs)
140
+{
141
+ BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
142
+ BDRVNVMeState *s = bs->opaque;
143
+
144
+ stats->driver = BLOCKDEV_DRIVER_NVME;
145
+ stats->u.nvme = (BlockStatsSpecificNvme) {
146
+ .completion_errors = s->stats.completion_errors,
147
+ .aligned_accesses = s->stats.aligned_accesses,
148
+ .unaligned_accesses = s->stats.unaligned_accesses,
149
+ };
150
+
151
+ return stats;
152
+}
153
+
154
static const char *const nvme_strong_runtime_opts[] = {
155
NVME_BLOCK_OPT_DEVICE,
156
NVME_BLOCK_OPT_NAMESPACE,
157
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
158
.bdrv_refresh_filename = nvme_refresh_filename,
159
.bdrv_refresh_limits = nvme_refresh_limits,
160
.strong_runtime_opts = nvme_strong_runtime_opts,
161
+ .bdrv_get_specific_stats = nvme_get_specific_stats,
162
163
.bdrv_detach_aio_context = nvme_detach_aio_context,
164
.bdrv_attach_aio_context = nvme_attach_aio_context,
165
--
2.26.2

From: Coiby Xu <coiby.xu@gmail.com>

Allow vu_message_read to be replaced by one which will make use of the
QIOChannel functions, so that reading a vhost-user message does not
stall the guest. For the slave channel, we still use the default
vu_message_read.

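As a rough sketch of how a caller can plug in its own reader (the names
my_read_msg, my_device_init, my_panic, my_set_watch, my_remove_watch and
my_iface below are placeholders, not code from this series):

  #include "contrib/libvhost-user/libvhost-user.h"

  /* Matches the vu_read_msg_cb typedef added by this patch. A real
   * implementation would read via a QIOChannel and yield in a coroutine
   * instead of blocking; returning false reports a read failure. */
  static bool my_read_msg(VuDev *dev, int sock, VhostUserMsg *vmsg)
  {
      return false;
  }

  static bool my_device_init(VuDev *dev, int socket_fd, uint16_t max_queues,
                             vu_panic_cb my_panic,
                             vu_set_watch_cb my_set_watch,
                             vu_remove_watch_cb my_remove_watch,
                             const VuDevIface *my_iface)
  {
      /* Passing NULL instead of my_read_msg keeps the default
       * vu_message_read behaviour. */
      return vu_init(dev, max_queues, socket_fd, my_panic, my_read_msg,
                     my_set_watch, my_remove_watch, my_iface);
  }
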
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h      | 21 +++++++++++++++++++++
 contrib/libvhost-user/libvhost-user-glib.c |  2 +-
 contrib/libvhost-user/libvhost-user.c      | 14 +++++++-------
 tests/vhost-user-bridge.c                  |  2 ++
 tools/virtiofsd/fuse_virtio.c              |  4 ++--
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
21
index XXXXXXX..XXXXXXX 100644
22
--- a/contrib/libvhost-user/libvhost-user.h
23
+++ b/contrib/libvhost-user/libvhost-user.h
24
@@ -XXX,XX +XXX,XX @@
25
*/
26
#define VHOST_USER_MAX_RAM_SLOTS 32
27
28
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
29
+
30
typedef enum VhostSetConfigType {
31
VHOST_SET_CONFIG_TYPE_MASTER = 0,
32
VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
33
@@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
34
typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
35
typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
36
int *do_reply);
37
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
38
typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
39
typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
40
typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
41
@@ -XXX,XX +XXX,XX @@ struct VuDev {
42
bool broken;
43
uint16_t max_queues;
44
45
+ /* @read_msg: custom method to read vhost-user message
46
+ *
47
+ * Read data from vhost_user socket fd and fill up
48
+ * the passed VhostUserMsg *vmsg struct.
49
+ *
50
+ * If reading fails, it should close the received set of file
51
+ * descriptors as socket message's auxiliary data.
52
+ *
53
+ * For the details, please refer to vu_message_read in libvhost-user.c
54
+ * which will be used by default if not custom method is provided when
55
+ * calling vu_init
56
+ *
57
+ * Returns: true if vhost-user message successfully received,
58
+ * otherwise return false.
59
+ *
60
+ */
61
+ vu_read_msg_cb read_msg;
62
/* @set_watch: add or update the given fd to the watch set,
63
* call cb when condition is met */
64
vu_set_watch_cb set_watch;
65
@@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev,
66
uint16_t max_queues,
67
int socket,
68
vu_panic_cb panic,
69
+ vu_read_msg_cb read_msg,
70
vu_set_watch_cb set_watch,
71
vu_remove_watch_cb remove_watch,
72
const VuDevIface *iface);
73
diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
74
index XXXXXXX..XXXXXXX 100644
75
--- a/contrib/libvhost-user/libvhost-user-glib.c
76
+++ b/contrib/libvhost-user/libvhost-user-glib.c
77
@@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket,
78
g_assert(dev);
79
g_assert(iface);
80
81
- if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
82
+ if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch,
83
remove_watch, iface)) {
84
return false;
85
}
86
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
87
index XXXXXXX..XXXXXXX 100644
88
--- a/contrib/libvhost-user/libvhost-user.c
89
+++ b/contrib/libvhost-user/libvhost-user.c
90
@@ -XXX,XX +XXX,XX @@
91
/* The version of inflight buffer */
92
#define INFLIGHT_VERSION 1
93
94
-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
95
-
96
/* The version of the protocol we support */
97
#define VHOST_USER_VERSION 1
98
#define LIBVHOST_USER_DEBUG 0
99
@@ -XXX,XX +XXX,XX @@ have_userfault(void)
100
}
101
102
static bool
103
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
104
+vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
105
{
106
char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
107
struct iovec iov = {
108
@@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
109
goto out;
110
}
111
112
- if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
113
+ if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) {
114
goto out;
115
}
116
117
@@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
118
/* Wait for QEMU to confirm that it's registered the handler for the
119
* faults.
120
*/
121
- if (!vu_message_read(dev, dev->sock, vmsg) ||
122
+ if (!dev->read_msg(dev, dev->sock, vmsg) ||
123
vmsg->size != sizeof(vmsg->payload.u64) ||
124
vmsg->payload.u64 != 0) {
125
vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
126
@@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev)
127
int reply_requested;
128
bool need_reply, success = false;
129
130
- if (!vu_message_read(dev, dev->sock, &vmsg)) {
131
+ if (!dev->read_msg(dev, dev->sock, &vmsg)) {
132
goto end;
133
}
134
135
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
136
uint16_t max_queues,
137
int socket,
138
vu_panic_cb panic,
139
+ vu_read_msg_cb read_msg,
140
vu_set_watch_cb set_watch,
141
vu_remove_watch_cb remove_watch,
142
const VuDevIface *iface)
143
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
144
145
dev->sock = socket;
146
dev->panic = panic;
147
+ dev->read_msg = read_msg ? read_msg : vu_message_read_default;
148
dev->set_watch = set_watch;
149
dev->remove_watch = remove_watch;
150
dev->iface = iface;
151
@@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
152
153
vu_message_write(dev, dev->slave_fd, &vmsg);
154
if (ack) {
155
- vu_message_read(dev, dev->slave_fd, &vmsg);
156
+ vu_message_read_default(dev, dev->slave_fd, &vmsg);
157
}
158
return;
159
}
160
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
161
index XXXXXXX..XXXXXXX 100644
162
--- a/tests/vhost-user-bridge.c
163
+++ b/tests/vhost-user-bridge.c
164
@@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx)
165
VHOST_USER_BRIDGE_MAX_QUEUES,
166
conn_fd,
167
vubr_panic,
168
+ NULL,
169
vubr_set_watch,
170
vubr_remove_watch,
171
&vuiface)) {
172
@@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client)
173
VHOST_USER_BRIDGE_MAX_QUEUES,
174
dev->sock,
175
vubr_panic,
176
+ NULL,
177
vubr_set_watch,
178
vubr_remove_watch,
179
&vuiface)) {
180
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
181
index XXXXXXX..XXXXXXX 100644
182
--- a/tools/virtiofsd/fuse_virtio.c
183
+++ b/tools/virtiofsd/fuse_virtio.c
184
@@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se)
185
se->vu_socketfd = data_sock;
186
se->virtio_dev->se = se;
187
pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
188
- vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
189
- fv_remove_watch, &fv_iface);
190
+ vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
191
+ fv_set_watch, fv_remove_watch, &fv_iface);
192
193
return 0;
194
}
195
--
2.26.2

From: Coiby Xu <coiby.xu@gmail.com>

When the client is running under gdb and the quit command is issued in
gdb, QEMU will still dispatch the event, which causes a segmentation
fault in the callback function.

Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/contrib/libvhost-user/libvhost-user.c
19
+++ b/contrib/libvhost-user/libvhost-user.c
20
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
21
}
22
23
if (vq->kick_fd != -1) {
24
+ dev->remove_watch(dev, vq->kick_fd);
25
close(vq->kick_fd);
26
vq->kick_fd = -1;
27
}
28
--
2.26.2

From: Coiby Xu <coiby.xu@gmail.com>

Sharing QEMU devices via vhost-user protocol.

Only one vhost-user client can connect to the server at a time.

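As an illustration of the API introduced below (a sketch only; my_start,
my_addr and my_iface are placeholders, not code from this patch):

  #include "qemu/osdep.h"
  #include "util/vhost-user-server.h"

  /* Start serving my_iface on a UNIX socket address in the given
   * AioContext; one virtqueue, no device panic notifier (NULL). */
  static bool my_start(VuServer *server, SocketAddress *my_addr,
                       AioContext *ctx, const VuDevIface *my_iface,
                       Error **errp)
  {
      return vhost_user_server_start(server, my_addr, ctx, 1, NULL,
                                     my_iface, errp);
  }

The server is torn down again with vhost_user_server_stop(), and
vhost_user_server_set_aio_context() moves an already-running server to
another AioContext.
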
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
[Fixed size_t %lu -> %zu format string compiler error.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.h |  65 ++++++
 util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++
 util/meson.build         |   1 +
 3 files changed, 494 insertions(+)
 create mode 100644 util/vhost-user-server.h
 create mode 100644 util/vhost-user-server.c

diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
25
new file mode 100644
26
index XXXXXXX..XXXXXXX
27
--- /dev/null
28
+++ b/util/vhost-user-server.h
29
@@ -XXX,XX +XXX,XX @@
30
+/*
31
+ * Sharing QEMU devices via vhost-user protocol
32
+ *
33
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
34
+ * Copyright (c) 2020 Red Hat, Inc.
35
+ *
36
+ * This work is licensed under the terms of the GNU GPL, version 2 or
37
+ * later. See the COPYING file in the top-level directory.
38
+ */
39
+
40
+#ifndef VHOST_USER_SERVER_H
41
+#define VHOST_USER_SERVER_H
42
+
43
+#include "contrib/libvhost-user/libvhost-user.h"
44
+#include "io/channel-socket.h"
45
+#include "io/channel-file.h"
46
+#include "io/net-listener.h"
47
+#include "qemu/error-report.h"
48
+#include "qapi/error.h"
49
+#include "standard-headers/linux/virtio_blk.h"
50
+
51
+typedef struct VuFdWatch {
52
+ VuDev *vu_dev;
53
+ int fd; /*kick fd*/
54
+ void *pvt;
55
+ vu_watch_cb cb;
56
+ bool processing;
57
+ QTAILQ_ENTRY(VuFdWatch) next;
58
+} VuFdWatch;
59
+
60
+typedef struct VuServer VuServer;
61
+typedef void DevicePanicNotifierFn(VuServer *server);
62
+
63
+struct VuServer {
64
+ QIONetListener *listener;
65
+ AioContext *ctx;
66
+ DevicePanicNotifierFn *device_panic_notifier;
67
+ int max_queues;
68
+ const VuDevIface *vu_iface;
69
+ VuDev vu_dev;
70
+ QIOChannel *ioc; /* The I/O channel with the client */
71
+ QIOChannelSocket *sioc; /* The underlying data channel with the client */
72
+ /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
73
+ QIOChannel *ioc_slave;
74
+ QIOChannelSocket *sioc_slave;
75
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
76
+ QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
77
+ /* restart coroutine co_trip if AIOContext is changed */
78
+ bool aio_context_changed;
79
+ bool processing_msg;
80
+};
81
+
82
+bool vhost_user_server_start(VuServer *server,
83
+ SocketAddress *unix_socket,
84
+ AioContext *ctx,
85
+ uint16_t max_queues,
86
+ DevicePanicNotifierFn *device_panic_notifier,
87
+ const VuDevIface *vu_iface,
88
+ Error **errp);
89
+
90
+void vhost_user_server_stop(VuServer *server);
91
+
92
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
93
+
94
+#endif /* VHOST_USER_SERVER_H */
95
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
96
new file mode 100644
97
index XXXXXXX..XXXXXXX
98
--- /dev/null
99
+++ b/util/vhost-user-server.c
100
@@ -XXX,XX +XXX,XX @@
101
+/*
102
+ * Sharing QEMU devices via vhost-user protocol
103
+ *
104
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
105
+ * Copyright (c) 2020 Red Hat, Inc.
106
+ *
107
+ * This work is licensed under the terms of the GNU GPL, version 2 or
108
+ * later. See the COPYING file in the top-level directory.
109
+ */
110
+#include "qemu/osdep.h"
111
+#include "qemu/main-loop.h"
112
+#include "vhost-user-server.h"
113
+
114
+static void vmsg_close_fds(VhostUserMsg *vmsg)
115
+{
116
+ int i;
117
+ for (i = 0; i < vmsg->fd_num; i++) {
118
+ close(vmsg->fds[i]);
119
+ }
120
+}
121
+
122
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
123
+{
124
+ int i;
125
+ for (i = 0; i < vmsg->fd_num; i++) {
126
+ qemu_set_nonblock(vmsg->fds[i]);
127
+ }
128
+}
129
+
130
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
131
+ gpointer opaque);
132
+
133
+static void close_client(VuServer *server)
134
+{
135
+ /*
136
+ * Before closing the client
137
+ *
138
+ * 1. Let vu_client_trip stop processing new vhost-user msg
139
+ *
140
+ * 2. remove kick_handler
141
+ *
142
+ * 3. wait for the kick handler to be finished
143
+ *
144
+ * 4. wait for the current vhost-user msg to be finished processing
145
+ */
146
+
147
+ QIOChannelSocket *sioc = server->sioc;
148
+ /* When this is set vu_client_trip will stop new processing vhost-user message */
149
+ server->sioc = NULL;
150
+
151
+ VuFdWatch *vu_fd_watch, *next;
152
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
153
+ aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
154
+ NULL, NULL, NULL);
155
+ }
156
+
157
+ while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
158
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
159
+ if (!vu_fd_watch->processing) {
160
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
161
+ g_free(vu_fd_watch);
162
+ }
163
+ }
164
+ }
165
+
166
+ while (server->processing_msg) {
167
+ if (server->ioc->read_coroutine) {
168
+ server->ioc->read_coroutine = NULL;
169
+ qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
170
+ NULL, server->ioc);
171
+ server->processing_msg = false;
172
+ }
173
+ }
174
+
175
+ vu_deinit(&server->vu_dev);
176
+ object_unref(OBJECT(sioc));
177
+ object_unref(OBJECT(server->ioc));
178
+}
179
+
180
+static void panic_cb(VuDev *vu_dev, const char *buf)
181
+{
182
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
183
+
184
+ /* avoid while loop in close_client */
185
+ server->processing_msg = false;
186
+
187
+ if (buf) {
188
+ error_report("vu_panic: %s", buf);
189
+ }
190
+
191
+ if (server->sioc) {
192
+ close_client(server);
193
+ }
194
+
195
+ if (server->device_panic_notifier) {
196
+ server->device_panic_notifier(server);
197
+ }
198
+
199
+ /*
200
+ * Set the callback function for network listener so another
201
+ * vhost-user client can connect to this server
202
+ */
203
+ qio_net_listener_set_client_func(server->listener,
204
+ vu_accept,
205
+ server,
206
+ NULL);
207
+}
208
+
209
+static bool coroutine_fn
210
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
211
+{
212
+ struct iovec iov = {
213
+ .iov_base = (char *)vmsg,
214
+ .iov_len = VHOST_USER_HDR_SIZE,
215
+ };
216
+ int rc, read_bytes = 0;
217
+ Error *local_err = NULL;
218
+ /*
219
+ * Store fds/nfds returned from qio_channel_readv_full into
220
+ * temporary variables.
221
+ *
222
+ * VhostUserMsg is a packed structure, gcc will complain about passing
223
+ * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
224
+ * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
225
+ * thus two temporary variables nfds and fds are used here.
226
+ */
227
+ size_t nfds = 0, nfds_t = 0;
228
+ const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
229
+ int *fds_t = NULL;
230
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
231
+ QIOChannel *ioc = server->ioc;
232
+
233
+ if (!ioc) {
234
+ error_report_err(local_err);
235
+ goto fail;
236
+ }
237
+
238
+ assert(qemu_in_coroutine());
239
+ do {
240
+ /*
241
+ * qio_channel_readv_full may have short reads, keeping calling it
242
+ * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
243
+ */
244
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
245
+ if (rc < 0) {
246
+ if (rc == QIO_CHANNEL_ERR_BLOCK) {
247
+ qio_channel_yield(ioc, G_IO_IN);
248
+ continue;
249
+ } else {
250
+ error_report_err(local_err);
251
+ return false;
252
+ }
253
+ }
254
+ read_bytes += rc;
255
+ if (nfds_t > 0) {
256
+ if (nfds + nfds_t > max_fds) {
257
+ error_report("A maximum of %zu fds are allowed, "
258
+ "however got %zu fds now",
259
+ max_fds, nfds + nfds_t);
260
+ goto fail;
261
+ }
262
+ memcpy(vmsg->fds + nfds, fds_t,
263
+ nfds_t *sizeof(vmsg->fds[0]));
264
+ nfds += nfds_t;
265
+ g_free(fds_t);
266
+ }
267
+ if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
268
+ break;
269
+ }
270
+ iov.iov_base = (char *)vmsg + read_bytes;
271
+ iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
272
+ } while (true);
273
+
274
+ vmsg->fd_num = nfds;
275
+ /* qio_channel_readv_full will make socket fds blocking, unblock them */
276
+ vmsg_unblock_fds(vmsg);
277
+ if (vmsg->size > sizeof(vmsg->payload)) {
278
+ error_report("Error: too big message request: %d, "
279
+ "size: vmsg->size: %u, "
280
+ "while sizeof(vmsg->payload) = %zu",
281
+ vmsg->request, vmsg->size, sizeof(vmsg->payload));
282
+ goto fail;
283
+ }
284
+
285
+ struct iovec iov_payload = {
286
+ .iov_base = (char *)&vmsg->payload,
287
+ .iov_len = vmsg->size,
288
+ };
289
+ if (vmsg->size) {
290
+ rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
291
+ if (rc == -1) {
292
+ error_report_err(local_err);
293
+ goto fail;
294
+ }
295
+ }
296
+
297
+ return true;
298
+
299
+fail:
300
+ vmsg_close_fds(vmsg);
301
+
302
+ return false;
303
+}
304
+
305
+
306
+static void vu_client_start(VuServer *server);
307
+static coroutine_fn void vu_client_trip(void *opaque)
308
+{
309
+ VuServer *server = opaque;
310
+
311
+ while (!server->aio_context_changed && server->sioc) {
312
+ server->processing_msg = true;
313
+ vu_dispatch(&server->vu_dev);
314
+ server->processing_msg = false;
315
+ }
316
+
317
+ if (server->aio_context_changed && server->sioc) {
318
+ server->aio_context_changed = false;
319
+ vu_client_start(server);
320
+ }
321
+}
322
+
323
+static void vu_client_start(VuServer *server)
324
+{
325
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
326
+ aio_co_enter(server->ctx, server->co_trip);
327
+}
328
+
329
+/*
330
+ * a wrapper for vu_kick_cb
331
+ *
332
+ * since aio_dispatch can only pass one user data pointer to the
333
+ * callback function, pack VuDev and pvt into a struct. Then unpack it
334
+ * and pass them to vu_kick_cb
335
+ */
336
+static void kick_handler(void *opaque)
337
+{
338
+ VuFdWatch *vu_fd_watch = opaque;
339
+ vu_fd_watch->processing = true;
340
+ vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
341
+ vu_fd_watch->processing = false;
342
+}
343
+
344
+
345
+static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
346
+{
347
+
348
+ VuFdWatch *vu_fd_watch, *next;
349
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
350
+ if (vu_fd_watch->fd == fd) {
351
+ return vu_fd_watch;
352
+ }
353
+ }
354
+ return NULL;
355
+}
356
+
357
+static void
358
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
359
+ vu_watch_cb cb, void *pvt)
360
+{
361
+
362
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
363
+ g_assert(vu_dev);
364
+ g_assert(fd >= 0);
365
+ g_assert(cb);
366
+
367
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
368
+
369
+ if (!vu_fd_watch) {
370
+ VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1);
371
+
372
+ QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next);
373
+
374
+ vu_fd_watch->fd = fd;
375
+ vu_fd_watch->cb = cb;
376
+ qemu_set_nonblock(fd);
377
+ aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
378
+ NULL, NULL, vu_fd_watch);
379
+ vu_fd_watch->vu_dev = vu_dev;
380
+ vu_fd_watch->pvt = pvt;
381
+ }
382
+}
383
+
384
+
385
+static void remove_watch(VuDev *vu_dev, int fd)
386
+{
387
+ VuServer *server;
388
+ g_assert(vu_dev);
389
+ g_assert(fd >= 0);
390
+
391
+ server = container_of(vu_dev, VuServer, vu_dev);
392
+
393
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
394
+
395
+ if (!vu_fd_watch) {
396
+ return;
397
+ }
398
+ aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL);
399
+
400
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
401
+ g_free(vu_fd_watch);
402
+}
403
+
404
+
405
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
406
+ gpointer opaque)
407
+{
408
+ VuServer *server = opaque;
409
+
410
+ if (server->sioc) {
411
+ warn_report("Only one vhost-user client is allowed to "
412
+ "connect the server one time");
413
+ return;
414
+ }
415
+
416
+ if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb,
417
+ vu_message_read, set_watch, remove_watch, server->vu_iface)) {
418
+ error_report("Failed to initialize libvhost-user");
419
+ return;
420
+ }
421
+
422
+ /*
423
+ * Unset the callback function for network listener to make another
424
+ * vhost-user client keeping waiting until this client disconnects
425
+ */
426
+ qio_net_listener_set_client_func(server->listener,
427
+ NULL,
428
+ NULL,
429
+ NULL);
430
+ server->sioc = sioc;
431
+ /*
432
+ * Increase the object reference, so sioc will not freed by
433
+ * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
434
+ */
435
+ object_ref(OBJECT(server->sioc));
436
+ qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
437
+ server->ioc = QIO_CHANNEL(sioc);
438
+ object_ref(OBJECT(server->ioc));
439
+ qio_channel_attach_aio_context(server->ioc, server->ctx);
440
+ qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
441
+ vu_client_start(server);
442
+}
443
+
444
+
445
+void vhost_user_server_stop(VuServer *server)
446
+{
447
+ if (server->sioc) {
448
+ close_client(server);
449
+ }
450
+
451
+ if (server->listener) {
452
+ qio_net_listener_disconnect(server->listener);
453
+ object_unref(OBJECT(server->listener));
454
+ }
455
+
456
+}
457
+
458
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
459
+{
460
+ VuFdWatch *vu_fd_watch, *next;
461
+ void *opaque = NULL;
462
+ IOHandler *io_read = NULL;
463
+ bool attach;
464
+
465
+ server->ctx = ctx ? ctx : qemu_get_aio_context();
466
+
467
+ if (!server->sioc) {
468
+ /* not yet serving any client*/
469
+ return;
470
+ }
471
+
472
+ if (ctx) {
473
+ qio_channel_attach_aio_context(server->ioc, ctx);
474
+ server->aio_context_changed = true;
475
+ io_read = kick_handler;
476
+ attach = true;
477
+ } else {
478
+ qio_channel_detach_aio_context(server->ioc);
479
+ /* server->ioc->ctx keeps the old AioConext */
480
+ ctx = server->ioc->ctx;
481
+ attach = false;
482
+ }
483
+
484
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
485
+ if (vu_fd_watch->cb) {
486
+ opaque = attach ? vu_fd_watch : NULL;
487
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
488
+ io_read, NULL, NULL,
489
+ opaque);
490
+ }
491
+ }
492
+}
493
+
494
+
495
+bool vhost_user_server_start(VuServer *server,
496
+ SocketAddress *socket_addr,
497
+ AioContext *ctx,
498
+ uint16_t max_queues,
499
+ DevicePanicNotifierFn *device_panic_notifier,
500
+ const VuDevIface *vu_iface,
501
+ Error **errp)
502
+{
503
+ QIONetListener *listener = qio_net_listener_new();
504
+ if (qio_net_listener_open_sync(listener, socket_addr, 1,
505
+ errp) < 0) {
506
+ object_unref(OBJECT(listener));
507
+ return false;
508
+ }
509
+
510
+ /* zero out unspecified fileds */
511
+ *server = (VuServer) {
512
+ .listener = listener,
513
+ .vu_iface = vu_iface,
514
+ .max_queues = max_queues,
515
+ .ctx = ctx,
516
+ .device_panic_notifier = device_panic_notifier,
517
+ };
518
+
519
+ qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
520
+
521
+ qio_net_listener_set_client_func(server->listener,
522
+ vu_accept,
523
+ server,
524
+ NULL);
525
+
526
+ QTAILQ_INIT(&server->vu_fd_watches);
527
+ return true;
528
+}
529
diff --git a/util/meson.build b/util/meson.build
530
index XXXXXXX..XXXXXXX 100644
531
--- a/util/meson.build
532
+++ b/util/meson.build
533
@@ -XXX,XX +XXX,XX @@ if have_block
534
util_ss.add(files('main-loop.c'))
535
util_ss.add(files('nvdimm-utils.c'))
536
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
537
+ util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
538
util_ss.add(files('qemu-coroutine-sleep.c'))
539
util_ss.add(files('qemu-co-shared-resource.c'))
540
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
541
--
2.26.2

From: Coiby Xu <coiby.xu@gmail.com>

Move the constants from hw/core/qdev-properties.c to
util/block-helpers.h so that knowledge of the min/max values is
shared.

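As an illustrative sketch of how a caller outside qdev can use the
shared helper (the function name my_validate_block_size and the property
name "logical-block-size" are placeholders, not part of this patch):

  #include "qemu/osdep.h"
  #include "qapi/error.h"
  #include "util/block-helpers.h"

  static bool my_validate_block_size(const char *id, int64_t value,
                                     Error **errp)
  {
      Error *local_err = NULL;

      /* Rejects values outside [MIN_BLOCK_SIZE, MAX_BLOCK_SIZE] and
       * values that are not a power of 2; a value of 0 means "unset"
       * and is accepted. */
      check_block_size(id, "logical-block-size", value, &local_err);
      if (local_err) {
          error_propagate(errp, local_err);
          return false;
      }
      return true;
  }
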
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 20200918080912.321299-5-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/block-helpers.h             | 19 +++++++++++++
 hw/core/qdev-properties-system.c | 31 ++++-----------------
 util/block-helpers.c             | 46 ++++++++++++++++++++++++++++++++
 util/meson.build                 |  1 +
 4 files changed, 71 insertions(+), 26 deletions(-)
 create mode 100644 util/block-helpers.h
 create mode 100644 util/block-helpers.c

diff --git a/util/block-helpers.h b/util/block-helpers.h
23
new file mode 100644
24
index XXXXXXX..XXXXXXX
25
--- /dev/null
26
+++ b/util/block-helpers.h
27
@@ -XXX,XX +XXX,XX @@
28
+#ifndef BLOCK_HELPERS_H
29
+#define BLOCK_HELPERS_H
30
+
31
+#include "qemu/units.h"
32
+
33
+/* lower limit is sector size */
34
+#define MIN_BLOCK_SIZE INT64_C(512)
35
+#define MIN_BLOCK_SIZE_STR "512 B"
36
+/*
37
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
38
+ * matches qcow2 cluster size limit
39
+ */
40
+#define MAX_BLOCK_SIZE (2 * MiB)
41
+#define MAX_BLOCK_SIZE_STR "2 MiB"
42
+
43
+void check_block_size(const char *id, const char *name, int64_t value,
44
+ Error **errp);
45
+
46
+#endif /* BLOCK_HELPERS_H */
47
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/hw/core/qdev-properties-system.c
50
+++ b/hw/core/qdev-properties-system.c
51
@@ -XXX,XX +XXX,XX @@
52
#include "sysemu/blockdev.h"
53
#include "net/net.h"
54
#include "hw/pci/pci.h"
55
+#include "util/block-helpers.h"
56
57
static bool check_prop_still_unset(DeviceState *dev, const char *name,
58
const void *old_val, const char *new_val,
59
@@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = {
60
61
/* --- blocksize --- */
62
63
-/* lower limit is sector size */
64
-#define MIN_BLOCK_SIZE 512
65
-#define MIN_BLOCK_SIZE_STR "512 B"
66
-/*
67
- * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
68
- * matches qcow2 cluster size limit
69
- */
70
-#define MAX_BLOCK_SIZE (2 * MiB)
71
-#define MAX_BLOCK_SIZE_STR "2 MiB"
72
-
73
static void set_blocksize(Object *obj, Visitor *v, const char *name,
74
void *opaque, Error **errp)
75
{
76
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
77
Property *prop = opaque;
78
uint32_t *ptr = qdev_get_prop_ptr(dev, prop);
79
uint64_t value;
80
+ Error *local_err = NULL;
81
82
if (dev->realized) {
83
qdev_prop_set_after_realize(dev, name, errp);
84
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
85
if (!visit_type_size(v, name, &value, errp)) {
86
return;
87
}
88
- /* value of 0 means "unset" */
89
- if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
90
- error_setg(errp,
91
- "Property %s.%s doesn't take value %" PRIu64
92
- " (minimum: " MIN_BLOCK_SIZE_STR
93
- ", maximum: " MAX_BLOCK_SIZE_STR ")",
94
- dev->id ? : "", name, value);
95
+ check_block_size(dev->id ? : "", name, value, &local_err);
96
+ if (local_err) {
97
+ error_propagate(errp, local_err);
98
return;
99
}
100
-
101
- /* We rely on power-of-2 blocksizes for bitmasks */
102
- if ((value & (value - 1)) != 0) {
103
- error_setg(errp,
104
- "Property %s.%s doesn't take value '%" PRId64 "', "
105
- "it's not a power of 2", dev->id ?: "", name, (int64_t)value);
106
- return;
107
- }
108
-
109
*ptr = value;
110
}
111
112
diff --git a/util/block-helpers.c b/util/block-helpers.c
113
new file mode 100644
114
index XXXXXXX..XXXXXXX
115
--- /dev/null
116
+++ b/util/block-helpers.c
117
@@ -XXX,XX +XXX,XX @@
118
+/*
119
+ * Block utility functions
120
+ *
121
+ * Copyright IBM, Corp. 2011
122
+ * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com>
123
+ *
124
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
125
+ * See the COPYING file in the top-level directory.
126
+ */
127
+
128
+#include "qemu/osdep.h"
129
+#include "qapi/error.h"
130
+#include "qapi/qmp/qerror.h"
131
+#include "block-helpers.h"
132
+
133
+/**
134
+ * check_block_size:
135
+ * @id: The unique ID of the object
136
+ * @name: The name of the property being validated
137
+ * @value: The block size in bytes
138
+ * @errp: A pointer to an area to store an error
139
+ *
140
+ * This function checks that the block size meets the following conditions:
141
+ * 1. At least MIN_BLOCK_SIZE
142
+ * 2. No larger than MAX_BLOCK_SIZE
143
+ * 3. A power of 2
144
+ */
145
+void check_block_size(const char *id, const char *name, int64_t value,
146
+ Error **errp)
147
+{
148
+ /* value of 0 means "unset" */
149
+ if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
150
+ error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
151
+ id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE);
152
+ return;
153
+ }
154
+
155
+ /* We rely on power-of-2 blocksizes for bitmasks */
156
+ if ((value & (value - 1)) != 0) {
157
+ error_setg(errp,
158
+ "Property %s.%s doesn't take value '%" PRId64
159
+ "', it's not a power of 2",
160
+ id, name, value);
161
+ return;
162
+ }
163
+}
164
diff --git a/util/meson.build b/util/meson.build
165
index XXXXXXX..XXXXXXX 100644
166
--- a/util/meson.build
167
+++ b/util/meson.build
168
@@ -XXX,XX +XXX,XX @@ if have_block
169
util_ss.add(files('nvdimm-utils.c'))
170
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
171
util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
172
+ util_ss.add(files('block-helpers.c'))
173
util_ss.add(files('qemu-coroutine-sleep.c'))
174
util_ss.add(files('qemu-co-shared-resource.c'))
175
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
176
--
2.26.2

From: Coiby Xu <coiby.xu@gmail.com>

By making use of libvhost-user, a block device drive can be shared with
the connected vhost-user client. Only one client can connect to the
server at a time.

Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
following compiler warning:
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
and fix "Invalid size %ld ..." ssize_t format string arguments for
32-bit hosts.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.h |  36 ++
 block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++
 softmmu/vl.c                         |   4 +
 block/meson.build                    |   1 +
 4 files changed, 702 insertions(+)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 block/export/vhost-user-blk-server.c

diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/block/export/vhost-user-blk-server.h
@@ -XXX,XX +XXX,XX @@
64
+..
38
+/*
65
+ Copyright (C) 2017 Red Hat Inc.
39
+ * Sharing QEMU block devices via vhost-user protocal
66
+
40
+ *
67
+ This work is licensed under the terms of the GNU GPL, version 2 or
41
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
68
+ later. See the COPYING file in the top-level directory.
42
+ * Copyright (c) 2020 Red Hat, Inc.
69
+
43
+ *
70
+============================
44
+ * This work is licensed under the terms of the GNU GPL, version 2 or
71
+Live Block Device Operations
45
+ * later. See the COPYING file in the top-level directory.
72
+============================
46
+ */
73
+
47
+
74
+QEMU Block Layer currently (as of QEMU 2.9) supports four major kinds of
48
+#ifndef VHOST_USER_BLK_SERVER_H
75
+live block device jobs -- stream, commit, mirror, and backup. These can
49
+#define VHOST_USER_BLK_SERVER_H
76
+be used to manipulate disk image chains to accomplish certain tasks,
50
+#include "util/vhost-user-server.h"
77
+namely: live copy data from backing files into overlays; shorten long
51
+
78
+disk image chains by merging data from overlays into backing files; live
52
+typedef struct VuBlockDev VuBlockDev;
79
+synchronize data from a disk image chain (including current active disk)
53
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
80
+to another target image; and point-in-time (and incremental) backups of
54
+#define VHOST_USER_BLK_SERVER(obj) \
81
+a block device. Below is a description of the said block (QMP)
55
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
82
+primitives, and some (non-exhaustive list of) examples to illustrate
56
+
83
+their use.
57
+/* vhost user block device */
84
+
58
+struct VuBlockDev {
85
+.. note::
59
+ Object parent_obj;
86
+ The file ``qapi/block-core.json`` in the QEMU source tree has the
60
+ char *node_name;
87
+ canonical QEMU API (QAPI) schema documentation for the QMP
61
+ SocketAddress *addr;
88
+ primitives discussed here.
62
+ AioContext *ctx;
89
+
63
+ VuServer vu_server;
90
+.. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is
64
+ bool running;
91
+ integrated.
65
+ uint32_t blk_size;
92
+
66
+ BlockBackend *backend;
93
+.. contents::
67
+ QIOChannelSocket *sioc;
94
+
68
+ QTAILQ_ENTRY(VuBlockDev) next;
95
+Disk image backing chain notation
69
+ struct virtio_blk_config blkcfg;
96
+---------------------------------
70
+ bool writable;
97
+
71
+};
98
+A simple disk image chain. (This can be created live using QMP
72
+
99
+``blockdev-snapshot-sync``, or offline via ``qemu-img``)::
73
+#endif /* VHOST_USER_BLK_SERVER_H */
100
+
74
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
101
+ (Live QEMU)
75
new file mode 100644
102
+ |
76
index XXXXXXX..XXXXXXX
103
+ .
77
--- /dev/null
104
+ V
78
+++ b/block/export/vhost-user-blk-server.c
105
+
79
@@ -XXX,XX +XXX,XX @@
106
+ [A] <----- [B]
80
+/*
107
+
81
+ * Sharing QEMU block devices via vhost-user protocal
108
+ (backing file) (overlay)
82
+ *
109
+
83
+ * Parts of the code based on nbd/server.c.
110
+The arrow can be read as: Image [A] is the backing file of disk image
84
+ *
111
+[B]. And live QEMU is currently writing to image [B], consequently, it
85
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
112
+is also referred to as the "active layer".
86
+ * Copyright (c) 2020 Red Hat, Inc.
113
+
87
+ *
114
+There are two kinds of terminology that are common when referring to
88
+ * This work is licensed under the terms of the GNU GPL, version 2 or
115
+files in a disk image backing chain:
89
+ * later. See the COPYING file in the top-level directory.
116
+
90
+ */
117
+(1) Directional: 'base' and 'top'. Given the simple disk image chain
91
+#include "qemu/osdep.h"
118
+ above, image [A] can be referred to as 'base', and image [B] as
92
+#include "block/block.h"
119
+ 'top'. (This terminology can be seen in in QAPI schema file,
93
+#include "vhost-user-blk-server.h"
120
+ block-core.json.)
94
+#include "qapi/error.h"
121
+
95
+#include "qom/object_interfaces.h"
122
+(2) Relational: 'backing file' and 'overlay'. Again, taking the same
96
+#include "sysemu/block-backend.h"
123
+ simple disk image chain from the above, disk image [A] is referred
97
+#include "util/block-helpers.h"
124
+ to as the backing file, and image [B] as overlay.
98
+
125
+
99
+enum {
126
+ Throughout this document, we will use the relational terminology.
100
+ VHOST_USER_BLK_MAX_QUEUES = 1,
127
+
101
+};
128
+.. important::
102
+struct virtio_blk_inhdr {
129
+ The overlay files can generally be any format that supports a
103
+ unsigned char status;
130
+ backing file, although QCOW2 is the preferred format and the one
104
+};
131
+ used in this document.
105
+
132
+
106
+typedef struct VuBlockReq {
133
+
107
+ VuVirtqElement *elem;
134
+Brief overview of live block QMP primitives
108
+ int64_t sector_num;
135
+-------------------------------------------
109
+ size_t size;
136
+
110
+ struct virtio_blk_inhdr *in;
137
+The following are the four different kinds of live block operations that
111
+ struct virtio_blk_outhdr out;
138
+QEMU block layer supports.
112
+ VuServer *server;
139
+
113
+ struct VuVirtq *vq;
140
+(1) ``block-stream``: Live copy of data from backing files into overlay
114
+} VuBlockReq;
141
+ files.
115
+
142
+
116
+static void vu_block_req_complete(VuBlockReq *req)
143
+ .. note:: Once the 'stream' operation has finished, three things to
117
+{
144
+ note:
118
+ VuDev *vu_dev = &req->server->vu_dev;
145
+
119
+
146
+ (a) QEMU rewrites the backing chain to remove
120
+ /* IO size with 1 extra status byte */
147
+ reference to the now-streamed and redundant backing
121
+ vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
148
+ file;
122
+ vu_queue_notify(vu_dev, req->vq);
149
+
123
+
150
+ (b) the streamed file *itself* won't be removed by QEMU,
124
+ if (req->elem) {
151
+ and must be explicitly discarded by the user;
125
+ free(req->elem);
152
+
126
+ }
153
+ (c) the streamed file remains valid -- i.e. further
127
+
154
+ overlays can be created based on it. Refer the
128
+ g_free(req);
155
+ ``block-stream`` section further below for more
129
+}
156
+ details.
130
+
157
+
131
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
158
+(2) ``block-commit``: Live merge of data from overlay files into backing
132
+{
159
+ files (with the optional goal of removing the overlay file from the
133
+ return container_of(server, VuBlockDev, vu_server);
160
+ chain). Since QEMU 2.0, this includes "active ``block-commit``"
134
+}
161
+ (i.e. merge the current active layer into the base image).
135
+
162
+
136
+static int coroutine_fn
163
+ .. note:: Once the 'commit' operation has finished, there are three
137
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
164
+ things to note here as well:
138
+ uint32_t iovcnt, uint32_t type)
165
+
139
+{
166
+ (a) QEMU rewrites the backing chain to remove reference
140
+ struct virtio_blk_discard_write_zeroes desc;
167
+ to now-redundant overlay images that have been
141
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
168
+ committed into a backing file;
142
+ if (unlikely(size != sizeof(desc))) {
169
+
143
+ error_report("Invalid size %zd, expect %zu", size, sizeof(desc));
170
+ (b) the committed file *itself* won't be removed by QEMU
144
+ return -EINVAL;
171
+ -- it ought to be manually removed;
145
+ }
172
+
146
+
173
+ (c) however, unlike in the case of ``block-stream``, the
147
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
174
+ intermediate images will be rendered invalid -- i.e.
148
+ uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
175
+ no more further overlays can be created based on
149
+ le32_to_cpu(desc.num_sectors) << 9 };
176
+ them. Refer the ``block-commit`` section further
150
+ if (type == VIRTIO_BLK_T_DISCARD) {
177
+ below for more details.
151
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
178
+
152
+ return 0;
179
+(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running
180
+ disk to another image.
181
+
182
+(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
183
+ of a block device to a destination.
184
+
185
+
186
+.. _`Interacting with a QEMU instance`:
187
+
188
+Interacting with a QEMU instance
189
+--------------------------------
190
+
191
+To show some example invocations of command-line, we will use the
192
+following invocation of QEMU, with a QMP server running over UNIX
193
+socket::
194
+
195
+ $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \
196
+ -M q35 -nodefaults -m 512 \
197
+ -blockdev node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \
198
+ -device virtio-blk,drive=node-A,id=virtio0 \
199
+ -monitor stdio -qmp unix:/tmp/qmp-sock,server,nowait
200
+
201
+The ``-blockdev`` command-line option, used above, is available from
202
+QEMU 2.9 onwards. In the above invocation, notice the ``node-name``
203
+parameter that is used to refer to the disk image a.qcow2 ('node-A') --
204
+this is a cleaner way to refer to a disk image (as opposed to referring
205
+to it by spelling out file paths). So, we will continue to designate a
206
+``node-name`` to each further disk image created (either via
207
+``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk
208
+image chain, and continue to refer to the disks using their
209
+``node-name`` (where possible, because ``block-commit`` does not yet, as
210
+of QEMU 2.9, accept ``node-name`` parameter) when performing various
211
+block operations.
212
+
213
+To interact with the QEMU instance launched above, we will use the
214
+``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the
215
+QEMU source directory), which takes key-value pairs for QMP commands.
216
+Invoke it as below (which will also print out the complete raw JSON
217
+syntax for reference -- examples in the following sections)::
218
+
219
+ $ ./qmp-shell -v -p /tmp/qmp-sock
220
+ (QEMU)
221
+
222
+.. note::
223
+ In the event we have to repeat a certain QMP command, we will: for
224
+ the first occurrence of it, show the ``qmp-shell`` invocation, *and*
225
+ the corresponding raw JSON QMP syntax; but for subsequent
226
+ invocations, present just the ``qmp-shell`` syntax, and omit the
227
+ equivalent JSON output.
228
+
229
+
230
+Example disk image chain
231
+------------------------
232
+
233
+We will use the below disk image chain (and occasionally spelling it
234
+out where appropriate) when discussing various primitives::
235
+
236
+ [A] <-- [B] <-- [C] <-- [D]
237
+
238
+Where [A] is the original base image; [B] and [C] are intermediate
239
+overlay images; image [D] is the active layer -- i.e. live QEMU is
240
+writing to it. (The rule of thumb is: live QEMU will always be pointing
241
+to the rightmost image in a disk image chain.)
242
+
243
+The above image chain can be created by invoking
244
+``blockdev-snapshot-sync`` commands as follows (which shows the
245
+creation of overlay image [B]) using the ``qmp-shell`` (our invocation
246
+also prints the raw JSON invocation of it)::
247
+
248
+ (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
249
+ {
250
+ "execute": "blockdev-snapshot-sync",
251
+ "arguments": {
252
+ "node-name": "node-A",
253
+ "snapshot-file": "b.qcow2",
254
+ "format": "qcow2",
255
+ "snapshot-node-name": "node-B"
256
+ }
153
+ }
257
+ }
154
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
258
+
155
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
259
+Here, "node-A" is the name QEMU internally uses to refer to the base
156
+ range[0], range[1], 0) == 0) {
260
+image [A] -- it is the backing file, based on which the overlay image,
157
+ return 0;
261
+[B], is created.
262
+
263
+To create the rest of the overlay images, [C], and [D] (omitting the raw
264
+JSON output for brevity)::
265
+
266
+ (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
267
+ (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
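+
+To double-check the chain that QEMU now sees, one option (shown here
+only as a sketch; its output is lengthy and omitted) is to list the
+named block nodes over QMP and inspect each node's filename and backing
+image entry::
+
+ (QEMU) query-named-block-nodes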
268
+
269
+
270
+A note on points-in-time vs file names
271
+--------------------------------------
272
+
273
+In our disk image chain::
274
+
275
+ [A] <-- [B] <-- [C] <-- [D]
276
+
277
+We have *three* points in time and an active layer:
278
+
279
+- Point 1: Guest state when [B] was created is contained in file [A]
280
+- Point 2: Guest state when [C] was created is contained in [A] + [B]
281
+- Point 3: Guest state when [D] was created is contained in
282
+ [A] + [B] + [C]
283
+- Active layer: Current guest state is contained in [A] + [B] + [C] +
284
+ [D]
285
+
286
+Therefore, be careful with naming choices:
287
+
288
+- Naming a file after the time it is created is misleading -- the
289
+ guest data for that point in time is *not* contained in that file
290
+ (as explained earlier)
291
+- Rather, think of files as a *delta* from the backing file
292
+
293
+
294
+Live block streaming --- ``block-stream``
295
+-----------------------------------------
296
+
297
+The ``block-stream`` command allows you to copy data live from backing
298
+files into overlay images.
299
+
300
+Given our original example disk image chain from earlier::
301
+
302
+ [A] <-- [B] <-- [C] <-- [D]
303
+
304
+The disk image chain can be shortened in one of the following different
305
+ways (not an exhaustive list).
306
+
307
+.. _`Case-1`:
308
+
309
+(1) Merge everything into the active layer: I.e. copy all contents from
310
+ the base image, [A], and overlay images, [B] and [C], into [D],
311
+ *while* the guest is running. The resulting chain will be a
312
+ standalone image, [D] -- with contents from [A], [B] and [C] merged
313
+ into it (where live QEMU writes go to)::
314
+
315
+ [D]
316
+
317
+.. _`Case-2`:
318
+
319
+(2) Taking the same example disk image chain mentioned earlier, merge
320
+ only images [B] and [C] into [D], the active layer. The result is
321
+ that the contents of images [B] and [C] are copied into [D], and the
322
+ backing file pointer of image [D] will be adjusted to point to image
323
+ [A]. The resulting chain will be::
324
+
325
+ [A] <-- [D]
326
+
327
+.. _`Case-3`:
328
+
329
+(3) Intermediate streaming (available since QEMU 2.8): Starting afresh
330
+ with the original example disk image chain, with a total of four
331
+ images, it is possible to copy contents from image [B] into image
332
+ [C]. Once the copy is finished, image [B] can now be (optionally)
333
+ discarded; and the backing file pointer of image [C] will be
334
+ adjusted to point to [A]. I.e. after performing "intermediate
335
+ streaming" of [B] into [C], the resulting image chain will be (where
336
+ live QEMU is writing to [D])::
337
+
338
+ [A] <-- [C] <-- [D]
339
+
340
+
341
+QMP invocation for ``block-stream``
342
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
343
+
344
+For `Case-1`_, to merge contents of all the backing files into the
345
+active layer, where 'node-D' is the current active image (by default
346
+``block-stream`` will flatten the entire chain); ``qmp-shell`` (and its
347
+corresponding JSON output)::
348
+
349
+ (QEMU) block-stream device=node-D job-id=job0
350
+ {
351
+ "execute": "block-stream",
352
+ "arguments": {
353
+ "device": "node-D",
354
+ "job-id": "job0"
355
+ }
158
+ }
356
+ }
159
+ }
357
+
160
+
358
+For `Case-2`_, merge contents of the images [B] and [C] into [D], where
161
+ return -EINVAL;
359
+image [D] ends up referring to image [A] as its backing file::
162
+}
360
+
163
+
361
+ (QEMU) block-stream device=node-D base-node=node-A job-id=job0
164
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
362
+
165
+{
363
+And for `Case-3`_, of "intermediate streaming", merge contents of
166
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
364
+images [B] into [C], where [C] ends up referring to [A] as its backing
167
+ BlockBackend *backend = vdev_blk->backend;
365
+image::
168
+ blk_co_flush(backend);
366
+
169
+}
367
+ (QEMU) block-stream device=node-C base-node=node-A job-id=job0
170
+
368
+
171
+struct req_data {
369
+Progress of a ``block-stream`` operation can be monitored via the QMP
172
+ VuServer *server;
370
+command::
173
+ VuVirtq *vq;
371
+
174
+ VuVirtqElement *elem;
372
+ (QEMU) query-block-jobs
175
+};
373
+ {
176
+
374
+ "execute": "query-block-jobs",
177
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
375
+ "arguments": {}
178
+{
376
+ }
179
+ struct req_data *data = opaque;
377
+
180
+ VuServer *server = data->server;
378
+
181
+ VuVirtq *vq = data->vq;
379
+Once the ``block-stream`` operation has completed, QEMU will emit an
182
+ VuVirtqElement *elem = data->elem;
380
+event, ``BLOCK_JOB_COMPLETED``. The intermediate overlays remain valid,
183
+ uint32_t type;
381
+and can now be (optionally) discarded, or retained to create further
184
+ VuBlockReq *req;
382
+overlays based on them. Finally, the ``block-stream`` jobs can be
185
+
383
+restarted at any time.
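+
+For reference, the completion event delivered over the QMP socket looks
+roughly like the following (the field values here are illustrative, not
+taken from a real run)::
+
+ {
+     "timestamp": {"seconds": 1500000000, "microseconds": 0},
+     "event": "BLOCK_JOB_COMPLETED",
+     "data": {
+         "device": "job0",
+         "type": "stream",
+         "len": 1376256,
+         "offset": 1376256,
+         "speed": 0
+     }
+ }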
186
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
384
+
187
+ BlockBackend *backend = vdev_blk->backend;
385
+
188
+
386
+Live block commit --- ``block-commit``
189
+ struct iovec *in_iov = elem->in_sg;
387
+--------------------------------------
190
+ struct iovec *out_iov = elem->out_sg;
388
+
191
+ unsigned in_num = elem->in_num;
389
+The ``block-commit`` command lets you merge live data from overlay
192
+ unsigned out_num = elem->out_num;
390
+images into backing file(s). Since QEMU 2.0, this includes "live active
193
+ /* refer to hw/block/virtio_blk.c */
391
+commit" (i.e. it is possible to merge the "active layer", the right-most
194
+ if (elem->out_num < 1 || elem->in_num < 1) {
392
+image in a disk image chain where live QEMU will be writing to, into the
195
+ error_report("virtio-blk request missing headers");
393
+base image). This is analogous to ``block-stream``, but in the opposite
196
+ free(elem);
394
+direction.
197
+ return;
395
+
198
+ }
396
+Again, starting afresh with our example disk image chain, where live
199
+
397
+QEMU is writing to the right-most image in the chain, [D]::
200
+ req = g_new0(VuBlockReq, 1);
398
+
201
+ req->server = server;
399
+ [A] <-- [B] <-- [C] <-- [D]
202
+ req->vq = vq;
400
+
203
+ req->elem = elem;
401
+The disk image chain can be shortened in one of the following ways:
204
+
402
+
205
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
403
+.. _`block-commit_Case-1`:
206
+ sizeof(req->out)) != sizeof(req->out))) {
404
+
207
+ error_report("virtio-blk request outhdr too short");
405
+(1) Commit content from only image [B] into image [A]. The resulting
208
+ goto err;
406
+ chain is the following, where image [C] is adjusted to point at [A]
209
+ }
407
+ as its new backing file::
210
+
408
+
211
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
409
+ [A] <-- [C] <-- [D]
212
+
410
+
213
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
411
+(2) Commit content from images [B] and [C] into image [A]. The
214
+ error_report("virtio-blk request inhdr too short");
412
+ resulting chain, where image [D] is adjusted to point to image [A]
215
+ goto err;
413
+ as its new backing file::
216
+ }
414
+
217
+
415
+ [A] <-- [D]
218
+ /* We always touch the last byte, so just see how big in_iov is. */
416
+
219
+ req->in = (void *)in_iov[in_num - 1].iov_base
417
+.. _`block-commit_Case-3`:
220
+ + in_iov[in_num - 1].iov_len
418
+
221
+ - sizeof(struct virtio_blk_inhdr);
419
+(3) Commit content from images [B], [C], and the active layer [D] into
222
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
420
+ image [A]. The resulting chain (in this case, a consolidated single
223
+
421
+ image)::
224
+ type = le32_to_cpu(req->out.type);
422
+
225
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
423
+ [A]
226
+ case VIRTIO_BLK_T_IN:
424
+
227
+ case VIRTIO_BLK_T_OUT: {
425
+(4) Commit content from only image [C] into image [B]. The
228
+ ssize_t ret = 0;
426
+ resulting chain::
229
+ bool is_write = type & VIRTIO_BLK_T_OUT;
427
+
230
+ req->sector_num = le64_to_cpu(req->out.sector);
428
+ [A] <-- [B] <-- [D]
231
+
429
+
232
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
430
+(5) Commit content from image [C] and the active layer [D] into image
233
+ QEMUIOVector qiov;
431
+ [B]. The resulting chain::
234
+ if (is_write) {
432
+
235
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
433
+ [A] <-- [B]
236
+ ret = blk_co_pwritev(backend, offset, qiov.size,
434
+
237
+ &qiov, 0);
435
+
238
+ } else {
436
+QMP invocation for ``block-commit``
239
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
437
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
240
+ ret = blk_co_preadv(backend, offset, qiov.size,
438
+
241
+ &qiov, 0);
439
+For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from
440
+image [B] into image [A], the invocation is as follows::
441
+
442
+ (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0
443
+ {
444
+ "execute": "block-commit",
445
+ "arguments": {
446
+ "device": "node-D",
447
+ "job-id": "job0",
448
+ "top": "b.qcow2",
449
+ "base": "a.qcow2"
450
+ }
242
+ }
451
+ }
243
+ if (ret >= 0) {
452
+
244
+ req->in->status = VIRTIO_BLK_S_OK;
453
+Once the above ``block-commit`` operation has completed, a
245
+ } else {
454
+``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is
246
+ req->in->status = VIRTIO_BLK_S_IOERR;
455
+required. As the end result, the backing file of image [C] is adjusted
456
+to point to image [A], and the original 4-image chain will end up being
457
+transformed to::
458
+
459
+ [A] <-- [C] <-- [D]
460
+
461
+.. note::
462
+ The intermediate image [B] is invalid (as in: no more further
463
+ overlays based on it can be created).
464
+
465
+ Reasoning: An intermediate image after a 'stream' operation still
466
+ represents that old point-in-time, and may be valid in that context.
467
+ However, an intermediate image after a 'commit' operation no longer
468
+ represents any point-in-time, and is invalid in any context.
469
+
470
+
471
+However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active
472
+``block-commit``") is a *two-phase* operation: In the first phase, the
473
+content from the active overlay, along with the intermediate overlays,
474
+is copied into the backing file (also called the base image). In the
475
+second phase, the said backing file is adjusted to become the current
476
+active image -- by issuing the command ``block-job-complete``. Optionally,
477
+the ``block-commit`` operation can be cancelled by issuing the command
478
+``block-job-cancel``, but be careful when doing this.
479
+
480
+Once the ``block-commit`` operation has completed, the event
481
+``BLOCK_JOB_READY`` will be emitted, signalling that the synchronization
482
+has finished. Now the job can be gracefully completed by issuing the
483
+command ``block-job-complete`` -- until such a command is issued, the
484
+'commit' operation remains active.
485
+
486
+The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to
487
+convert a disk image chain such as this::
488
+
489
+ [A] <-- [B] <-- [C] <-- [D]
490
+
491
+Into::
492
+
493
+ [A]
494
+
495
+Where content from all the subsequent overlays, [B], and [C], including
496
+the active layer, [D], is committed back to [A] -- which is where live
497
+QEMU is performing all its current writes.
498
+
499
+Start the "active ``block-commit``" operation::
500
+
501
+ (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0
502
+ {
503
+ "execute": "block-commit",
504
+ "arguments": {
505
+ "device": "node-D",
506
+ "job-id": "job0",
507
+ "top": "d.qcow2",
508
+ "base": "a.qcow2"
509
+ }
247
+ }
510
+ }
248
+ break;
511
+
249
+ }
512
+
250
+ case VIRTIO_BLK_T_FLUSH:
513
+Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will
251
+ vu_block_flush(req);
514
+be emitted.
252
+ req->in->status = VIRTIO_BLK_S_OK;
515
+
253
+ break;
516
+Then, optionally query for the status of the active block operations.
254
+ case VIRTIO_BLK_T_GET_ID: {
517
+We can see the 'commit' job is now ready to be completed, as indicated
255
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
518
+by the line *"ready": true*::
256
+ VIRTIO_BLK_ID_BYTES);
519
+
257
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
520
+ (QEMU) query-block-jobs
258
+ req->in->status = VIRTIO_BLK_S_OK;
521
+ {
259
+ req->size = elem->in_sg[0].iov_len;
522
+ "execute": "query-block-jobs",
260
+ break;
523
+ "arguments": {}
261
+ }
524
+ }
262
+ case VIRTIO_BLK_T_DISCARD:
525
+ {
263
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
526
+ "return": [
264
+ int rc;
527
+ {
265
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
528
+ "busy": false,
266
+ out_num, type);
529
+ "type": "commit",
267
+ if (rc == 0) {
530
+ "len": 1376256,
268
+ req->in->status = VIRTIO_BLK_S_OK;
531
+ "paused": false,
269
+ } else {
532
+ "ready": true,
270
+ req->in->status = VIRTIO_BLK_S_IOERR;
533
+ "io-status": "ok",
534
+ "offset": 1376256,
535
+ "device": "job0",
536
+ "speed": 0
537
+ }
538
+ ]
539
+ }
540
+
541
+Gracefully complete the 'commit' block device job::
542
+
543
+ (QEMU) block-job-complete device=job0
544
+ {
545
+ "execute": "block-job-complete",
546
+ "arguments": {
547
+ "device": "job0"
548
+ }
271
+ }
549
+ }
272
+ break;
550
+ {
273
+ }
551
+ "return": {}
274
+ default:
552
+ }
275
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
553
+
276
+ break;
554
+Finally, once the above job is completed, an event
277
+ }
555
+``BLOCK_JOB_COMPLETED`` will be emitted.
278
+
556
+
279
+ vu_block_req_complete(req);
557
+.. note::
280
+ return;
558
+ The invocation for the rest of the cases (2, 4, and 5), discussed in the
281
+
559
+ previous section, is omitted for brevity.
282
+err:
560
+
283
+ free(elem);
561
+
284
+ g_free(req);
562
+Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror``
285
+ return;
563
+----------------------------------------------------------------------
286
+}
564
+
287
+
565
+Synchronize a running disk image chain (all or part of it) to a target
288
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
566
+image.
289
+{
567
+
290
+ VuServer *server;
568
+Again, given our familiar disk image chain::
291
+ VuVirtq *vq;
569
+
292
+ struct req_data *req_data;
570
+ [A] <-- [B] <-- [C] <-- [D]
293
+
571
+
294
+ server = container_of(vu_dev, VuServer, vu_dev);
572
+The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) allows
295
+ assert(server);
573
+you to copy data from the entire chain into a single target image (which
296
+
574
+can be located on a different host).
297
+ vq = vu_get_queue(vu_dev, idx);
575
+
298
+ assert(vq);
576
+Once a 'mirror' job has started, there are two possible actions while a
299
+ VuVirtqElement *elem;
577
+``drive-mirror`` job is active:
300
+ while (1) {
578
+
301
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
579
+(1) Issuing the command ``block-job-cancel`` after it emits the event
302
+ sizeof(VuBlockReq));
580
+ ``BLOCK_JOB_CANCELLED``: will (after completing synchronization of
303
+ if (elem) {
581
+ the content from the disk image chain to the target image, [E])
304
+ req_data = g_new0(struct req_data, 1);
582
+ create a point-in-time (which is at the time of *triggering* the
305
+ req_data->server = server;
583
+ cancel command) copy, contained in image [E], of the entire disk
306
+ req_data->vq = vq;
584
+ image chain (or only the top-most image, depending on the ``sync``
307
+ req_data->elem = elem;
585
+ mode).
308
+ Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
586
+
309
+ req_data);
587
+(2) Issuing the command ``block-job-complete`` after it emits the event
310
+ aio_co_enter(server->ioc->ctx, co);
588
+ ``BLOCK_JOB_COMPLETED``: will, after completing synchronization of
311
+ } else {
589
+ the content, adjust the guest device (i.e. live QEMU) to point to
312
+ break;
590
+ the target image, causing all the new writes from this point on
591
+ to happen there. One use case for this is live storage migration.
592
+
593
+About synchronization modes: The synchronization mode determines
594
+*which* part of the disk image chain will be copied to the target.
595
+Currently, there are four different kinds:
596
+
597
+(1) ``full`` -- Synchronize the content of entire disk image chain to
598
+ the target
599
+
600
+(2) ``top`` -- Synchronize only the contents of the top-most disk image
601
+ in the chain to the target
602
+
603
+(3) ``none`` -- Synchronize only the new writes from this point on.
604
+
605
+ .. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
606
+ the behavior of ``none`` synchronization mode is different.
607
+ Normally, a ``backup`` job consists of two parts: Anything
608
+ that is overwritten by the guest is first copied out to
609
+ the backup, and in the background the whole image is
610
+ copied from start to end. With ``sync=none``, it's only
611
+ the first part.
612
+
613
+(4) ``incremental`` -- Synchronize content that is described by the
614
+ dirty bitmap (see the brief sketch following the note below)
615
+
616
+.. note::
617
+ Refer to the :doc:`bitmaps` document in the QEMU source
618
+ tree to learn about the detailed workings of the ``incremental``
619
+ synchronization mode.
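+
+As a rough sketch of the ``incremental`` mode (the bitmap name and the
+target file names below are made up for illustration; note that
+``incremental`` applies to ``drive-backup``/``blockdev-backup`` rather
+than to ``drive-mirror``): first add a dirty bitmap on the node, take
+one full backup as the baseline, and from then on each incremental
+backup copies only the clusters dirtied since the previous one::
+
+ (QEMU) block-dirty-bitmap-add node=node-D name=bitmap0
+ (QEMU) drive-backup device=node-D sync=full target=full0.qcow2 job-id=job0
+ (QEMU) drive-backup device=node-D sync=incremental bitmap=bitmap0 target=incr0.qcow2 job-id=job1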
620
+
621
+
622
+QMP invocation for ``drive-mirror``
623
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
624
+
625
+To copy the contents of the entire disk image chain, from [A] all the
626
+way to [D], to a new target (``drive-mirror`` will create the destination
627
+file, if it doesn't already exist), call it [E]::
628
+
629
+ (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0
630
+ {
631
+ "execute": "drive-mirror",
632
+ "arguments": {
633
+ "device": "node-D",
634
+ "job-id": "job0",
635
+ "target": "e.qcow2",
636
+ "sync": "full"
637
+ }
313
+ }
638
+ }
314
+ }
639
+
315
+}
640
+The ``"sync": "full"``, from the above, means: copy the *entire* chain
316
+
641
+to the destination.
317
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
642
+
318
+{
643
+Following the above, querying for active block jobs will show that a
319
+ VuVirtq *vq;
644
+'mirror' job is "ready" to be completed (and QEMU will also emit an
320
+
645
+event, ``BLOCK_JOB_READY``)::
321
+ assert(vu_dev);
646
+
322
+
647
+ (QEMU) query-block-jobs
323
+ vq = vu_get_queue(vu_dev, idx);
648
+ {
324
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
649
+ "execute": "query-block-jobs",
325
+}
650
+ "arguments": {}
326
+
651
+ }
327
+static uint64_t vu_block_get_features(VuDev *dev)
652
+ {
328
+{
653
+ "return": [
329
+ uint64_t features;
654
+ {
330
+ VuServer *server = container_of(dev, VuServer, vu_dev);
655
+ "busy": false,
331
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
656
+ "type": "mirror",
332
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
657
+ "len": 21757952,
333
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
658
+ "paused": false,
334
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
659
+ "ready": true,
335
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
660
+ "io-status": "ok",
336
+ 1ull << VIRTIO_BLK_F_FLUSH |
661
+ "offset": 21757952,
337
+ 1ull << VIRTIO_BLK_F_DISCARD |
662
+ "device": "job0",
338
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
663
+ "speed": 0
339
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
664
+ }
340
+ 1ull << VIRTIO_F_VERSION_1 |
665
+ ]
341
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
666
+ }
342
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
667
+
343
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
668
+And, as noted in the previous section, there are two possible actions
344
+
669
+at this point:
345
+ if (!vdev_blk->writable) {
670
+
346
+ features |= 1ull << VIRTIO_BLK_F_RO;
671
+(a) Create a point-in-time snapshot by ending the synchronization. The
347
+ }
672
+ point-in-time is at the time of *ending* the sync. (The result of
348
+
673
+ the following being: the target image, [E], will be populated with
349
+ return features;
674
+ content from the entire chain, [A] to [D])::
350
+}
675
+
351
+
676
+ (QEMU) block-job-cancel device=job0
352
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
677
+ {
353
+{
678
+ "execute": "block-job-cancel",
354
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
679
+ "arguments": {
355
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
680
+ "device": "job0"
356
+}
681
+ }
357
+
682
+ }
358
+static int
683
+
359
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
684
+(b) Or, complete the operation and pivot the live QEMU to the target
360
+{
685
+ copy::
361
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
686
+
362
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
687
+ (QEMU) block-job-complete device=job0
363
+ memcpy(config, &vdev_blk->blkcfg, len);
688
+
364
+
689
+In either of the above cases, if you once again run the
365
+ return 0;
690
+``query-block-jobs`` command, there should not be any active block
366
+}
691
+operation.
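+
+(For reference, with no jobs active, ``query-block-jobs`` simply returns
+an empty list -- roughly)::
+
+ (QEMU) query-block-jobs
+ {
+     "return": []
+ }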
367
+
692
+
368
+static int
693
+Comparing 'commit' and 'mirror': In both cases, the overlay images
369
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
694
+can be discarded. However, with 'commit', the *existing* base image
370
+ uint32_t offset, uint32_t size, uint32_t flags)
695
+will be modified (by updating it with contents from overlays); while in
371
+{
696
+the case of 'mirror', a *new* target image is populated with the data
372
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
697
+from the disk image chain.
373
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
698
+
374
+ uint8_t wce;
699
+
375
+
700
+QMP invocation for live storage migration with ``drive-mirror`` + NBD
376
+ /* don't support live migration */
701
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
377
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
702
+
378
+ return -EINVAL;
703
+Live storage migration (without shared storage setup) is one of the most
379
+ }
704
+common use-cases that takes advantage of the ``drive-mirror`` primitive
380
+
705
+and QEMU's built-in Network Block Device (NBD) server. Here's a quick
381
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
706
+walk-through of this setup.
382
+ size != 1) {
707
+
383
+ return -EINVAL;
708
+Given the disk image chain::
384
+ }
709
+
385
+
710
+ [A] <-- [B] <-- [C] <-- [D]
386
+ wce = *data;
711
+
387
+ vdev_blk->blkcfg.wce = wce;
712
+Instead of copying content from the entire chain, synchronize *only* the
388
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
713
+contents of the *top*-most disk image (i.e. the active layer), [D], to a
389
+ return 0;
714
+target, say, [TargetDisk].
390
+}
715
+
391
+
716
+.. important::
392
+/*
717
+ The destination host must already have the contents of the backing
393
+ * When the client disconnects, it sends a VHOST_USER_NONE request
718
+ chain, involving images [A], [B], and [C], visible via other means
394
+ * and vu_process_message will simply call exit, which causes the VM
719
+ -- whether by ``cp``, ``rsync``, or by some storage array-specific
395
+ * to exit abruptly.
720
+ command.
396
+ * To avoid this issue, process VHOST_USER_NONE request ahead
721
+
397
+ * of vu_process_message.
722
+Sometimes, this is also referred to as "shallow copy" -- because only
398
+ *
723
+the "active layer", and not the rest of the image chain, is copied to
399
+ */
724
+the destination.
400
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
725
+
401
+{
726
+.. note::
402
+ if (vmsg->request == VHOST_USER_NONE) {
727
+ In this example, for the sake of simplicity, we'll be using the same
403
+ dev->panic(dev, "disconnect");
728
+ ``localhost`` as both source and destination.
404
+ return true;
729
+
405
+ }
730
+As noted earlier, on the destination host the contents of the backing
406
+ return false;
731
+chain -- from images [A] to [C] -- are already expected to exist in some
407
+}
732
+form (e.g. in a file called ``Contents-of-A-B-C.qcow2``). Now, on the
408
+
733
+destination host, let's create a target overlay image (with the image
409
+static const VuDevIface vu_block_iface = {
734
+``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents
410
+ .get_features = vu_block_get_features,
735
+of image [D] (from the source QEMU) will be mirrored::
411
+ .queue_set_started = vu_block_queue_set_started,
736
+
412
+ .get_protocol_features = vu_block_get_protocol_features,
737
+ $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \
413
+ .get_config = vu_block_get_config,
738
+ -F qcow2 ./target-disk.qcow2
414
+ .set_config = vu_block_set_config,
739
+
415
+ .process_msg = vu_block_process_msg,
740
+And start the destination QEMU (we already have the source QEMU running
416
+};
741
+-- discussed in the section: `Interacting with a QEMU instance`_)
417
+
742
+with the following invocation. (As noted earlier, for
418
+static void blk_aio_attached(AioContext *ctx, void *opaque)
743
+simplicity's sake, the destination QEMU is started on the same host, but
419
+{
744
+it could be located elsewhere)::
420
+ VuBlockDev *vub_dev = opaque;
745
+
421
+ aio_context_acquire(ctx);
746
+ $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \
422
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
747
+ -M q35 -nodefaults -m 512 \
423
+ aio_context_release(ctx);
748
+ -blockdev node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \
424
+}
749
+ -device virtio-blk,drive=node-TargetDisk,id=virtio0 \
425
+
750
+ -S -monitor stdio -qmp unix:./qmp-sock2,server,nowait \
426
+static void blk_aio_detach(void *opaque)
751
+ -incoming tcp:localhost:6666
427
+{
752
+
428
+ VuBlockDev *vub_dev = opaque;
753
+Given the disk image chain on source QEMU::
429
+ AioContext *ctx = vub_dev->vu_server.ctx;
754
+
430
+ aio_context_acquire(ctx);
755
+ [A] <-- [B] <-- [C] <-- [D]
431
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
756
+
432
+ aio_context_release(ctx);
757
+On the destination host, it is expected that the contents of the chain
433
+}
758
+``[A] <-- [B] <-- [C]`` are *already* present, and we therefore copy *only*
434
+
759
+the content of image [D].
435
+static void
760
+
436
+vu_block_initialize_config(BlockDriverState *bs,
761
+(1) [On *destination* QEMU] As part of the first step, start the
437
+ struct virtio_blk_config *config, uint32_t blk_size)
762
+ built-in NBD server on a given host (local host, represented by
438
+{
763
+ ``::``) and port::
439
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
764
+
440
+ config->blk_size = blk_size;
765
+ (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}}
441
+ config->size_max = 0;
766
+ {
442
+ config->seg_max = 128 - 2;
767
+ "execute": "nbd-server-start",
443
+ config->min_io_size = 1;
768
+ "arguments": {
444
+ config->opt_io_size = 1;
769
+ "addr": {
445
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
770
+ "data": {
446
+ config->max_discard_sectors = 32768;
771
+ "host": "::",
447
+ config->max_discard_seg = 1;
772
+ "port": "49153"
448
+ config->discard_sector_alignment = config->blk_size >> 9;
773
+ },
449
+ config->max_write_zeroes_sectors = 32768;
774
+ "type": "inet"
450
+ config->max_write_zeroes_seg = 1;
775
+ }
451
+}
776
+ }
452
+
777
+ }
453
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
778
+
454
+{
779
+(2) [On *destination* QEMU] And export the destination disk image using
455
+
780
+ QEMU's built-in NBD server::
456
+ BlockBackend *blk;
781
+
457
+ Error *local_error = NULL;
782
+ (QEMU) nbd-server-add device=node-TargetDisk writable=true
458
+ const char *node_name = vu_block_device->node_name;
783
+ {
459
+ bool writable = vu_block_device->writable;
784
+ "execute": "nbd-server-add",
460
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
785
+ "arguments": {
461
+ int ret;
786
+ "device": "node-TargetDisk"
462
+
787
+ }
463
+ AioContext *ctx;
788
+ }
464
+
789
+
465
+ BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
790
+(3) [On *source* QEMU] Then, invoke ``drive-mirror`` (NB: since we're
466
+
791
+ running ``drive-mirror`` with ``mode=existing`` (meaning:
467
+ if (!bs) {
792
+ synchronize to a pre-created file, therefore 'existing', file on the
468
+ error_propagate(errp, local_error);
793
+ target host), with the synchronization mode as 'top' (``"sync:
469
+ return NULL;
794
+ "top"``)::
470
+ }
795
+
471
+
796
+ (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0
472
+ if (bdrv_is_read_only(bs)) {
797
+ {
473
+ writable = false;
798
+ "execute": "drive-mirror",
474
+ }
799
+ "arguments": {
475
+
800
+ "device": "node-D",
476
+ if (writable) {
801
+ "mode": "existing",
477
+ perm |= BLK_PERM_WRITE;
802
+ "job-id": "job0",
478
+ }
803
+ "target": "nbd:localhost:49153:exportname=node-TargetDisk",
479
+
804
+ "sync": "top"
480
+ ctx = bdrv_get_aio_context(bs);
805
+ }
481
+ aio_context_acquire(ctx);
806
+ }
482
+ bdrv_invalidate_cache(bs, NULL);
807
+
483
+ aio_context_release(ctx);
808
+(4) [On *source* QEMU] Once ``drive-mirror`` has copied all the data, and the
484
+
809
+ event ``BLOCK_JOB_READY`` is emitted, issue ``block-job-cancel`` to
485
+ /*
810
+ gracefully end the synchronization, from source QEMU::
486
+ * Don't allow resize while the vhost user server is running,
811
+
487
+ * otherwise we don't care what happens with the node.
812
+ (QEMU) block-job-cancel device=job0
488
+ */
813
+ {
489
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
814
+ "execute": "block-job-cancel",
490
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
815
+ "arguments": {
491
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
816
+ "device": "job0"
492
+ ret = blk_insert_bs(blk, bs, errp);
817
+ }
493
+
818
+ }
494
+ if (ret < 0) {
819
+
495
+ goto fail;
820
+(5) [On *destination* QEMU] Then, stop the NBD server::
496
+ }
821
+
497
+
822
+ (QEMU) nbd-server-stop
498
+ blk_set_enable_write_cache(blk, false);
823
+ {
499
+
824
+ "execute": "nbd-server-stop",
500
+ blk_set_allow_aio_context_change(blk, true);
825
+ "arguments": {}
501
+
826
+ }
502
+ vu_block_device->blkcfg.wce = 0;
827
+
503
+ vu_block_device->backend = blk;
828
+(6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the
504
+ if (!vu_block_device->blk_size) {
829
+ QMP command ``cont``::
505
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
830
+
506
+ }
831
+ (QEMU) cont
507
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
832
+ {
508
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
833
+ "execute": "cont",
509
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
834
+ "arguments": {}
510
+ vu_block_device->blk_size);
835
+ }
511
+ return vu_block_device;
836
+
512
+
837
+.. note::
513
+fail:
838
+ Higher-level libraries (e.g. libvirt) automate the entire above
514
+ blk_unref(blk);
839
+ process (although note that libvirt does not allow same-host
515
+ return NULL;
840
+ migrations to localhost for other reasons).
516
+}
841
+
517
+
842
+
518
+static void vu_block_deinit(VuBlockDev *vu_block_device)
843
+Notes on ``blockdev-mirror``
519
+{
844
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
520
+ if (vu_block_device->backend) {
845
+
521
+ blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
846
+The ``blockdev-mirror`` command is equivalent in core functionality to
522
+ blk_aio_detach, vu_block_device);
847
+``drive-mirror``, except that it operates at node-level in a BDS graph.
523
+ }
848
+
524
+
849
+Also: for ``blockdev-mirror``, the 'target' image needs to be explicitly
525
+ blk_unref(vu_block_device->backend);
850
+created (using ``qemu-img``) and attached to live QEMU via
526
+}
851
+``blockdev-add``, which assigns a name to the to-be-created target node.
527
+
852
+
528
+static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
853
+E.g. the sequence of actions to create a point-in-time backup of an
529
+{
854
+entire disk image chain, to a target, using ``blockdev-mirror`` would be:
530
+ vhost_user_server_stop(&vu_block_device->vu_server);
855
+
531
+ vu_block_deinit(vu_block_device);
856
+(0) Create the QCOW2 overlays, to arrive at a backing chain of desired
532
+}
857
+ depth
533
+
858
+
534
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
859
+(1) Create the target image (using ``qemu-img``), say, ``e.qcow2``
535
+ Error **errp)
860
+
536
+{
861
+(2) Attach the above created file (``e.qcow2``), run-time, using
537
+ AioContext *ctx;
862
+ ``blockdev-add`` to QEMU
538
+ SocketAddress *addr = vu_block_device->addr;
863
+
539
+
864
+(3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the
540
+ if (!vu_block_init(vu_block_device, errp)) {
865
+ entire chain to the target). And notice the event
541
+ return;
866
+ ``BLOCK_JOB_READY``
542
+ }
867
+
543
+
868
+(4) Optionally, query for active block jobs; there should be a 'mirror'
544
+ ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
869
+ job ready to be completed
545
+
870
+
546
+ if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
871
+(5) Gracefully complete the 'mirror' block device job, and notice the
547
+ VHOST_USER_BLK_MAX_QUEUES,
872
+ event ``BLOCK_JOB_COMPLETED``
548
+ NULL, &vu_block_iface,
873
+
549
+ errp)) {
874
+(6) Shut down the guest by issuing the QMP ``quit`` command so that
550
+ goto error;
875
+ caches are flushed
551
+ }
876
+
552
+
877
+(7) Then, finally, compare the contents of the disk image chain, and
553
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
878
+ the target copy with ``qemu-img compare``. You should notice:
554
+ blk_aio_detach, vu_block_device);
879
+ "Images are identical"
555
+ vu_block_device->running = true;
880
+
556
+ return;
881
+
557
+
882
+QMP invocation for ``blockdev-mirror``
558
+ error:
883
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
559
+ vu_block_deinit(vu_block_device);
884
+
560
+}
885
+Given the disk image chain::
561
+
886
+
562
+static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
887
+ [A] <-- [B] <-- [C] <-- [D]
563
+{
888
+
564
+ if (vus->running) {
889
+To copy the contents of the entire disk image chain, from [A] all the
565
+ error_setg(errp, "The property can't be modified "
890
+way to [D], to a new target, call it [E]. The following is the flow.
566
+ "while the server is running");
891
+
567
+ return false;
892
+Create the overlay images, [B], [C], and [D]::
568
+ }
893
+
569
+ return true;
894
+ (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
570
+}
895
+ (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2
571
+
896
+ (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2
572
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
897
+
573
+{
898
+Create the target image, [E]::
574
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
899
+
575
+
900
+ $ qemu-img create -f qcow2 e.qcow2 39M
576
+ if (!vu_prop_modifiable(vus, errp)) {
901
+
577
+ return;
902
+Add the above created target image to QEMU, via ``blockdev-add``::
578
+ }
903
+
579
+
904
+ (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"}
580
+ if (vus->node_name) {
905
+ {
581
+ g_free(vus->node_name);
906
+ "execute": "blockdev-add",
582
+ }
907
+ "arguments": {
583
+
908
+ "node-name": "node-E",
584
+ vus->node_name = g_strdup(value);
909
+ "driver": "qcow2",
585
+}
910
+ "file": {
586
+
911
+ "driver": "file",
587
+static char *vu_get_node_name(Object *obj, Error **errp)
912
+ "filename": "e.qcow2"
588
+{
913
+ }
589
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
914
+ }
590
+ return g_strdup(vus->node_name);
915
+ }
591
+}
916
+
592
+
917
+Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``::
593
+static void free_socket_addr(SocketAddress *addr)
918
+
594
+{
919
+ (QEMU) blockdev-mirror device=node-D target=node-E sync=full job-id=job0
595
+ g_free(addr->u.q_unix.path);
920
+ {
596
+ g_free(addr);
921
+ "execute": "blockdev-mirror",
597
+}
922
+ "arguments": {
598
+
923
+ "device": "node-D",
599
+static void vu_set_unix_socket(Object *obj, const char *value,
924
+ "job-id": "job0",
600
+ Error **errp)
925
+ "target": "node-E",
601
+{
926
+ "sync": "full"
602
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
927
+ }
603
+
928
+ }
604
+ if (!vu_prop_modifiable(vus, errp)) {
929
+
605
+ return;
930
+Query for active block jobs; there should be a 'mirror' job ready::
606
+ }
931
+
607
+
932
+ (QEMU) query-block-jobs
608
+ if (vus->addr) {
933
+ {
609
+ free_socket_addr(vus->addr);
934
+ "execute": "query-block-jobs",
610
+ }
935
+ "arguments": {}
611
+
936
+ }
612
+ SocketAddress *addr = g_new0(SocketAddress, 1);
937
+ {
613
+ addr->type = SOCKET_ADDRESS_TYPE_UNIX;
938
+ "return": [
614
+ addr->u.q_unix.path = g_strdup(value);
939
+ {
615
+ vus->addr = addr;
940
+ "busy": false,
616
+}
941
+ "type": "mirror",
617
+
942
+ "len": 21561344,
618
+static char *vu_get_unix_socket(Object *obj, Error **errp)
943
+ "paused": false,
619
+{
944
+ "ready": true,
620
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
945
+ "io-status": "ok",
621
+ return g_strdup(vus->addr->u.q_unix.path);
946
+ "offset": 21561344,
622
+}
947
+ "device": "job0",
623
+
948
+ "speed": 0
624
+static bool vu_get_block_writable(Object *obj, Error **errp)
949
+ }
625
+{
950
+ ]
626
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
951
+ }
627
+ return vus->writable;
952
+
628
+}
953
+Gracefully complete the block device job operation, and notice the
629
+
954
+event ``BLOCK_JOB_COMPLETED``::
630
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
955
+
631
+{
956
+ (QEMU) block-job-complete device=job0
632
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
957
+ {
633
+
958
+ "execute": "block-job-complete",
634
+ if (!vu_prop_modifiable(vus, errp)) {
959
+ "arguments": {
635
+ return;
960
+ "device": "job0"
636
+ }
961
+ }
637
+
962
+ }
638
+ vus->writable = value;
963
+ {
639
+}
964
+ "return": {}
640
+
965
+ }
641
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
966
+
642
+ void *opaque, Error **errp)
967
+Shut down the guest by issuing the ``quit`` QMP command::
643
+{
968
+
644
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
969
+ (QEMU) quit
645
+ uint32_t value = vus->blk_size;
970
+ {
646
+
971
+ "execute": "quit",
647
+ visit_type_uint32(v, name, &value, errp);
972
+ "arguments": {}
648
+}
973
+ }
649
+
974
+
650
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
975
+
651
+ void *opaque, Error **errp)
976
+Live disk backup --- ``drive-backup`` and ``blockdev-backup``
652
+{
977
+-------------------------------------------------------------
653
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
978
+
654
+
979
+The ``drive-backup`` (and its newer equivalent ``blockdev-backup``) allows
655
+ Error *local_err = NULL;
980
+you to create a point-in-time snapshot.
656
+ uint32_t value;
981
+
657
+
982
+In this case, the point-in-time is when you *start* the ``drive-backup``
658
+ if (!vu_prop_modifiable(vus, errp)) {
983
+(or its newer equivalent ``blockdev-backup``) command.
659
+ return;
984
+
660
+ }
985
+
661
+
986
+QMP invocation for ``drive-backup``
662
+ visit_type_uint32(v, name, &value, &local_err);
987
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
663
+ if (local_err) {
988
+
664
+ goto out;
989
+Yet again, starting afresh with our example disk image chain::
665
+ }
990
+
666
+
991
+ [A] <-- [B] <-- [C] <-- [D]
667
+ check_block_size(object_get_typename(obj), name, value, &local_err);
992
+
668
+ if (local_err) {
993
+To create a target image [E], with content populated from image [A] to
669
+ goto out;
994
+[D], from the above chain, the following is the syntax. (If the target
670
+ }
995
+image does not exist, ``drive-backup`` will create it)::
671
+
996
+
672
+ vus->blk_size = value;
997
+ (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0
673
+
998
+ {
674
+out:
999
+ "execute": "drive-backup",
675
+ error_propagate(errp, local_err);
1000
+ "arguments": {
676
+}
1001
+ "device": "node-D",
677
+
1002
+ "job-id": "job0",
678
+static void vhost_user_blk_server_instance_finalize(Object *obj)
1003
+ "sync": "full",
679
+{
1004
+ "target": "e.qcow2"
680
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
1005
+ }
681
+
1006
+ }
682
+ vhost_user_blk_server_stop(vub);
1007
+
683
+
1008
+Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event
684
+ /*
1009
+will be issued, indicating the live block device job operation has
685
+ * Unlike object_property_add_str, object_class_property_add_str
1010
+completed, and no further action is required.
686
+ * doesn't have a release method. Thus manual memory freeing is
1011
+
687
+ * needed.
1012
+
688
+ */
1013
+Notes on ``blockdev-backup``
689
+ free_socket_addr(vub->addr);
1014
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
690
+ g_free(vub->node_name);
1015
+
691
+}
1016
+The ``blockdev-backup`` command is equivalent in functionality to
692
+
1017
+``drive-backup``, except that it operates at node-level in a Block Driver
693
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
1018
+State (BDS) graph.
694
+{
1019
+
695
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
1020
+E.g. the sequence of actions to create a point-in-time backup
696
+
1021
+of an entire disk image chain, to a target, using ``blockdev-backup``
697
+ vhost_user_blk_server_start(vub, errp);
1022
+would be:
698
+}
1023
+
699
+
1024
+(0) Create the QCOW2 overlays, to arrive at a backing chain of desired
700
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
1025
+ depth
701
+ void *class_data)
1026
+
702
+{
1027
+(1) Create the target image (using ``qemu-img``), say, ``e.qcow2``
703
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
1028
+
704
+ ucc->complete = vhost_user_blk_server_complete;
1029
+(2) Attach the above created file (``e.qcow2``), run-time, using
705
+
1030
+ ``blockdev-add`` to QEMU
706
+ object_class_property_add_bool(klass, "writable",
1031
+
707
+ vu_get_block_writable,
1032
+(3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the
708
+ vu_set_block_writable);
1033
+ entire chain to the target). And notice the event
709
+
1034
+ ``BLOCK_JOB_COMPLETED``
710
+ object_class_property_add_str(klass, "node-name",
1035
+
711
+ vu_get_node_name,
1036
+(4) Shut down the guest by issuing the QMP ``quit`` command, so that
712
+ vu_set_node_name);
1037
+ caches are flushed
713
+
1038
+
714
+ object_class_property_add_str(klass, "unix-socket",
1039
+(5) Then, finally, compare the contents of the disk image chain, and
715
+ vu_get_unix_socket,
1040
+ the target copy with ``qemu-img compare``. You should notice:
716
+ vu_set_unix_socket);
1041
+ "Images are identical"
717
+
1042
+
718
+ object_class_property_add(klass, "logical-block-size", "uint32",
1043
+The following section shows an example QMP invocation for
719
+ vu_get_blk_size, vu_set_blk_size,
1044
+``blockdev-backup``.
720
+ NULL, NULL);
1045
+
721
+}
1046
+QMP invocation for ``blockdev-backup``
722
+
1047
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
723
+static const TypeInfo vhost_user_blk_server_info = {
1048
+
724
+ .name = TYPE_VHOST_USER_BLK_SERVER,
1049
+Given a disk image chain of depth 1 where image [B] is the active
725
+ .parent = TYPE_OBJECT,
1050
+overlay (live QEMU is writing to it)::
726
+ .instance_size = sizeof(VuBlockDev),
1051
+
727
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
1052
+ [A] <-- [B]
728
+ .class_init = vhost_user_blk_server_class_init,
1053
+
729
+ .interfaces = (InterfaceInfo[]) {
1054
+The following is the procedure to copy the content from the entire chain
730
+ {TYPE_USER_CREATABLE},
1055
+to a target image (say, [E]), which has the full content from [A] and
731
+ {}
1056
+[B].
732
+ },
1057
+
733
+};
1058
+Create the overlay [B]::
734
+
1059
+
735
+static void vhost_user_blk_server_register_types(void)
1060
+ (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2
736
+{
1061
+ {
737
+ type_register_static(&vhost_user_blk_server_info);
1062
+ "execute": "blockdev-snapshot-sync",
738
+}
1063
+ "arguments": {
739
+
1064
+ "node-name": "node-A",
740
+type_init(vhost_user_blk_server_register_types)
1065
+ "snapshot-file": "b.qcow2",
741
diff --git a/softmmu/vl.c b/softmmu/vl.c
1066
+ "format": "qcow2",
742
index XXXXXXX..XXXXXXX 100644
1067
+ "snapshot-node-name": "node-B"
743
--- a/softmmu/vl.c
1068
+ }
744
+++ b/softmmu/vl.c
1069
+ }
745
@@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts)
1070
+
746
}
1071
+
747
#endif
1072
+Create a target image that will contain the copy::
748
1073
+
749
+ /* Reason: vhost-user-blk-server property "node-name" */
1074
+ $ qemu-img create -f qcow2 e.qcow2 39M
750
+ if (g_str_equal(type, "vhost-user-blk-server")) {
1075
+
751
+ return false;
1076
+Then add it to QEMU via ``blockdev-add``::
752
+ }
1077
+
753
/*
1078
+ (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"}
754
* Reason: filter-* property "netdev" etc.
1079
+ {
755
*/
1080
+ "execute": "blockdev-add",
756
diff --git a/block/meson.build b/block/meson.build
1081
+ "arguments": {
757
index XXXXXXX..XXXXXXX 100644
1082
+ "node-name": "node-E",
758
--- a/block/meson.build
1083
+ "driver": "qcow2",
759
+++ b/block/meson.build
1084
+ "file": {
760
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
1085
+ "driver": "file",
761
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
1086
+ "filename": "e.qcow2"
762
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
1087
+ }
763
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
1088
+ }
764
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
1089
+ }
765
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
1090
+
766
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
1091
+Then invoke ``blockdev-backup`` to copy the contents from the entire
767
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
1092
+image chain, consisting of images [A] and [B] to the target image
1093
+'e.qcow2'::
1094
+
1095
+ (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0
1096
+ {
1097
+ "execute": "blockdev-backup",
1098
+ "arguments": {
1099
+ "device": "node-B",
1100
+ "job-id": "job0",
1101
+ "target": "node-E",
1102
+ "sync": "full"
1103
+ }
1104
+ }
1105
+
1106
+Once the above 'backup' operation has completed, the event,
1107
+``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful
1108
+completion.
1109
+
1110
+Next, query for any active block device jobs (there should be none)::
1111
+
1112
+ (QEMU) query-block-jobs
1113
+ {
1114
+ "execute": "query-block-jobs",
1115
+ "arguments": {}
1116
+ }
1117
+
1118
+Shutdown the guest::
1119
+
1120
+ (QEMU) quit
1121
+ {
1122
+ "execute": "quit",
1123
+ "arguments": {}
1124
+ }
1125
+ "return": {}
1126
+ }
1127
+
1128
+.. note::
1129
+ The above step is really important; if forgotten, an error, "Failed
1130
+ to get shared "write" lock on e.qcow2", will be thrown when you do
1131
+ ``qemu-img compare`` to verify the integrity of the disk image
1132
+ with the backup content.
1133
+
1134
+
1135
+The end result will be the image 'e.qcow2' containing a
1136
+point-in-time backup of the disk image chain -- i.e. contents from
1137
+images [A] and [B] at the time the ``blockdev-backup`` command was
1138
+initiated.
1139
+
1140
+One way to confirm the backup disk image contains the identical content
1141
+with the disk image chain is to compare the backup and the contents of
1142
+the chain, you should see "Images are identical". (NB: this is assuming
1143
+QEMU was launched with ``-S`` option, which will not start the CPUs at
1144
+guest boot up)::
1145
+
1146
+ $ qemu-img compare b.qcow2 e.qcow2
1147
+ Warning: Image size mismatch!
1148
+ Images are identical.
1149
+
1150
+NOTE: The "Warning: Image size mismatch!" is expected, as we created the
1151
+target image (e.qcow2) with 39M size.
1152
diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
1153
deleted file mode 100644
1154
index XXXXXXX..XXXXXXX
1155
--- a/docs/live-block-ops.txt
1156
+++ /dev/null
1157
@@ -XXX,XX +XXX,XX @@
1158
-LIVE BLOCK OPERATIONS
1159
-=====================
1160
-
1161
-High level description of live block operations. Note these are not
1162
-supported for use with the raw format at the moment.
1163
-
1164
-Note also that this document is incomplete and it currently only
1165
-covers the 'stream' operation. Other operations supported by QEMU such
1166
-as 'commit', 'mirror' and 'backup' are not described here yet. Please
1167
-refer to the qapi/block-core.json file for an overview of those.
1168
-
1169
-Snapshot live merge
1170
-===================
1171
-
1172
-Given a snapshot chain, described in this document in the following
1173
-format:
1174
-
1175
-[A] <- [B] <- [C] <- [D] <- [E]
1176
-
1177
-Where the rightmost object ([E] in the example) described is the current
1178
-image which the guest OS has write access to. To the left of it is its base
1179
-image, and so on accordingly until the leftmost image, which has no
1180
-base.
1181
-
1182
-The snapshot live merge operation transforms such a chain into a
1183
-smaller one with fewer elements, such as this transformation relative
1184
-to the first example:
1185
-
1186
-[A] <- [E]
1187
-
1188
-Data is copied in the right direction with destination being the
1189
-rightmost image, but any other intermediate image can be specified
1190
-instead. In this example data is copied from [C] into [D], so [D] can
1191
-be backed by [B]:
1192
-
1193
-[A] <- [B] <- [D] <- [E]
1194
-
1195
-The operation is implemented in QEMU through image streaming facilities.
1196
-
1197
-The basic idea is to execute 'block_stream virtio0' while the guest is
1198
-running. Progress can be monitored using 'info block-jobs'. When the
1199
-streaming operation completes it raises a QMP event. 'block_stream'
1200
-copies data from the backing file(s) into the active image. When finished,
1201
-it adjusts the backing file pointer.
1202
-
1203
-The 'base' parameter specifies an image which data need not be
1204
-streamed from. This image will be used as the backing file for the
1205
-destination image when the operation is finished.
1206
-
1207
-In the first example above, the command would be:
1208
-
1209
-(qemu) block_stream virtio0 file-A.img
1210
-
1211
-In order to specify a destination image different from the active
1212
-(rightmost) one we can use its node name instead.
1213
-
1214
-In the second example above, the command would be:
1215
-
1216
-(qemu) block_stream node-D file-B.img
1217
-
1218
-Live block copy
1219
-===============
1220
-
1221
-To copy an in use image to another destination in the filesystem, one
1222
-should create a live snapshot in the desired destination, then stream
1223
-into that image. Example:
1224
-
1225
-(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img qcow2
1226
-
1227
-(qemu) block_stream ide0-hd0
1228
-
1229
-
1230
--
768
--
1231
2.9.4
769
2.26.2
1232
770
1233
New patch
1
From: Coiby Xu <coiby.xu@gmail.com>
1
2
3
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
4
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
7
Message-id: 20200918080912.321299-8-coiby.xu@gmail.com
8
[Removed reference to vhost-user-blk-test.c, it will be sent in a
9
separate pull request.
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
MAINTAINERS | 7 +++++++
14
1 file changed, 7 insertions(+)
15
16
diff --git a/MAINTAINERS b/MAINTAINERS
17
index XXXXXXX..XXXXXXX 100644
18
--- a/MAINTAINERS
19
+++ b/MAINTAINERS
20
@@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org
21
S: Supported
22
F: tests/image-fuzzer/
23
24
+Vhost-user block device backend server
25
+M: Coiby Xu <Coiby.Xu@gmail.com>
26
+S: Maintained
27
+F: block/export/vhost-user-blk-server.c
28
+F: util/vhost-user-server.c
29
+F: tests/qtest/libqos/vhost-user-blk.c
30
+
31
Replication
32
M: Wen Congyang <wencongyang2@huawei.com>
33
M: Xie Changlong <xiechanglong.d@gmail.com>
34
--
35
2.26.2
36
New patch
1
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2
Message-id: 20200924151549.913737-3-stefanha@redhat.com
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
---
5
util/vhost-user-server.c | 2 +-
6
1 file changed, 1 insertion(+), 1 deletion(-)
1
7
8
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
9
index XXXXXXX..XXXXXXX 100644
10
--- a/util/vhost-user-server.c
11
+++ b/util/vhost-user-server.c
12
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
13
return false;
14
}
15
16
- /* zero out unspecified fileds */
17
+ /* zero out unspecified fields */
18
*server = (VuServer) {
19
.listener = listener,
20
.vu_iface = vu_iface,
21
--
22
2.26.2
23
New patch
1
We already have access to the value with the correct type (ioc and sioc
2
are the same QIOChannel).
1
3
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20200924151549.913737-4-stefanha@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
8
util/vhost-user-server.c | 2 +-
9
1 file changed, 1 insertion(+), 1 deletion(-)
10
11
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
12
index XXXXXXX..XXXXXXX 100644
13
--- a/util/vhost-user-server.c
14
+++ b/util/vhost-user-server.c
15
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
16
server->ioc = QIO_CHANNEL(sioc);
17
object_ref(OBJECT(server->ioc));
18
qio_channel_attach_aio_context(server->ioc, server->ctx);
19
- qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
20
+ qio_channel_set_blocking(server->ioc, false, NULL);
21
vu_client_start(server);
22
}
23
24
--
25
2.26.2
26
New patch
1
Explicitly deleting watches is not necessary since libvhost-user calls
2
remove_watch() during vu_deinit(). Add an assertion to check this
3
though.
1
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200924151549.913737-5-stefanha@redhat.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
util/vhost-user-server.c | 19 ++++---------------
10
1 file changed, 4 insertions(+), 15 deletions(-)
11
12
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
13
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.c
15
+++ b/util/vhost-user-server.c
16
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
17
/* When this is set vu_client_trip will stop new processing vhost-user message */
18
server->sioc = NULL;
19
20
- VuFdWatch *vu_fd_watch, *next;
21
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
22
- aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
23
- NULL, NULL, NULL);
24
- }
25
-
26
- while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
27
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
28
- if (!vu_fd_watch->processing) {
29
- QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
30
- g_free(vu_fd_watch);
31
- }
32
- }
33
- }
34
-
35
while (server->processing_msg) {
36
if (server->ioc->read_coroutine) {
37
server->ioc->read_coroutine = NULL;
38
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
39
}
40
41
vu_deinit(&server->vu_dev);
42
+
43
+ /* vu_deinit() should have called remove_watch() */
44
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
45
+
46
object_unref(OBJECT(sioc));
47
object_unref(OBJECT(server->ioc));
48
}
49
--
50
2.26.2
51
New patch
1
Only one struct is needed per request. Drop req_data and the separate
2
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
3
once.
1
4
5
This fixes the req_data memory leak in vu_block_virtio_process_req().
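
For readers skimming the diff, the resulting allocation pattern is roughly
the sketch below (all names come from the hunks that follow; the element
must remain the first struct member so the buffer returned by
vu_queue_pop() can be used as the request itself):

  typedef struct VuBlockReq {
      VuVirtqElement elem;        /* must stay first */
      int64_t sector_num;
      size_t size;
      struct virtio_blk_inhdr *in;
      struct virtio_blk_outhdr out;
      VuServer *server;
      struct VuVirtq *vq;
  } VuBlockReq;

  VuBlockReq *req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
  if (req) {
      req->server = server;
      req->vq = vq;
      /* handed to a coroutine; completion pushes &req->elem and free()s req */
  }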
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-6-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
block/export/vhost-user-blk-server.c | 68 +++++++++-------------------
12
1 file changed, 21 insertions(+), 47 deletions(-)
13
14
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/block/export/vhost-user-blk-server.c
17
+++ b/block/export/vhost-user-blk-server.c
18
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
19
};
20
21
typedef struct VuBlockReq {
22
- VuVirtqElement *elem;
23
+ VuVirtqElement elem;
24
int64_t sector_num;
25
size_t size;
26
struct virtio_blk_inhdr *in;
27
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
28
VuDev *vu_dev = &req->server->vu_dev;
29
30
/* IO size with 1 extra status byte */
31
- vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
32
+ vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
33
vu_queue_notify(vu_dev, req->vq);
34
35
- if (req->elem) {
36
- free(req->elem);
37
- }
38
-
39
- g_free(req);
40
+ free(req);
41
}
42
43
static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
44
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
45
blk_co_flush(backend);
46
}
47
48
-struct req_data {
49
- VuServer *server;
50
- VuVirtq *vq;
51
- VuVirtqElement *elem;
52
-};
53
-
54
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
55
{
56
- struct req_data *data = opaque;
57
- VuServer *server = data->server;
58
- VuVirtq *vq = data->vq;
59
- VuVirtqElement *elem = data->elem;
60
+ VuBlockReq *req = opaque;
61
+ VuServer *server = req->server;
62
+ VuVirtqElement *elem = &req->elem;
63
uint32_t type;
64
- VuBlockReq *req;
65
66
VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
67
BlockBackend *backend = vdev_blk->backend;
68
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
69
struct iovec *out_iov = elem->out_sg;
70
unsigned in_num = elem->in_num;
71
unsigned out_num = elem->out_num;
72
+
73
/* refer to hw/block/virtio_blk.c */
74
if (elem->out_num < 1 || elem->in_num < 1) {
75
error_report("virtio-blk request missing headers");
76
- free(elem);
77
- return;
78
+ goto err;
79
}
80
81
- req = g_new0(VuBlockReq, 1);
82
- req->server = server;
83
- req->vq = vq;
84
- req->elem = elem;
85
-
86
if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
87
sizeof(req->out)) != sizeof(req->out))) {
88
error_report("virtio-blk request outhdr too short");
89
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
90
91
err:
92
free(elem);
93
- g_free(req);
94
- return;
95
}
96
97
static void vu_block_process_vq(VuDev *vu_dev, int idx)
98
{
99
- VuServer *server;
100
- VuVirtq *vq;
101
- struct req_data *req_data;
102
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
103
+ VuVirtq *vq = vu_get_queue(vu_dev, idx);
104
105
- server = container_of(vu_dev, VuServer, vu_dev);
106
- assert(server);
107
-
108
- vq = vu_get_queue(vu_dev, idx);
109
- assert(vq);
110
- VuVirtqElement *elem;
111
while (1) {
112
- elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
113
- sizeof(VuBlockReq));
114
- if (elem) {
115
- req_data = g_new0(struct req_data, 1);
116
- req_data->server = server;
117
- req_data->vq = vq;
118
- req_data->elem = elem;
119
- Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
120
- req_data);
121
- aio_co_enter(server->ioc->ctx, co);
122
- } else {
123
+ VuBlockReq *req;
124
+
125
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
126
+ if (!req) {
127
break;
128
}
129
+
130
+ req->server = server;
131
+ req->vq = vq;
132
+
133
+ Coroutine *co =
134
+ qemu_coroutine_create(vu_block_virtio_process_req, req);
135
+ qemu_coroutine_enter(co);
136
}
137
}
138
139
--
140
2.26.2
141
New patch
1
The device panic notifier callback is not used. Drop it.
1
2
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-7-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.h | 3 ---
8
block/export/vhost-user-blk-server.c | 3 +--
9
util/vhost-user-server.c | 6 ------
10
3 files changed, 1 insertion(+), 11 deletions(-)
11
12
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
13
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.h
15
+++ b/util/vhost-user-server.h
16
@@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch {
17
} VuFdWatch;
18
19
typedef struct VuServer VuServer;
20
-typedef void DevicePanicNotifierFn(VuServer *server);
21
22
struct VuServer {
23
QIONetListener *listener;
24
AioContext *ctx;
25
- DevicePanicNotifierFn *device_panic_notifier;
26
int max_queues;
27
const VuDevIface *vu_iface;
28
VuDev vu_dev;
29
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
30
SocketAddress *unix_socket,
31
AioContext *ctx,
32
uint16_t max_queues,
33
- DevicePanicNotifierFn *device_panic_notifier,
34
const VuDevIface *vu_iface,
35
Error **errp);
36
37
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/export/vhost-user-blk-server.c
40
+++ b/block/export/vhost-user-blk-server.c
41
@@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
42
ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
43
44
if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
45
- VHOST_USER_BLK_MAX_QUEUES,
46
- NULL, &vu_block_iface,
47
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
48
errp)) {
49
goto error;
50
}
51
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/util/vhost-user-server.c
54
+++ b/util/vhost-user-server.c
55
@@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf)
56
close_client(server);
57
}
58
59
- if (server->device_panic_notifier) {
60
- server->device_panic_notifier(server);
61
- }
62
-
63
/*
64
* Set the callback function for network listener so another
65
* vhost-user client can connect to this server
66
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
67
SocketAddress *socket_addr,
68
AioContext *ctx,
69
uint16_t max_queues,
70
- DevicePanicNotifierFn *device_panic_notifier,
71
const VuDevIface *vu_iface,
72
Error **errp)
73
{
74
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
75
.vu_iface = vu_iface,
76
.max_queues = max_queues,
77
.ctx = ctx,
78
- .device_panic_notifier = device_panic_notifier,
79
};
80
81
qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
82
--
83
2.26.2
84
New patch
1
fds[] is leaked when qio_channel_readv_full() fails.
1
2
3
Use vmsg->fds[] instead of keeping a local fds[] array. Then we can
4
reuse the goto fail path to clean up the fds. vmsg->fd_num must be zeroed before the
5
loop to make this safe.
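
In outline, the read loop now accumulates fds straight into vmsg->fds[]
(a sketch condensed from the hunk below, not a drop-in replacement):

  vmsg->fd_num = 0;                    /* zeroed before the loop */
  do {
      size_t nfds = 0;
      int *fds = NULL;

      rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
      /* ... handle QIO_CHANNEL_ERR_BLOCK, errors and EOF ... */

      if (nfds > 0) {
          if (vmsg->fd_num + nfds > max_fds) {
              g_free(fds);
              goto fail;               /* cleans up fds already stored in vmsg */
          }
          memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
          vmsg->fd_num += nfds;
          g_free(fds);
      }
      /* ... advance iov by rc bytes ... */
  } while (read_bytes != VHOST_USER_HDR_SIZE);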
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-8-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
util/vhost-user-server.c | 50 ++++++++++++++++++----------------------
12
1 file changed, 23 insertions(+), 27 deletions(-)
13
14
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/util/vhost-user-server.c
17
+++ b/util/vhost-user-server.c
18
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
19
};
20
int rc, read_bytes = 0;
21
Error *local_err = NULL;
22
- /*
23
- * Store fds/nfds returned from qio_channel_readv_full into
24
- * temporary variables.
25
- *
26
- * VhostUserMsg is a packed structure, gcc will complain about passing
27
- * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
28
- * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
29
- * thus two temporary variables nfds and fds are used here.
30
- */
31
- size_t nfds = 0, nfds_t = 0;
32
const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
33
- int *fds_t = NULL;
34
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
35
QIOChannel *ioc = server->ioc;
36
37
+ vmsg->fd_num = 0;
38
if (!ioc) {
39
error_report_err(local_err);
40
goto fail;
41
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
42
43
assert(qemu_in_coroutine());
44
do {
45
+ size_t nfds = 0;
46
+ int *fds = NULL;
47
+
48
/*
49
* qio_channel_readv_full may have short reads, keeping calling it
50
* until getting VHOST_USER_HDR_SIZE or 0 bytes in total
51
*/
52
- rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
53
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
54
if (rc < 0) {
55
if (rc == QIO_CHANNEL_ERR_BLOCK) {
56
+ assert(local_err == NULL);
57
qio_channel_yield(ioc, G_IO_IN);
58
continue;
59
} else {
60
error_report_err(local_err);
61
- return false;
62
+ goto fail;
63
}
64
}
65
- read_bytes += rc;
66
- if (nfds_t > 0) {
67
- if (nfds + nfds_t > max_fds) {
68
+
69
+ if (nfds > 0) {
70
+ if (vmsg->fd_num + nfds > max_fds) {
71
error_report("A maximum of %zu fds are allowed, "
72
"however got %zu fds now",
73
- max_fds, nfds + nfds_t);
74
+ max_fds, vmsg->fd_num + nfds);
75
+ g_free(fds);
76
goto fail;
77
}
78
- memcpy(vmsg->fds + nfds, fds_t,
79
- nfds_t *sizeof(vmsg->fds[0]));
80
- nfds += nfds_t;
81
- g_free(fds_t);
82
+ memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
83
+ vmsg->fd_num += nfds;
84
+ g_free(fds);
85
}
86
- if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
87
- break;
88
+
89
+ if (rc == 0) { /* socket closed */
90
+ goto fail;
91
}
92
- iov.iov_base = (char *)vmsg + read_bytes;
93
- iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
94
- } while (true);
95
96
- vmsg->fd_num = nfds;
97
+ iov.iov_base += rc;
98
+ iov.iov_len -= rc;
99
+ read_bytes += rc;
100
+ } while (read_bytes != VHOST_USER_HDR_SIZE);
101
+
102
/* qio_channel_readv_full will make socket fds blocking, unblock them */
103
vmsg_unblock_fds(vmsg);
104
if (vmsg->size > sizeof(vmsg->payload)) {
105
--
106
2.26.2
107
New patch
1
Unexpected EOF is an error that must be reported.
1
2
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-9-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
util/vhost-user-server.c | 6 ++++--
8
1 file changed, 4 insertions(+), 2 deletions(-)
9
10
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/util/vhost-user-server.c
13
+++ b/util/vhost-user-server.c
14
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
15
};
16
if (vmsg->size) {
17
rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
18
- if (rc == -1) {
19
- error_report_err(local_err);
20
+ if (rc != 1) {
21
+ if (local_err) {
22
+ error_report_err(local_err);
23
+ }
24
goto fail;
25
}
26
}
27
--
28
2.26.2
29
New patch
1
The vu_client_trip() coroutine is leaked during AioContext switching. It
2
is also unsafe to destroy the vu_dev in panic_cb() since its callers
3
still access it in some cases.
1
4
5
Rework the lifecycle to solve these safety issues.
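
The resulting shutdown flow is described by the new theory-of-operation
comment; as a rough sketch (simplified from the hunks below):

  /* main loop thread: vhost_user_server_stop() */
  qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
  AIO_WAIT_WHILE(server->ctx, server->co_trip);    /* wait for the coroutine */

  /* vu_client_trip() coroutine: exits once the socket is shut down */
  while (!vu_dev->broken && vu_dispatch(vu_dev)) {
      /* keep running */
  }
  vu_deinit(vu_dev);                     /* also removes the kick fd watches */
  server->co_trip = NULL;                /* observed by AIO_WAIT_WHILE() */
  qemu_bh_schedule(server->restart_listener_bh);   /* accept the next client */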
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-10-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
util/vhost-user-server.h | 29 ++--
12
block/export/vhost-user-blk-server.c | 9 +-
13
util/vhost-user-server.c | 245 +++++++++++++++------------
14
3 files changed, 155 insertions(+), 128 deletions(-)
15
16
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/util/vhost-user-server.h
19
+++ b/util/vhost-user-server.h
20
@@ -XXX,XX +XXX,XX @@
21
#include "qapi/error.h"
22
#include "standard-headers/linux/virtio_blk.h"
23
24
+/* A kick fd that we monitor on behalf of libvhost-user */
25
typedef struct VuFdWatch {
26
VuDev *vu_dev;
27
int fd; /*kick fd*/
28
void *pvt;
29
vu_watch_cb cb;
30
- bool processing;
31
QTAILQ_ENTRY(VuFdWatch) next;
32
} VuFdWatch;
33
34
-typedef struct VuServer VuServer;
35
-
36
-struct VuServer {
37
+/**
38
+ * VuServer:
39
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
40
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
41
+ * callbacks and virtqueue kicks run in the given AioContext.
42
+ */
43
+typedef struct {
44
QIONetListener *listener;
45
+ QEMUBH *restart_listener_bh;
46
AioContext *ctx;
47
int max_queues;
48
const VuDevIface *vu_iface;
49
+
50
+ /* Protected by ctx lock */
51
VuDev vu_dev;
52
QIOChannel *ioc; /* The I/O channel with the client */
53
QIOChannelSocket *sioc; /* The underlying data channel with the client */
54
- /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
55
- QIOChannel *ioc_slave;
56
- QIOChannelSocket *sioc_slave;
57
- Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
58
QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
59
- /* restart coroutine co_trip if AIOContext is changed */
60
- bool aio_context_changed;
61
- bool processing_msg;
62
-};
63
+
64
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
65
+} VuServer;
66
67
bool vhost_user_server_start(VuServer *server,
68
SocketAddress *unix_socket,
69
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
70
71
void vhost_user_server_stop(VuServer *server);
72
73
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
74
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
75
+void vhost_user_server_detach_aio_context(VuServer *server);
76
77
#endif /* VHOST_USER_SERVER_H */
78
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
79
index XXXXXXX..XXXXXXX 100644
80
--- a/block/export/vhost-user-blk-server.c
81
+++ b/block/export/vhost-user-blk-server.c
82
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = {
83
static void blk_aio_attached(AioContext *ctx, void *opaque)
84
{
85
VuBlockDev *vub_dev = opaque;
86
- aio_context_acquire(ctx);
87
- vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
88
- aio_context_release(ctx);
89
+ vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
90
}
91
92
static void blk_aio_detach(void *opaque)
93
{
94
VuBlockDev *vub_dev = opaque;
95
- AioContext *ctx = vub_dev->vu_server.ctx;
96
- aio_context_acquire(ctx);
97
- vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
98
- aio_context_release(ctx);
99
+ vhost_user_server_detach_aio_context(&vub_dev->vu_server);
100
}
101
102
static void
103
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/util/vhost-user-server.c
106
+++ b/util/vhost-user-server.c
107
@@ -XXX,XX +XXX,XX @@
108
*/
109
#include "qemu/osdep.h"
110
#include "qemu/main-loop.h"
111
+#include "block/aio-wait.h"
112
#include "vhost-user-server.h"
113
114
+/*
115
+ * Theory of operation:
116
+ *
117
+ * VuServer is started and stopped by vhost_user_server_start() and
118
+ * vhost_user_server_stop() from the main loop thread. Starting the server
119
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
120
+ * Only one connection is allowed at a time.
121
+ *
122
+ * The connection is handled by the vu_client_trip() coroutine in the
123
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
124
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
125
+ * protocol messages over the UNIX domain socket.
126
+ *
127
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
128
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
129
+ *
130
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
131
+ * the socket connection. Shutting down the socket connection causes
132
+ * vu_message_read() to fail since no more data can be received from the socket.
133
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
134
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
135
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
136
+ * processing.
137
+ *
138
+ * When vu_client_trip() has finished cleaning up it schedules a BH in the main
139
+ * loop thread to accept the next client connection.
140
+ *
141
+ * When libvhost-user detects an error it calls panic_cb() and sets the
142
+ * dev->broken flag. Both vu_client_trip() and kick fd processing stop when
143
+ * the dev->broken flag is set.
144
+ *
145
+ * It is possible to switch AioContexts using
146
+ * vhost_user_server_detach_aio_context() and
147
+ * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old
148
+ * AioContext and resume monitoring in the new AioContext. The vu_client_trip()
149
+ * coroutine remains in a yielded state during the switch. This is made
150
+ * possible by QIOChannel's support for spurious coroutine re-entry in
151
+ * qio_channel_yield(). The coroutine will restart I/O when re-entered from the
152
+ * new AioContext.
153
+ */
154
+
155
static void vmsg_close_fds(VhostUserMsg *vmsg)
156
{
157
int i;
158
@@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg)
159
}
160
}
161
162
-static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
163
- gpointer opaque);
164
-
165
-static void close_client(VuServer *server)
166
-{
167
- /*
168
- * Before closing the client
169
- *
170
- * 1. Let vu_client_trip stop processing new vhost-user msg
171
- *
172
- * 2. remove kick_handler
173
- *
174
- * 3. wait for the kick handler to be finished
175
- *
176
- * 4. wait for the current vhost-user msg to be finished processing
177
- */
178
-
179
- QIOChannelSocket *sioc = server->sioc;
180
- /* When this is set vu_client_trip will stop new processing vhost-user message */
181
- server->sioc = NULL;
182
-
183
- while (server->processing_msg) {
184
- if (server->ioc->read_coroutine) {
185
- server->ioc->read_coroutine = NULL;
186
- qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
187
- NULL, server->ioc);
188
- server->processing_msg = false;
189
- }
190
- }
191
-
192
- vu_deinit(&server->vu_dev);
193
-
194
- /* vu_deinit() should have called remove_watch() */
195
- assert(QTAILQ_EMPTY(&server->vu_fd_watches));
196
-
197
- object_unref(OBJECT(sioc));
198
- object_unref(OBJECT(server->ioc));
199
-}
200
-
201
static void panic_cb(VuDev *vu_dev, const char *buf)
202
{
203
- VuServer *server = container_of(vu_dev, VuServer, vu_dev);
204
-
205
- /* avoid while loop in close_client */
206
- server->processing_msg = false;
207
-
208
- if (buf) {
209
- error_report("vu_panic: %s", buf);
210
- }
211
-
212
- if (server->sioc) {
213
- close_client(server);
214
- }
215
-
216
- /*
217
- * Set the callback function for network listener so another
218
- * vhost-user client can connect to this server
219
- */
220
- qio_net_listener_set_client_func(server->listener,
221
- vu_accept,
222
- server,
223
- NULL);
224
+ error_report("vu_panic: %s", buf);
225
}
226
227
static bool coroutine_fn
228
@@ -XXX,XX +XXX,XX @@ fail:
229
return false;
230
}
231
232
-
233
-static void vu_client_start(VuServer *server);
234
static coroutine_fn void vu_client_trip(void *opaque)
235
{
236
VuServer *server = opaque;
237
+ VuDev *vu_dev = &server->vu_dev;
238
239
- while (!server->aio_context_changed && server->sioc) {
240
- server->processing_msg = true;
241
- vu_dispatch(&server->vu_dev);
242
- server->processing_msg = false;
243
+ while (!vu_dev->broken && vu_dispatch(vu_dev)) {
244
+ /* Keep running */
245
}
246
247
- if (server->aio_context_changed && server->sioc) {
248
- server->aio_context_changed = false;
249
- vu_client_start(server);
250
- }
251
-}
252
+ vu_deinit(vu_dev);
253
+
254
+ /* vu_deinit() should have called remove_watch() */
255
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
256
+
257
+ object_unref(OBJECT(server->sioc));
258
+ server->sioc = NULL;
259
260
-static void vu_client_start(VuServer *server)
261
-{
262
- server->co_trip = qemu_coroutine_create(vu_client_trip, server);
263
- aio_co_enter(server->ctx, server->co_trip);
264
+ object_unref(OBJECT(server->ioc));
265
+ server->ioc = NULL;
266
+
267
+ server->co_trip = NULL;
268
+ if (server->restart_listener_bh) {
269
+ qemu_bh_schedule(server->restart_listener_bh);
270
+ }
271
+ aio_wait_kick();
272
}
273
274
/*
275
@@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server)
276
static void kick_handler(void *opaque)
277
{
278
VuFdWatch *vu_fd_watch = opaque;
279
- vu_fd_watch->processing = true;
280
- vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
281
- vu_fd_watch->processing = false;
282
+ VuDev *vu_dev = vu_fd_watch->vu_dev;
283
+
284
+ vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt);
285
+
286
+ /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */
287
+ if (vu_dev->broken) {
288
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
289
+
290
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
291
+ }
292
}
293
294
-
295
static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
296
{
297
298
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
299
qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
300
server->ioc = QIO_CHANNEL(sioc);
301
object_ref(OBJECT(server->ioc));
302
- qio_channel_attach_aio_context(server->ioc, server->ctx);
303
+
304
+ /* TODO vu_message_write() spins if non-blocking! */
305
qio_channel_set_blocking(server->ioc, false, NULL);
306
- vu_client_start(server);
307
+
308
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
309
+
310
+ aio_context_acquire(server->ctx);
311
+ vhost_user_server_attach_aio_context(server, server->ctx);
312
+ aio_context_release(server->ctx);
313
}
314
315
-
316
void vhost_user_server_stop(VuServer *server)
317
{
318
+ aio_context_acquire(server->ctx);
319
+
320
+ qemu_bh_delete(server->restart_listener_bh);
321
+ server->restart_listener_bh = NULL;
322
+
323
if (server->sioc) {
324
- close_client(server);
325
+ VuFdWatch *vu_fd_watch;
326
+
327
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
328
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
329
+ NULL, NULL, NULL, vu_fd_watch);
330
+ }
331
+
332
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
333
+
334
+ AIO_WAIT_WHILE(server->ctx, server->co_trip);
335
}
336
337
+ aio_context_release(server->ctx);
338
+
339
if (server->listener) {
340
qio_net_listener_disconnect(server->listener);
341
object_unref(OBJECT(server->listener));
342
}
343
+}
344
+
345
+/*
346
+ * Allow the next client to connect to the server. Called from a BH in the main
347
+ * loop.
348
+ */
349
+static void restart_listener_bh(void *opaque)
350
+{
351
+ VuServer *server = opaque;
352
353
+ qio_net_listener_set_client_func(server->listener, vu_accept, server,
354
+ NULL);
355
}
356
357
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
358
+/* Called with ctx acquired */
359
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
360
{
361
- VuFdWatch *vu_fd_watch, *next;
362
- void *opaque = NULL;
363
- IOHandler *io_read = NULL;
364
- bool attach;
365
+ VuFdWatch *vu_fd_watch;
366
367
- server->ctx = ctx ? ctx : qemu_get_aio_context();
368
+ server->ctx = ctx;
369
370
if (!server->sioc) {
371
- /* not yet serving any client*/
372
return;
373
}
374
375
- if (ctx) {
376
- qio_channel_attach_aio_context(server->ioc, ctx);
377
- server->aio_context_changed = true;
378
- io_read = kick_handler;
379
- attach = true;
380
- } else {
381
+ qio_channel_attach_aio_context(server->ioc, ctx);
382
+
383
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
384
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
385
+ NULL, vu_fd_watch);
386
+ }
387
+
388
+ aio_co_schedule(ctx, server->co_trip);
389
+}
390
+
391
+/* Called with server->ctx acquired */
392
+void vhost_user_server_detach_aio_context(VuServer *server)
393
+{
394
+ if (server->sioc) {
395
+ VuFdWatch *vu_fd_watch;
396
+
397
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
398
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
399
+ NULL, NULL, NULL, vu_fd_watch);
400
+ }
401
+
402
qio_channel_detach_aio_context(server->ioc);
403
- /* server->ioc->ctx keeps the old AioConext */
404
- ctx = server->ioc->ctx;
405
- attach = false;
406
}
407
408
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
409
- if (vu_fd_watch->cb) {
410
- opaque = attach ? vu_fd_watch : NULL;
411
- aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
412
- io_read, NULL, NULL,
413
- opaque);
414
- }
415
- }
416
+ server->ctx = NULL;
417
}
418
419
-
420
bool vhost_user_server_start(VuServer *server,
421
SocketAddress *socket_addr,
422
AioContext *ctx,
423
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
424
const VuDevIface *vu_iface,
425
Error **errp)
426
{
427
+ QEMUBH *bh;
428
QIONetListener *listener = qio_net_listener_new();
429
if (qio_net_listener_open_sync(listener, socket_addr, 1,
430
errp) < 0) {
431
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
432
return false;
433
}
434
435
+ bh = qemu_bh_new(restart_listener_bh, server);
436
+
437
/* zero out unspecified fields */
438
*server = (VuServer) {
439
.listener = listener,
440
+ .restart_listener_bh = bh,
441
.vu_iface = vu_iface,
442
.max_queues = max_queues,
443
.ctx = ctx,
444
--
445
2.26.2
446
New patch
1
Propagate the flush return value since errors are possible.
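
Concretely, the request handler maps the flush result onto the virtio status
byte (sketch of the hunk below):

  case VIRTIO_BLK_T_FLUSH:
      if (vu_block_flush(req) == 0) {
          req->in->status = VIRTIO_BLK_S_OK;
      } else {
          req->in->status = VIRTIO_BLK_S_IOERR;
      }
      break;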
1
2
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-11-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
7
block/export/vhost-user-blk-server.c | 11 +++++++----
8
1 file changed, 7 insertions(+), 4 deletions(-)
9
10
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/block/export/vhost-user-blk-server.c
13
+++ b/block/export/vhost-user-blk-server.c
14
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
15
return -EINVAL;
16
}
17
18
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
19
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
20
{
21
VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
22
BlockBackend *backend = vdev_blk->backend;
23
- blk_co_flush(backend);
24
+ return blk_co_flush(backend);
25
}
26
27
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
28
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
29
break;
30
}
31
case VIRTIO_BLK_T_FLUSH:
32
- vu_block_flush(req);
33
- req->in->status = VIRTIO_BLK_S_OK;
34
+ if (vu_block_flush(req) == 0) {
35
+ req->in->status = VIRTIO_BLK_S_OK;
36
+ } else {
37
+ req->in->status = VIRTIO_BLK_S_IOERR;
38
+ }
39
break;
40
case VIRTIO_BLK_T_GET_ID: {
41
size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
42
--
43
2.26.2
44
New patch
1
1
Use the new QAPI block exports API instead of defining our own QOM
2
objects.
3
4
This is a large change because the lifecycle of VuBlockDev needs to
5
follow BlockExportDriver. QOM properties are replaced by QAPI options
6
objects.
7
8
VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
9
Several fields can be dropped since BlockExport already has equivalents.
10
11
The file names and meson build integration will be adjusted in a future
12
patch. libvhost-user should probably be built as a static library that
13
is linked into QEMU instead of as a .c file that results in duplicate
14
compilation.
15
16
The new command-line syntax is:
17
18
$ qemu-storage-daemon \
19
--blockdev file,node-name=drive0,filename=test.img \
20
--export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock
21
22
Note that unix-socket is optional because we may wish to accept chardevs
23
too in the future.
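
For management tools that talk QMP to the storage daemon instead of using the
command line, the equivalent export should look roughly like this (hypothetical
values; field names follow the QAPI schema added below):

  { "execute": "block-export-add",
    "arguments": {
        "type": "vhost-user-blk",
        "id": "export0",
        "node-name": "drive0",
        "addr": { "type": "unix", "path": "/tmp/vhost-user-blk.sock" }
    } }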
24
25
Markus noted that supported address families are not explicit in the
26
QAPI schema. It is unlikely that support for more address families will
27
be added since file descriptor passing is required and few address
28
families support it. If a new address family needs to be added, then the
29
QAPI 'features' syntax can be used to advertise them.
30
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
Acked-by: Markus Armbruster <armbru@redhat.com>
33
Message-id: 20200924151549.913737-12-stefanha@redhat.com
34
[Skip test on big-endian host architectures because this device doesn't
35
support them yet (as already mentioned in a code comment).
36
--Stefan]
37
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
38
---
39
qapi/block-export.json | 21 +-
40
block/export/vhost-user-blk-server.h | 23 +-
41
block/export/export.c | 6 +
42
block/export/vhost-user-blk-server.c | 452 +++++++--------------------
43
util/vhost-user-server.c | 10 +-
44
block/export/meson.build | 1 +
45
block/meson.build | 1 -
46
7 files changed, 156 insertions(+), 358 deletions(-)
47
48
diff --git a/qapi/block-export.json b/qapi/block-export.json
49
index XXXXXXX..XXXXXXX 100644
50
--- a/qapi/block-export.json
51
+++ b/qapi/block-export.json
52
@@ -XXX,XX +XXX,XX @@
53
'data': { '*name': 'str', '*description': 'str',
54
'*bitmap': 'str' } }
55
56
+##
57
+# @BlockExportOptionsVhostUserBlk:
58
+#
59
+# A vhost-user-blk block export.
60
+#
61
+# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
62
+# SocketAddress types are supported. Passed fds must be UNIX domain
63
+# sockets.
64
+# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
65
+#
66
+# Since: 5.2
67
+##
68
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
69
+ 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
70
+
71
##
72
# @NbdServerAddOptions:
73
#
74
@@ -XXX,XX +XXX,XX @@
75
# An enumeration of block export types
76
#
77
# @nbd: NBD export
78
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
79
#
80
# Since: 4.2
81
##
82
{ 'enum': 'BlockExportType',
83
- 'data': [ 'nbd' ] }
84
+ 'data': [ 'nbd', 'vhost-user-blk' ] }
85
86
##
87
# @BlockExportOptions:
88
@@ -XXX,XX +XXX,XX @@
89
'*writethrough': 'bool' },
90
'discriminator': 'type',
91
'data': {
92
- 'nbd': 'BlockExportOptionsNbd'
93
+ 'nbd': 'BlockExportOptionsNbd',
94
+ 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
95
} }
96
97
##
98
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
99
index XXXXXXX..XXXXXXX 100644
100
--- a/block/export/vhost-user-blk-server.h
101
+++ b/block/export/vhost-user-blk-server.h
102
@@ -XXX,XX +XXX,XX @@
103
104
#ifndef VHOST_USER_BLK_SERVER_H
105
#define VHOST_USER_BLK_SERVER_H
106
-#include "util/vhost-user-server.h"
107
108
-typedef struct VuBlockDev VuBlockDev;
109
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
110
-#define VHOST_USER_BLK_SERVER(obj) \
111
- OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
112
+#include "block/export.h"
113
114
-/* vhost user block device */
115
-struct VuBlockDev {
116
- Object parent_obj;
117
- char *node_name;
118
- SocketAddress *addr;
119
- AioContext *ctx;
120
- VuServer vu_server;
121
- bool running;
122
- uint32_t blk_size;
123
- BlockBackend *backend;
124
- QIOChannelSocket *sioc;
125
- QTAILQ_ENTRY(VuBlockDev) next;
126
- struct virtio_blk_config blkcfg;
127
- bool writable;
128
-};
129
+/* For block/export/export.c */
130
+extern const BlockExportDriver blk_exp_vhost_user_blk;
131
132
#endif /* VHOST_USER_BLK_SERVER_H */
133
diff --git a/block/export/export.c b/block/export/export.c
134
index XXXXXXX..XXXXXXX 100644
135
--- a/block/export/export.c
136
+++ b/block/export/export.c
137
@@ -XXX,XX +XXX,XX @@
138
#include "sysemu/block-backend.h"
139
#include "block/export.h"
140
#include "block/nbd.h"
141
+#if CONFIG_LINUX
142
+#include "block/export/vhost-user-blk-server.h"
143
+#endif
144
#include "qapi/error.h"
145
#include "qapi/qapi-commands-block-export.h"
146
#include "qapi/qapi-events-block-export.h"
147
@@ -XXX,XX +XXX,XX @@
148
149
static const BlockExportDriver *blk_exp_drivers[] = {
150
&blk_exp_nbd,
151
+#if CONFIG_LINUX
152
+ &blk_exp_vhost_user_blk,
153
+#endif
154
};
155
156
/* Only accessed from the main thread */
157
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
158
index XXXXXXX..XXXXXXX 100644
159
--- a/block/export/vhost-user-blk-server.c
160
+++ b/block/export/vhost-user-blk-server.c
161
@@ -XXX,XX +XXX,XX @@
162
*/
163
#include "qemu/osdep.h"
164
#include "block/block.h"
165
+#include "contrib/libvhost-user/libvhost-user.h"
166
+#include "standard-headers/linux/virtio_blk.h"
167
+#include "util/vhost-user-server.h"
168
#include "vhost-user-blk-server.h"
169
#include "qapi/error.h"
170
#include "qom/object_interfaces.h"
171
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
172
unsigned char status;
173
};
174
175
-typedef struct VuBlockReq {
176
+typedef struct VuBlkReq {
177
VuVirtqElement elem;
178
int64_t sector_num;
179
size_t size;
180
@@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq {
181
struct virtio_blk_outhdr out;
182
VuServer *server;
183
struct VuVirtq *vq;
184
-} VuBlockReq;
185
+} VuBlkReq;
186
187
-static void vu_block_req_complete(VuBlockReq *req)
188
+/* vhost user block device */
189
+typedef struct {
190
+ BlockExport export;
191
+ VuServer vu_server;
192
+ uint32_t blk_size;
193
+ QIOChannelSocket *sioc;
194
+ struct virtio_blk_config blkcfg;
195
+ bool writable;
196
+} VuBlkExport;
197
+
198
+static void vu_blk_req_complete(VuBlkReq *req)
199
{
200
VuDev *vu_dev = &req->server->vu_dev;
201
202
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
203
free(req);
204
}
205
206
-static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
207
-{
208
- return container_of(server, VuBlockDev, vu_server);
209
-}
210
-
211
static int coroutine_fn
212
-vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
213
- uint32_t iovcnt, uint32_t type)
214
+vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov,
215
+ uint32_t iovcnt, uint32_t type)
216
{
217
struct virtio_blk_discard_write_zeroes desc;
218
ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
219
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
220
return -EINVAL;
221
}
222
223
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
224
uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
225
le32_to_cpu(desc.num_sectors) << 9 };
226
if (type == VIRTIO_BLK_T_DISCARD) {
227
- if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
228
+ if (blk_co_pdiscard(blk, range[0], range[1]) == 0) {
229
return 0;
230
}
231
} else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
232
- if (blk_co_pwrite_zeroes(vdev_blk->backend,
233
- range[0], range[1], 0) == 0) {
234
+ if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) {
235
return 0;
236
}
237
}
238
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
239
return -EINVAL;
240
}
241
242
-static int coroutine_fn vu_block_flush(VuBlockReq *req)
243
+static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
244
{
245
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
246
- BlockBackend *backend = vdev_blk->backend;
247
- return blk_co_flush(backend);
248
-}
249
-
250
-static void coroutine_fn vu_block_virtio_process_req(void *opaque)
251
-{
252
- VuBlockReq *req = opaque;
253
+ VuBlkReq *req = opaque;
254
VuServer *server = req->server;
255
VuVirtqElement *elem = &req->elem;
256
uint32_t type;
257
258
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
259
- BlockBackend *backend = vdev_blk->backend;
260
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
261
+ BlockBackend *blk = vexp->export.blk;
262
263
struct iovec *in_iov = elem->in_sg;
264
struct iovec *out_iov = elem->out_sg;
265
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
266
bool is_write = type & VIRTIO_BLK_T_OUT;
267
req->sector_num = le64_to_cpu(req->out.sector);
268
269
- int64_t offset = req->sector_num * vdev_blk->blk_size;
270
+ if (is_write && !vexp->writable) {
271
+ req->in->status = VIRTIO_BLK_S_IOERR;
272
+ break;
273
+ }
274
+
275
+ int64_t offset = req->sector_num * vexp->blk_size;
276
QEMUIOVector qiov;
277
if (is_write) {
278
qemu_iovec_init_external(&qiov, out_iov, out_num);
279
- ret = blk_co_pwritev(backend, offset, qiov.size,
280
- &qiov, 0);
281
+ ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
282
} else {
283
qemu_iovec_init_external(&qiov, in_iov, in_num);
284
- ret = blk_co_preadv(backend, offset, qiov.size,
285
- &qiov, 0);
286
+ ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
287
}
288
if (ret >= 0) {
289
req->in->status = VIRTIO_BLK_S_OK;
290
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
291
break;
292
}
293
case VIRTIO_BLK_T_FLUSH:
294
- if (vu_block_flush(req) == 0) {
295
+ if (blk_co_flush(blk) == 0) {
296
req->in->status = VIRTIO_BLK_S_OK;
297
} else {
298
req->in->status = VIRTIO_BLK_S_IOERR;
299
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
300
case VIRTIO_BLK_T_DISCARD:
301
case VIRTIO_BLK_T_WRITE_ZEROES: {
302
int rc;
303
- rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
304
- out_num, type);
305
+
306
+ if (!vexp->writable) {
307
+ req->in->status = VIRTIO_BLK_S_IOERR;
308
+ break;
309
+ }
310
+
311
+ rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type);
312
if (rc == 0) {
313
req->in->status = VIRTIO_BLK_S_OK;
314
} else {
315
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
316
break;
317
}
318
319
- vu_block_req_complete(req);
320
+ vu_blk_req_complete(req);
321
return;
322
323
err:
324
- free(elem);
325
+ free(req);
326
}
327
328
-static void vu_block_process_vq(VuDev *vu_dev, int idx)
329
+static void vu_blk_process_vq(VuDev *vu_dev, int idx)
330
{
331
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
332
VuVirtq *vq = vu_get_queue(vu_dev, idx);
333
334
while (1) {
335
- VuBlockReq *req;
336
+ VuBlkReq *req;
337
338
- req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
339
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
340
if (!req) {
341
break;
342
}
343
@@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx)
344
req->vq = vq;
345
346
Coroutine *co =
347
- qemu_coroutine_create(vu_block_virtio_process_req, req);
348
+ qemu_coroutine_create(vu_blk_virtio_process_req, req);
349
qemu_coroutine_enter(co);
350
}
351
}
352
353
-static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
354
+static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started)
355
{
356
VuVirtq *vq;
357
358
assert(vu_dev);
359
360
vq = vu_get_queue(vu_dev, idx);
361
- vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
362
+ vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL);
363
}
364
365
-static uint64_t vu_block_get_features(VuDev *dev)
366
+static uint64_t vu_blk_get_features(VuDev *dev)
367
{
368
uint64_t features;
369
VuServer *server = container_of(dev, VuServer, vu_dev);
370
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
371
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
372
features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
373
1ull << VIRTIO_BLK_F_SEG_MAX |
374
1ull << VIRTIO_BLK_F_TOPOLOGY |
375
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev)
376
1ull << VIRTIO_RING_F_EVENT_IDX |
377
1ull << VHOST_USER_F_PROTOCOL_FEATURES;
378
379
- if (!vdev_blk->writable) {
380
+ if (!vexp->writable) {
381
features |= 1ull << VIRTIO_BLK_F_RO;
382
}
383
384
return features;
385
}
386
387
-static uint64_t vu_block_get_protocol_features(VuDev *dev)
388
+static uint64_t vu_blk_get_protocol_features(VuDev *dev)
389
{
390
return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
391
1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
392
}
393
394
static int
395
-vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
396
+vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
397
{
398
+ /* TODO blkcfg must be little-endian for VIRTIO 1.0 */
399
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
400
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
401
- memcpy(config, &vdev_blk->blkcfg, len);
402
-
403
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
404
+ memcpy(config, &vexp->blkcfg, len);
405
return 0;
406
}
407
408
static int
409
-vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
410
+vu_blk_set_config(VuDev *vu_dev, const uint8_t *data,
411
uint32_t offset, uint32_t size, uint32_t flags)
412
{
413
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
414
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
415
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
416
uint8_t wce;
417
418
/* don't support live migration */
419
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
420
}
421
422
wce = *data;
423
- vdev_blk->blkcfg.wce = wce;
424
- blk_set_enable_write_cache(vdev_blk->backend, wce);
425
+ vexp->blkcfg.wce = wce;
426
+ blk_set_enable_write_cache(vexp->export.blk, wce);
427
return 0;
428
}
429
430
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
431
* of vu_process_message.
432
*
433
*/
434
-static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
435
+static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
436
{
437
if (vmsg->request == VHOST_USER_NONE) {
438
dev->panic(dev, "disconnect");
439
@@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
440
return false;
441
}
442
443
-static const VuDevIface vu_block_iface = {
444
- .get_features = vu_block_get_features,
445
- .queue_set_started = vu_block_queue_set_started,
446
- .get_protocol_features = vu_block_get_protocol_features,
447
- .get_config = vu_block_get_config,
448
- .set_config = vu_block_set_config,
449
- .process_msg = vu_block_process_msg,
450
+static const VuDevIface vu_blk_iface = {
451
+ .get_features = vu_blk_get_features,
452
+ .queue_set_started = vu_blk_queue_set_started,
453
+ .get_protocol_features = vu_blk_get_protocol_features,
454
+ .get_config = vu_blk_get_config,
455
+ .set_config = vu_blk_set_config,
456
+ .process_msg = vu_blk_process_msg,
457
};
458
459
static void blk_aio_attached(AioContext *ctx, void *opaque)
460
{
461
- VuBlockDev *vub_dev = opaque;
462
- vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
463
+ VuBlkExport *vexp = opaque;
464
+ vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
465
}
466
467
static void blk_aio_detach(void *opaque)
468
{
469
- VuBlockDev *vub_dev = opaque;
470
- vhost_user_server_detach_aio_context(&vub_dev->vu_server);
471
+ VuBlkExport *vexp = opaque;
472
+ vhost_user_server_detach_aio_context(&vexp->vu_server);
473
}
474
475
static void
476
-vu_block_initialize_config(BlockDriverState *bs,
477
+vu_blk_initialize_config(BlockDriverState *bs,
478
struct virtio_blk_config *config, uint32_t blk_size)
479
{
480
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
481
@@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs,
482
config->max_write_zeroes_seg = 1;
483
}
484
485
-static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
486
+static void vu_blk_exp_request_shutdown(BlockExport *exp)
487
{
488
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
489
490
- BlockBackend *blk;
491
- Error *local_error = NULL;
492
- const char *node_name = vu_block_device->node_name;
493
- bool writable = vu_block_device->writable;
494
- uint64_t perm = BLK_PERM_CONSISTENT_READ;
495
- int ret;
496
-
497
- AioContext *ctx;
498
-
499
- BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
500
-
501
- if (!bs) {
502
- error_propagate(errp, local_error);
503
- return NULL;
504
- }
505
-
506
- if (bdrv_is_read_only(bs)) {
507
- writable = false;
508
- }
509
-
510
- if (writable) {
511
- perm |= BLK_PERM_WRITE;
512
- }
513
-
514
- ctx = bdrv_get_aio_context(bs);
515
- aio_context_acquire(ctx);
516
- bdrv_invalidate_cache(bs, NULL);
517
- aio_context_release(ctx);
518
-
519
- /*
520
- * Don't allow resize while the vhost user server is running,
521
- * otherwise we don't care what happens with the node.
522
- */
523
- blk = blk_new(bdrv_get_aio_context(bs), perm,
524
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
525
- BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
526
- ret = blk_insert_bs(blk, bs, errp);
527
-
528
- if (ret < 0) {
529
- goto fail;
530
- }
531
-
532
- blk_set_enable_write_cache(blk, false);
533
-
534
- blk_set_allow_aio_context_change(blk, true);
535
-
536
- vu_block_device->blkcfg.wce = 0;
537
- vu_block_device->backend = blk;
538
- if (!vu_block_device->blk_size) {
539
- vu_block_device->blk_size = BDRV_SECTOR_SIZE;
540
- }
541
- vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
542
- blk_set_guest_block_size(blk, vu_block_device->blk_size);
543
- vu_block_initialize_config(bs, &vu_block_device->blkcfg,
544
- vu_block_device->blk_size);
545
- return vu_block_device;
546
-
547
-fail:
548
- blk_unref(blk);
549
- return NULL;
550
-}
551
-
552
-static void vu_block_deinit(VuBlockDev *vu_block_device)
553
-{
554
- if (vu_block_device->backend) {
555
- blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
556
- blk_aio_detach, vu_block_device);
557
- }
558
-
559
- blk_unref(vu_block_device->backend);
560
-}
561
-
562
-static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
563
-{
564
- vhost_user_server_stop(&vu_block_device->vu_server);
565
- vu_block_deinit(vu_block_device);
566
-}
567
-
568
-static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
569
- Error **errp)
570
-{
571
- AioContext *ctx;
572
- SocketAddress *addr = vu_block_device->addr;
573
-
574
- if (!vu_block_init(vu_block_device, errp)) {
575
- return;
576
- }
577
-
578
- ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
579
-
580
- if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
581
- VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
582
- errp)) {
583
- goto error;
584
- }
585
-
586
- blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
587
- blk_aio_detach, vu_block_device);
588
- vu_block_device->running = true;
589
- return;
590
-
591
- error:
592
- vu_block_deinit(vu_block_device);
593
-}
594
-
595
-static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
596
-{
597
- if (vus->running) {
598
- error_setg(errp, "The property can't be modified "
599
- "while the server is running");
600
- return false;
601
- }
602
- return true;
603
-}
604
-
605
-static void vu_set_node_name(Object *obj, const char *value, Error **errp)
606
-{
607
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
608
-
609
- if (!vu_prop_modifiable(vus, errp)) {
610
- return;
611
- }
612
-
613
- if (vus->node_name) {
614
- g_free(vus->node_name);
615
- }
616
-
617
- vus->node_name = g_strdup(value);
618
-}
619
-
620
-static char *vu_get_node_name(Object *obj, Error **errp)
621
-{
622
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
623
- return g_strdup(vus->node_name);
624
-}
625
-
626
-static void free_socket_addr(SocketAddress *addr)
627
-{
628
- g_free(addr->u.q_unix.path);
629
- g_free(addr);
630
-}
631
-
632
-static void vu_set_unix_socket(Object *obj, const char *value,
633
- Error **errp)
634
-{
635
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
636
-
637
- if (!vu_prop_modifiable(vus, errp)) {
638
- return;
639
- }
640
-
641
- if (vus->addr) {
642
- free_socket_addr(vus->addr);
643
- }
644
-
645
- SocketAddress *addr = g_new0(SocketAddress, 1);
646
- addr->type = SOCKET_ADDRESS_TYPE_UNIX;
647
- addr->u.q_unix.path = g_strdup(value);
648
- vus->addr = addr;
649
+ vhost_user_server_stop(&vexp->vu_server);
650
}
651
652
-static char *vu_get_unix_socket(Object *obj, Error **errp)
653
+static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
654
+ Error **errp)
655
{
656
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
657
- return g_strdup(vus->addr->u.q_unix.path);
658
-}
659
-
660
-static bool vu_get_block_writable(Object *obj, Error **errp)
661
-{
662
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
663
- return vus->writable;
664
-}
665
-
666
-static void vu_set_block_writable(Object *obj, bool value, Error **errp)
667
-{
668
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
669
-
670
- if (!vu_prop_modifiable(vus, errp)) {
671
- return;
672
- }
673
-
674
- vus->writable = value;
675
-}
676
-
677
-static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
678
- void *opaque, Error **errp)
679
-{
680
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
681
- uint32_t value = vus->blk_size;
682
-
683
- visit_type_uint32(v, name, &value, errp);
684
-}
685
-
686
-static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
687
- void *opaque, Error **errp)
688
-{
689
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
690
-
691
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
692
+ BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
693
Error *local_err = NULL;
694
- uint32_t value;
695
+ uint64_t logical_block_size;
696
697
- if (!vu_prop_modifiable(vus, errp)) {
698
- return;
699
- }
700
+ vexp->writable = opts->writable;
701
+ vexp->blkcfg.wce = 0;
702
703
- visit_type_uint32(v, name, &value, &local_err);
704
- if (local_err) {
705
- goto out;
706
+ if (vu_opts->has_logical_block_size) {
707
+ logical_block_size = vu_opts->logical_block_size;
708
+ } else {
709
+ logical_block_size = BDRV_SECTOR_SIZE;
710
}
711
-
712
- check_block_size(object_get_typename(obj), name, value, &local_err);
713
+ check_block_size(exp->id, "logical-block-size", logical_block_size,
714
+ &local_err);
715
if (local_err) {
716
- goto out;
717
+ error_propagate(errp, local_err);
718
+ return -EINVAL;
719
+ }
720
+ vexp->blk_size = logical_block_size;
721
+ blk_set_guest_block_size(exp->blk, logical_block_size);
722
+ vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
723
+ logical_block_size);
724
+
725
+ blk_set_allow_aio_context_change(exp->blk, true);
726
+ blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
727
+ vexp);
728
+
729
+ if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
730
+ VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
731
+ errp)) {
732
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
733
+ blk_aio_detach, vexp);
734
+ return -EADDRNOTAVAIL;
735
}
736
737
- vus->blk_size = value;
738
-
739
-out:
740
- error_propagate(errp, local_err);
741
-}
742
-
743
-static void vhost_user_blk_server_instance_finalize(Object *obj)
744
-{
745
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
746
-
747
- vhost_user_blk_server_stop(vub);
748
-
749
- /*
750
- * Unlike object_property_add_str, object_class_property_add_str
751
- * doesn't have a release method. Thus manual memory freeing is
752
- * needed.
753
- */
754
- free_socket_addr(vub->addr);
755
- g_free(vub->node_name);
756
-}
757
-
758
-static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
759
-{
760
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
761
-
762
- vhost_user_blk_server_start(vub, errp);
763
+ return 0;
764
}
765
766
-static void vhost_user_blk_server_class_init(ObjectClass *klass,
767
- void *class_data)
768
+static void vu_blk_exp_delete(BlockExport *exp)
769
{
770
- UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
771
- ucc->complete = vhost_user_blk_server_complete;
772
-
773
- object_class_property_add_bool(klass, "writable",
774
- vu_get_block_writable,
775
- vu_set_block_writable);
776
-
777
- object_class_property_add_str(klass, "node-name",
778
- vu_get_node_name,
779
- vu_set_node_name);
780
-
781
- object_class_property_add_str(klass, "unix-socket",
782
- vu_get_unix_socket,
783
- vu_set_unix_socket);
784
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
785
786
- object_class_property_add(klass, "logical-block-size", "uint32",
787
- vu_get_blk_size, vu_set_blk_size,
788
- NULL, NULL);
789
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
790
+ vexp);
791
}
792
793
-static const TypeInfo vhost_user_blk_server_info = {
794
- .name = TYPE_VHOST_USER_BLK_SERVER,
795
- .parent = TYPE_OBJECT,
796
- .instance_size = sizeof(VuBlockDev),
797
- .instance_finalize = vhost_user_blk_server_instance_finalize,
798
- .class_init = vhost_user_blk_server_class_init,
799
- .interfaces = (InterfaceInfo[]) {
800
- {TYPE_USER_CREATABLE},
801
- {}
802
- },
803
+const BlockExportDriver blk_exp_vhost_user_blk = {
804
+ .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK,
805
+ .instance_size = sizeof(VuBlkExport),
806
+ .create = vu_blk_exp_create,
807
+ .delete = vu_blk_exp_delete,
808
+ .request_shutdown = vu_blk_exp_request_shutdown,
809
};
810
-
811
-static void vhost_user_blk_server_register_types(void)
812
-{
813
- type_register_static(&vhost_user_blk_server_info);
814
-}
815
-
816
-type_init(vhost_user_blk_server_register_types)
817
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
818
index XXXXXXX..XXXXXXX 100644
819
--- a/util/vhost-user-server.c
820
+++ b/util/vhost-user-server.c
821
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
822
Error **errp)
823
{
824
QEMUBH *bh;
825
- QIONetListener *listener = qio_net_listener_new();
826
+ QIONetListener *listener;
827
+
828
+ if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX &&
829
+ socket_addr->type != SOCKET_ADDRESS_TYPE_FD) {
830
+ error_setg(errp, "Only socket address types 'unix' and 'fd' are supported");
831
+ return false;
832
+ }
833
+
834
+ listener = qio_net_listener_new();
835
if (qio_net_listener_open_sync(listener, socket_addr, 1,
836
errp) < 0) {
837
object_unref(OBJECT(listener));
838
diff --git a/block/export/meson.build b/block/export/meson.build
839
index XXXXXXX..XXXXXXX 100644
840
--- a/block/export/meson.build
841
+++ b/block/export/meson.build
842
@@ -1 +1,2 @@
843
block_ss.add(files('export.c'))
844
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
845
diff --git a/block/meson.build b/block/meson.build
846
index XXXXXXX..XXXXXXX 100644
847
--- a/block/meson.build
848
+++ b/block/meson.build
849
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
850
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
851
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
852
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
853
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
854
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
855
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
856
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
857
--
858
2.26.2
859
diff view generated by jsdifflib
New patch
1
Headers used by other subsystems are located in include/. Also add the
2
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.
1
3
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20200924151549.913737-13-stefanha@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
8
MAINTAINERS | 4 +++-
9
{util => include/qemu}/vhost-user-server.h | 0
10
block/export/vhost-user-blk-server.c | 2 +-
11
util/vhost-user-server.c | 2 +-
12
4 files changed, 5 insertions(+), 3 deletions(-)
13
rename {util => include/qemu}/vhost-user-server.h (100%)
14
15
diff --git a/MAINTAINERS b/MAINTAINERS
16
index XXXXXXX..XXXXXXX 100644
17
--- a/MAINTAINERS
18
+++ b/MAINTAINERS
19
@@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server
20
M: Coiby Xu <Coiby.Xu@gmail.com>
21
S: Maintained
22
F: block/export/vhost-user-blk-server.c
23
-F: util/vhost-user-server.c
24
+F: block/export/vhost-user-blk-server.h
25
+F: include/qemu/vhost-user-server.h
26
F: tests/qtest/libqos/vhost-user-blk.c
27
+F: util/vhost-user-server.c
28
29
Replication
30
M: Wen Congyang <wencongyang2@huawei.com>
31
diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h
32
similarity index 100%
33
rename from util/vhost-user-server.h
34
rename to include/qemu/vhost-user-server.h
35
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
36
index XXXXXXX..XXXXXXX 100644
37
--- a/block/export/vhost-user-blk-server.c
38
+++ b/block/export/vhost-user-blk-server.c
39
@@ -XXX,XX +XXX,XX @@
40
#include "block/block.h"
41
#include "contrib/libvhost-user/libvhost-user.h"
42
#include "standard-headers/linux/virtio_blk.h"
43
-#include "util/vhost-user-server.h"
44
+#include "qemu/vhost-user-server.h"
45
#include "vhost-user-blk-server.h"
46
#include "qapi/error.h"
47
#include "qom/object_interfaces.h"
48
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
49
index XXXXXXX..XXXXXXX 100644
50
--- a/util/vhost-user-server.c
51
+++ b/util/vhost-user-server.c
52
@@ -XXX,XX +XXX,XX @@
53
*/
54
#include "qemu/osdep.h"
55
#include "qemu/main-loop.h"
56
+#include "qemu/vhost-user-server.h"
57
#include "block/aio-wait.h"
58
-#include "vhost-user-server.h"
59
60
/*
61
* Theory of operation:
62
--
63
2.26.2
64
diff view generated by jsdifflib
New patch
1
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
2
the static library once and then reuse it throughout QEMU.
1
3
4
Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
5
vhost-user tools (vhost-user-gpu, etc.) do.
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-14-stefanha@redhat.com
9
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
block/export/export.c | 8 ++++----
14
block/export/meson.build | 2 +-
15
contrib/libvhost-user/meson.build | 1 +
16
meson.build | 6 +++++-
17
util/meson.build | 4 +++-
18
5 files changed, 14 insertions(+), 7 deletions(-)
19
20
diff --git a/block/export/export.c b/block/export/export.c
21
index XXXXXXX..XXXXXXX 100644
22
--- a/block/export/export.c
23
+++ b/block/export/export.c
24
@@ -XXX,XX +XXX,XX @@
25
#include "sysemu/block-backend.h"
26
#include "block/export.h"
27
#include "block/nbd.h"
28
-#if CONFIG_LINUX
29
-#include "block/export/vhost-user-blk-server.h"
30
-#endif
31
#include "qapi/error.h"
32
#include "qapi/qapi-commands-block-export.h"
33
#include "qapi/qapi-events-block-export.h"
34
#include "qemu/id.h"
35
+#ifdef CONFIG_VHOST_USER
36
+#include "vhost-user-blk-server.h"
37
+#endif
38
39
static const BlockExportDriver *blk_exp_drivers[] = {
40
&blk_exp_nbd,
41
-#if CONFIG_LINUX
42
+#ifdef CONFIG_VHOST_USER
43
&blk_exp_vhost_user_blk,
44
#endif
45
};
46
diff --git a/block/export/meson.build b/block/export/meson.build
47
index XXXXXXX..XXXXXXX 100644
48
--- a/block/export/meson.build
49
+++ b/block/export/meson.build
50
@@ -XXX,XX +XXX,XX @@
51
block_ss.add(files('export.c'))
52
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
53
+block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
54
diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build
55
index XXXXXXX..XXXXXXX 100644
56
--- a/contrib/libvhost-user/meson.build
57
+++ b/contrib/libvhost-user/meson.build
58
@@ -XXX,XX +XXX,XX @@
59
libvhost_user = static_library('vhost-user',
60
files('libvhost-user.c', 'libvhost-user-glib.c'),
61
build_by_default: false)
62
+vhost_user = declare_dependency(link_with: libvhost_user)
63
diff --git a/meson.build b/meson.build
64
index XXXXXXX..XXXXXXX 100644
65
--- a/meson.build
66
+++ b/meson.build
67
@@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [
68
'util',
69
]
70
71
+vhost_user = not_found
72
+if 'CONFIG_VHOST_USER' in config_host
73
+ subdir('contrib/libvhost-user')
74
+endif
75
+
76
subdir('qapi')
77
subdir('qobject')
78
subdir('stubs')
79
@@ -XXX,XX +XXX,XX @@ if have_tools
80
install: true)
81
82
if 'CONFIG_VHOST_USER' in config_host
83
- subdir('contrib/libvhost-user')
84
subdir('contrib/vhost-user-blk')
85
subdir('contrib/vhost-user-gpu')
86
subdir('contrib/vhost-user-input')
87
diff --git a/util/meson.build b/util/meson.build
88
index XXXXXXX..XXXXXXX 100644
89
--- a/util/meson.build
90
+++ b/util/meson.build
91
@@ -XXX,XX +XXX,XX @@ if have_block
92
util_ss.add(files('main-loop.c'))
93
util_ss.add(files('nvdimm-utils.c'))
94
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
95
- util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
96
+ util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
97
+ files('vhost-user-server.c'), vhost_user
98
+ ])
99
util_ss.add(files('block-helpers.c'))
100
util_ss.add(files('qemu-coroutine-sleep.c'))
101
util_ss.add(files('qemu-co-shared-resource.c'))
102
--
103
2.26.2
104
diff view generated by jsdifflib
New patch
1
Introduce libblockdev.fa to avoid recompiling blockdev_ss twice.
1
2
3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
4
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200929125516.186715-3-stefanha@redhat.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
9
meson.build | 12 ++++++++++--
10
storage-daemon/meson.build | 3 +--
11
2 files changed, 11 insertions(+), 4 deletions(-)
12
13
diff --git a/meson.build b/meson.build
14
index XXXXXXX..XXXXXXX 100644
15
--- a/meson.build
16
+++ b/meson.build
17
@@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files(
18
# os-win32.c does not
19
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
20
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
21
-softmmu_ss.add_all(blockdev_ss)
22
23
common_ss.add(files('cpus-common.c'))
24
25
@@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock],
26
link_args: '@block.syms',
27
dependencies: [crypto, io])
28
29
+blockdev_ss = blockdev_ss.apply(config_host, strict: false)
30
+libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
31
+ dependencies: blockdev_ss.dependencies(),
32
+ name_suffix: 'fa',
33
+ build_by_default: false)
34
+
35
+blockdev = declare_dependency(link_whole: [libblockdev],
36
+ dependencies: [block])
37
+
38
qmp_ss = qmp_ss.apply(config_host, strict: false)
39
libqmp = static_library('qmp', qmp_ss.sources() + genh,
40
dependencies: qmp_ss.dependencies(),
41
@@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods
42
install_dir: config_host['qemu_moddir'])
43
endforeach
44
45
-softmmu_ss.add(authz, block, chardev, crypto, io, qmp)
46
+softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp)
47
common_ss.add(qom, qemuutil)
48
49
common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss])
50
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
51
index XXXXXXX..XXXXXXX 100644
52
--- a/storage-daemon/meson.build
53
+++ b/storage-daemon/meson.build
54
@@ -XXX,XX +XXX,XX @@
55
qsd_ss = ss.source_set()
56
qsd_ss.add(files('qemu-storage-daemon.c'))
57
-qsd_ss.add(block, chardev, qmp, qom, qemuutil)
58
-qsd_ss.add_all(blockdev_ss)
59
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)
60
61
subdir('qapi')
62
63
--
64
2.26.2
65
diff view generated by jsdifflib
New patch
1
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
2
They are not used by other programs and are not otherwise needed in
3
libblock.
1
4
5
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
6
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
7
(libblockdev), a stub function is required.
8
9
Make qemu-nbd.c use signal handling utility functions instead of
10
duplicating the code. This helps because os-posix.c is in libblockdev
11
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
12
Once we use the signal handling utility functions we also end up
13
providing the necessary symbol.
14
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
17
Reviewed-by: Eric Blake <eblake@redhat.com>
18
Message-id: 20200929125516.186715-4-stefanha@redhat.com
19
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
20
--Stefan]
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
22
---
23
qemu-nbd.c | 21 ++++++++-------------
24
stubs/blk-exp-close-all.c | 7 +++++++
25
block/export/meson.build | 4 ++--
26
meson.build | 4 ++--
27
nbd/meson.build | 2 ++
28
stubs/meson.build | 1 +
29
6 files changed, 22 insertions(+), 17 deletions(-)
30
create mode 100644 stubs/blk-exp-close-all.c
31
32
diff --git a/qemu-nbd.c b/qemu-nbd.c
33
index XXXXXXX..XXXXXXX 100644
34
--- a/qemu-nbd.c
35
+++ b/qemu-nbd.c
36
@@ -XXX,XX +XXX,XX @@
37
#include "qapi/error.h"
38
#include "qemu/cutils.h"
39
#include "sysemu/block-backend.h"
40
+#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */
41
#include "block/block_int.h"
42
#include "block/nbd.h"
43
#include "qemu/main-loop.h"
44
@@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n"
45
}
46
47
#ifdef CONFIG_POSIX
48
-static void termsig_handler(int signum)
49
+/*
50
+ * The client thread uses SIGTERM to interrupt the server. A signal
51
+ * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
52
+ */
53
+void qemu_system_killed(int signum, pid_t pid)
54
{
55
qatomic_cmpxchg(&state, RUNNING, TERMINATE);
56
qemu_notify_event();
57
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
58
BlockExportOptions *export_opts;
59
60
#ifdef CONFIG_POSIX
61
- /*
62
- * Exit gracefully on various signals, which includes SIGTERM used
63
- * by 'qemu-nbd -v -c'.
64
- */
65
- struct sigaction sa_sigterm;
66
- memset(&sa_sigterm, 0, sizeof(sa_sigterm));
67
- sa_sigterm.sa_handler = termsig_handler;
68
- sigaction(SIGTERM, &sa_sigterm, NULL);
69
- sigaction(SIGINT, &sa_sigterm, NULL);
70
- sigaction(SIGHUP, &sa_sigterm, NULL);
71
-
72
- signal(SIGPIPE, SIG_IGN);
73
+ os_setup_early_signal_handling();
74
+ os_setup_signal_handling();
75
#endif
76
77
socket_init();
78
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
79
new file mode 100644
80
index XXXXXXX..XXXXXXX
81
--- /dev/null
82
+++ b/stubs/blk-exp-close-all.c
83
@@ -XXX,XX +XXX,XX @@
84
+#include "qemu/osdep.h"
85
+#include "block/export.h"
86
+
87
+/* Only used in programs that support block exports (libblockdev.fa) */
88
+void blk_exp_close_all(void)
89
+{
90
+}
91
diff --git a/block/export/meson.build b/block/export/meson.build
92
index XXXXXXX..XXXXXXX 100644
93
--- a/block/export/meson.build
94
+++ b/block/export/meson.build
95
@@ -XXX,XX +XXX,XX @@
96
-block_ss.add(files('export.c'))
97
-block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
98
+blockdev_ss.add(files('export.c'))
99
+blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
100
diff --git a/meson.build b/meson.build
101
index XXXXXXX..XXXXXXX 100644
102
--- a/meson.build
103
+++ b/meson.build
104
@@ -XXX,XX +XXX,XX @@ subdir('dump')
105
106
block_ss.add(files(
107
'block.c',
108
- 'blockdev-nbd.c',
109
'blockjob.c',
110
'job.c',
111
'qemu-io-cmds.c',
112
@@ -XXX,XX +XXX,XX @@ subdir('block')
113
114
blockdev_ss.add(files(
115
'blockdev.c',
116
+ 'blockdev-nbd.c',
117
'iothread.c',
118
'job-qmp.c',
119
))
120
@@ -XXX,XX +XXX,XX @@ if have_tools
121
qemu_io = executable('qemu-io', files('qemu-io.c'),
122
dependencies: [block, qemuutil], install: true)
123
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
124
- dependencies: [block, qemuutil], install: true)
125
+ dependencies: [blockdev, qemuutil], install: true)
126
127
subdir('storage-daemon')
128
subdir('contrib/rdmacm-mux')
129
diff --git a/nbd/meson.build b/nbd/meson.build
130
index XXXXXXX..XXXXXXX 100644
131
--- a/nbd/meson.build
132
+++ b/nbd/meson.build
133
@@ -XXX,XX +XXX,XX @@
134
block_ss.add(files(
135
'client.c',
136
'common.c',
137
+))
138
+blockdev_ss.add(files(
139
'server.c',
140
))
141
diff --git a/stubs/meson.build b/stubs/meson.build
142
index XXXXXXX..XXXXXXX 100644
143
--- a/stubs/meson.build
144
+++ b/stubs/meson.build
145
@@ -XXX,XX +XXX,XX @@
146
stub_ss.add(files('arch_type.c'))
147
stub_ss.add(files('bdrv-next-monitor-owned.c'))
148
stub_ss.add(files('blk-commit-all.c'))
149
+stub_ss.add(files('blk-exp-close-all.c'))
150
stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
151
stub_ss.add(files('change-state-handler.c'))
152
stub_ss.add(files('cmos.c'))
153
--
154
2.26.2
155
diff view generated by jsdifflib
New patch
1
Make it possible to specify the iothread where the export will run. By
2
default the block node can be moved to other AioContexts later and the
3
export will follow. The fixed-iothread option forces strict behavior
4
that prevents changing AioContext while the export is active. See the
5
QAPI docs for details.
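
As an illustration (not part of this patch; the object id, node name, image
file, and socket path are made up), a vhost-user-blk export pinned to an
iothread could be set up with qemu-storage-daemon roughly like this:

  $ qemu-storage-daemon \
      --object iothread,id=iothread0 \
      --blockdev driver=file,filename=test.img,node-name=disk0 \
      --export type=vhost-user-blk,id=vub0,node-name=disk0,writable=on,addr.type=unix,addr.path=/tmp/vub.sock,iothread=iothread0,fixed-iothread=on

With fixed-iothread=on, export creation fails instead of falling back if
disk0 cannot be moved to iothread0.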
1
6
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200929125516.186715-5-stefanha@redhat.com
9
[Fix stray '#' character in block-export.json and add missing "(since:
10
5.2)" as suggested by Eric Blake.
11
--Stefan]
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
14
qapi/block-export.json | 11 ++++++++++
15
block/export/export.c | 31 +++++++++++++++++++++++++++-
16
block/export/vhost-user-blk-server.c | 5 ++++-
17
nbd/server.c | 2 --
18
4 files changed, 45 insertions(+), 4 deletions(-)
19
20
diff --git a/qapi/block-export.json b/qapi/block-export.json
21
index XXXXXXX..XXXXXXX 100644
22
--- a/qapi/block-export.json
23
+++ b/qapi/block-export.json
24
@@ -XXX,XX +XXX,XX @@
25
# export before completion is signalled. (since: 5.2;
26
# default: false)
27
#
28
+# @iothread: The name of the iothread object where the export will run. The
29
+# default is to use the thread currently associated with the
30
+# block node. (since: 5.2)
31
+#
32
+# @fixed-iothread: True prevents the block node from being moved to another
33
+# thread while the export is active. If true and @iothread is
34
+# given, export creation fails if the block node cannot be
35
+# moved to the iothread. The default is false. (since: 5.2)
36
+#
37
# Since: 4.2
38
##
39
{ 'union': 'BlockExportOptions',
40
'base': { 'type': 'BlockExportType',
41
'id': 'str',
42
+     '*fixed-iothread': 'bool',
43
+     '*iothread': 'str',
44
'node-name': 'str',
45
'*writable': 'bool',
46
'*writethrough': 'bool' },
47
diff --git a/block/export/export.c b/block/export/export.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/block/export/export.c
50
+++ b/block/export/export.c
51
@@ -XXX,XX +XXX,XX @@
52
53
#include "block/block.h"
54
#include "sysemu/block-backend.h"
55
+#include "sysemu/iothread.h"
56
#include "block/export.h"
57
#include "block/nbd.h"
58
#include "qapi/error.h"
59
@@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
60
61
BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
62
{
63
+ bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
64
const BlockExportDriver *drv;
65
BlockExport *exp = NULL;
66
BlockDriverState *bs;
67
- BlockBackend *blk;
68
+ BlockBackend *blk = NULL;
69
AioContext *ctx;
70
uint64_t perm;
71
int ret;
72
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
73
ctx = bdrv_get_aio_context(bs);
74
aio_context_acquire(ctx);
75
76
+ if (export->has_iothread) {
77
+ IOThread *iothread;
78
+ AioContext *new_ctx;
79
+
80
+ iothread = iothread_by_id(export->iothread);
81
+ if (!iothread) {
82
+ error_setg(errp, "iothread \"%s\" not found", export->iothread);
83
+ goto fail;
84
+ }
85
+
86
+ new_ctx = iothread_get_aio_context(iothread);
87
+
88
+ ret = bdrv_try_set_aio_context(bs, new_ctx, errp);
89
+ if (ret == 0) {
90
+ aio_context_release(ctx);
91
+ aio_context_acquire(new_ctx);
92
+ ctx = new_ctx;
93
+ } else if (fixed_iothread) {
94
+ goto fail;
95
+ }
96
+ }
97
+
98
/*
99
* Block exports are used for non-shared storage migration. Make sure
100
* that BDRV_O_INACTIVE is cleared and the image is ready for write
101
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
102
}
103
104
blk = blk_new(ctx, perm, BLK_PERM_ALL);
105
+
106
+ if (!fixed_iothread) {
107
+ blk_set_allow_aio_context_change(blk, true);
108
+ }
109
+
110
ret = blk_insert_bs(blk, bs, errp);
111
if (ret < 0) {
112
goto fail;
113
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/block/export/vhost-user-blk-server.c
116
+++ b/block/export/vhost-user-blk-server.c
117
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = {
118
static void blk_aio_attached(AioContext *ctx, void *opaque)
119
{
120
VuBlkExport *vexp = opaque;
121
+
122
+ vexp->export.ctx = ctx;
123
vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
124
}
125
126
static void blk_aio_detach(void *opaque)
127
{
128
VuBlkExport *vexp = opaque;
129
+
130
vhost_user_server_detach_aio_context(&vexp->vu_server);
131
+ vexp->export.ctx = NULL;
132
}
133
134
static void
135
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
136
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
137
logical_block_size);
138
139
- blk_set_allow_aio_context_change(exp->blk, true);
140
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
141
vexp);
142
143
diff --git a/nbd/server.c b/nbd/server.c
144
index XXXXXXX..XXXXXXX 100644
145
--- a/nbd/server.c
146
+++ b/nbd/server.c
147
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
148
return ret;
149
}
150
151
- blk_set_allow_aio_context_change(blk, true);
152
-
153
QTAILQ_INIT(&exp->clients);
154
exp->name = g_strdup(arg->name);
155
exp->description = g_strdup(arg->description);
156
--
157
2.26.2
158
diff view generated by jsdifflib
New patch
1
Allow the number of queues to be configured using --export
2
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
3
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
4
its own value if the vhost-user-blk backend offers fewer queues than
5
QEMU.
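
For example (illustrative only; ids, the socket path, and the image name are
made up):

  # storage daemon side: export with 4 request virtqueues
  $ qemu-storage-daemon \
      --blockdev driver=file,filename=test.img,node-name=disk0 \
      --export type=vhost-user-blk,id=vub0,node-name=disk0,addr.type=unix,addr.path=/tmp/vub.sock,num-queues=4

  # QEMU side: matching num-queues on the vhost-user-blk-pci device
  $ qemu-system-x86_64 ... \
      -chardev socket,id=char0,path=/tmp/vub.sock \
      -device vhost-user-blk-pci,chardev=char0,num-queues=4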
1
6
7
The vhost-user-blk-server.c code is already capable of multi-queue. All
8
virtqueue processing runs in the same AioContext. No new locking is
9
needed.
10
11
Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
12
Note that the feature bit only announces the presence of the num_queues
13
configuration space field. It does not promise that there is more than 1
14
virtqueue, so we can set it unconditionally.
15
16
I tested multi-queue by running a random read fio test with numjobs=4 on
17
an -smp 4 guest. After the benchmark finished, the guest /proc/interrupts
18
file showed activity on all 4 virtio-blk MSI-X interrupts. The /sys/block/vda/mq/
19
directory shows that Linux blk-mq has 4 queues configured.
20
21
An automated test is included in the next commit.
22
23
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
24
Acked-by: Markus Armbruster <armbru@redhat.com>
25
Message-id: 20201001144604.559733-2-stefanha@redhat.com
26
[Fixed accidental tab characters as suggested by Markus Armbruster
27
--Stefan]
28
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
29
---
30
qapi/block-export.json | 10 +++++++---
31
block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------
32
2 files changed, 25 insertions(+), 9 deletions(-)
33
34
diff --git a/qapi/block-export.json b/qapi/block-export.json
35
index XXXXXXX..XXXXXXX 100644
36
--- a/qapi/block-export.json
37
+++ b/qapi/block-export.json
38
@@ -XXX,XX +XXX,XX @@
39
# SocketAddress types are supported. Passed fds must be UNIX domain
40
# sockets.
41
# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
42
+# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults
43
+# to 1.
44
#
45
# Since: 5.2
46
##
47
{ 'struct': 'BlockExportOptionsVhostUserBlk',
48
- 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
49
+ 'data': { 'addr': 'SocketAddress',
50
+     '*logical-block-size': 'size',
51
+ '*num-queues': 'uint16'} }
52
53
##
54
# @NbdServerAddOptions:
55
@@ -XXX,XX +XXX,XX @@
56
{ 'union': 'BlockExportOptions',
57
'base': { 'type': 'BlockExportType',
58
'id': 'str',
59
-     '*fixed-iothread': 'bool',
60
-     '*iothread': 'str',
61
+ '*fixed-iothread': 'bool',
62
+ '*iothread': 'str',
63
'node-name': 'str',
64
'*writable': 'bool',
65
'*writethrough': 'bool' },
66
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/block/export/vhost-user-blk-server.c
69
+++ b/block/export/vhost-user-blk-server.c
70
@@ -XXX,XX +XXX,XX @@
71
#include "util/block-helpers.h"
72
73
enum {
74
- VHOST_USER_BLK_MAX_QUEUES = 1,
75
+ VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
76
};
77
struct virtio_blk_inhdr {
78
unsigned char status;
79
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev)
80
1ull << VIRTIO_BLK_F_DISCARD |
81
1ull << VIRTIO_BLK_F_WRITE_ZEROES |
82
1ull << VIRTIO_BLK_F_CONFIG_WCE |
83
+ 1ull << VIRTIO_BLK_F_MQ |
84
1ull << VIRTIO_F_VERSION_1 |
85
1ull << VIRTIO_RING_F_INDIRECT_DESC |
86
1ull << VIRTIO_RING_F_EVENT_IDX |
87
@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)
88
89
static void
90
vu_blk_initialize_config(BlockDriverState *bs,
91
- struct virtio_blk_config *config, uint32_t blk_size)
92
+ struct virtio_blk_config *config,
93
+ uint32_t blk_size,
94
+ uint16_t num_queues)
95
{
96
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
97
config->blk_size = blk_size;
98
@@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs,
99
config->seg_max = 128 - 2;
100
config->min_io_size = 1;
101
config->opt_io_size = 1;
102
- config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
103
+ config->num_queues = num_queues;
104
config->max_discard_sectors = 32768;
105
config->max_discard_seg = 1;
106
config->discard_sector_alignment = config->blk_size >> 9;
107
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
108
BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
109
Error *local_err = NULL;
110
uint64_t logical_block_size;
111
+ uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;
112
113
vexp->writable = opts->writable;
114
vexp->blkcfg.wce = 0;
115
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
116
}
117
vexp->blk_size = logical_block_size;
118
blk_set_guest_block_size(exp->blk, logical_block_size);
119
+
120
+ if (vu_opts->has_num_queues) {
121
+ num_queues = vu_opts->num_queues;
122
+ }
123
+ if (num_queues == 0) {
124
+ error_setg(errp, "num-queues must be greater than 0");
125
+ return -EINVAL;
126
+ }
127
+
128
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
129
- logical_block_size);
130
+ logical_block_size, num_queues);
131
132
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
133
vexp);
134
135
if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
136
- VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
137
- errp)) {
138
+ num_queues, &vu_blk_iface, errp)) {
139
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
140
blk_aio_detach, vexp);
141
return -EADDRNOTAVAIL;
142
--
143
2.26.2
144
diff view generated by jsdifflib
1
From: Kashyap Chamarthy <kchamart@redhat.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
This is part of the on-going effort to convert QEMU upstream
3
bdrv_co_block_status_above has several design problems with handling
4
documentation syntax to reStructuredText (rST).
4
short backing files:
5
5
6
The conversion to rST was done using:
6
1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
7
without the BDRV_BLOCK_ALLOCATED flag, when the short backing file
8
that produces these after-EOF zeros is actually inside the requested backing
9
sequence.
7
10
8
$ pandoc -f markdown -t rst bitmaps.md -o bitmaps.rst
11
2. With want_zero=false, it may return pnum=0 prior to actual EOF,
12
because of the EOF of a short backing file.
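
(Illustration, not from this patch; file names are made up. A "short backing
file" is simply a backing file smaller than its overlay, and reads of the
overlay beyond the backing file's EOF return zeros:

  $ qemu-img create -f qcow2 base.qcow2 1M
  $ qemu-img create -f qcow2 -b base.qcow2 -F qcow2 top.qcow2 2M
  $ qemu-img map --output=json top.qcow2

Here the second megabyte of top.qcow2 is unallocated in top.qcow2 itself and
lies past base.qcow2's EOF, so block status has to synthesize zeros for it.)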
9
13
10
Then, make a couple of small syntactical adjustments. While at it,
14
Fix these things, making logic about short backing files clearer.
11
reword a statement to avoid ambiguity. Addressing the feedback from
12
this thread:
13
15
14
https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg05428.html
16
With fixed bdrv_block_status_above we also have to improve is_zero in
17
qcow2 code, otherwise iotest 154 will fail, because with this patch we
18
stop merging zeros of different types (those produced by regions that are fully
19
unallocated in the whole backing chain vs those produced by short backing files).
15
20
16
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
21
Note also that this patch leaves for another day the general problem
17
Reviewed-by: John Snow <jsnow@redhat.com>
22
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
23
vs go-to-backing.
24
25
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
26
Reviewed-by: Alberto Garcia <berto@igalia.com>
18
Reviewed-by: Eric Blake <eblake@redhat.com>
27
Reviewed-by: Eric Blake <eblake@redhat.com>
19
Message-id: 20170717105205.32639-2-kchamart@redhat.com
28
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
20
Signed-off-by: Jeff Cody <jcody@redhat.com>
29
[Fix s/comes/come/ as suggested by Eric Blake
30
--Stefan]
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
---
32
---
22
docs/devel/bitmaps.md | 505 ------------------------------------------
33
block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++-----------
23
docs/interop/bitmaps.rst | 555 +++++++++++++++++++++++++++++++++++++++++++++++
34
block/qcow2.c | 16 ++++++++++--
24
2 files changed, 555 insertions(+), 505 deletions(-)
35
2 files changed, 68 insertions(+), 16 deletions(-)
25
delete mode 100644 docs/devel/bitmaps.md
26
create mode 100644 docs/interop/bitmaps.rst
27
36
28
diff --git a/docs/devel/bitmaps.md b/docs/devel/bitmaps.md
37
diff --git a/block/io.c b/block/io.c
29
deleted file mode 100644
38
index XXXXXXX..XXXXXXX 100644
30
index XXXXXXX..XXXXXXX
39
--- a/block/io.c
31
--- a/docs/devel/bitmaps.md
40
+++ b/block/io.c
32
+++ /dev/null
41
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
33
@@ -XXX,XX +XXX,XX @@
42
int64_t *map,
34
-<!--
43
BlockDriverState **file)
35
-Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc.
44
{
36
-All rights reserved.
45
+ int ret;
37
-
46
BlockDriverState *p;
38
-This file is licensed via The FreeBSD Documentation License, the full text of
47
- int ret = 0;
39
-which is included at the end of this document.
48
- bool first = true;
40
--->
49
+ int64_t eof = 0;
41
-
50
42
-# Dirty Bitmaps and Incremental Backup
51
assert(bs != base);
43
-
52
- for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
44
-* Dirty Bitmaps are objects that track which data needs to be backed up for the
45
- next incremental backup.
46
-
47
-* Dirty bitmaps can be created at any time and attached to any node
48
- (not just complete drives.)
49
-
50
-## Dirty Bitmap Names
51
-
52
-* A dirty bitmap's name is unique to the node, but bitmaps attached to different
53
- nodes can share the same name.
54
-
55
-* Dirty bitmaps created for internal use by QEMU may be anonymous and have no
56
- name, but any user-created bitmaps may not be. There can be any number of
57
- anonymous bitmaps per node.
58
-
59
-* The name of a user-created bitmap must not be empty ("").
60
-
61
-## Bitmap Modes
62
-
63
-* A Bitmap can be "frozen," which means that it is currently in-use by a backup
64
- operation and cannot be deleted, renamed, written to, reset,
65
- etc.
66
-
67
-* The normal operating mode for a bitmap is "active."
68
-
69
-## Basic QMP Usage
70
-
71
-### Supported Commands ###
72
-
73
-* block-dirty-bitmap-add
74
-* block-dirty-bitmap-remove
75
-* block-dirty-bitmap-clear
76
-
77
-### Creation
78
-
79
-* To create a new bitmap, enabled, on the drive with id=drive0:
80
-
81
-```json
82
-{ "execute": "block-dirty-bitmap-add",
83
- "arguments": {
84
- "node": "drive0",
85
- "name": "bitmap0"
86
- }
87
-}
88
-```
89
-
90
-* This bitmap will have a default granularity that matches the cluster size of
91
- its associated drive, if available, clamped to between [4KiB, 64KiB].
92
- The current default for qcow2 is 64KiB.
93
-
94
-* To create a new bitmap that tracks changes in 32KiB segments:
95
-
96
-```json
97
-{ "execute": "block-dirty-bitmap-add",
98
- "arguments": {
99
- "node": "drive0",
100
- "name": "bitmap0",
101
- "granularity": 32768
102
- }
103
-}
104
-```
105
-
106
-### Deletion
107
-
108
-* Bitmaps that are frozen cannot be deleted.
109
-
110
-* Deleting the bitmap does not impact any other bitmaps attached to the same
111
- node, nor does it affect any backups already created from this node.
112
-
113
-* Because bitmaps are only unique to the node to which they are attached,
114
- you must specify the node/drive name here, too.
115
-
116
-```json
117
-{ "execute": "block-dirty-bitmap-remove",
118
- "arguments": {
119
- "node": "drive0",
120
- "name": "bitmap0"
121
- }
122
-}
123
-```
124
-
125
-### Resetting
126
-
127
-* Resetting a bitmap will clear all information it holds.
128
-
129
-* An incremental backup created from an empty bitmap will copy no data,
130
- as if nothing has changed.
131
-
132
-```json
133
-{ "execute": "block-dirty-bitmap-clear",
134
- "arguments": {
135
- "node": "drive0",
136
- "name": "bitmap0"
137
- }
138
-}
139
-```
140
-
141
-## Transactions
142
-
143
-### Justification
144
-
145
-Bitmaps can be safely modified when the VM is paused or halted by using
146
-the basic QMP commands. For instance, you might perform the following actions:
147
-
148
-1. Boot the VM in a paused state.
149
-2. Create a full drive backup of drive0.
150
-3. Create a new bitmap attached to drive0.
151
-4. Resume execution of the VM.
152
-5. Incremental backups are ready to be created.
153
-
154
-At this point, the bitmap and drive backup would be correctly in sync,
155
-and incremental backups made from this point forward would be correctly aligned
156
-to the full drive backup.
157
-
158
-This is not particularly useful if we decide we want to start incremental
159
-backups after the VM has been running for a while, for which we will need to
160
-perform actions such as the following:
161
-
162
-1. Boot the VM and begin execution.
163
-2. Using a single transaction, perform the following operations:
164
- * Create bitmap0.
165
- * Create a full drive backup of drive0.
166
-3. Incremental backups are now ready to be created.
167
-
168
-### Supported Bitmap Transactions
169
-
170
-* block-dirty-bitmap-add
171
-* block-dirty-bitmap-clear
172
-
173
-The usages are identical to their respective QMP commands, but see below
174
-for examples.
175
-
176
-### Example: New Incremental Backup
177
-
178
-As outlined in the justification, perhaps we want to create a new incremental
179
-backup chain attached to a drive.
180
-
181
-```json
182
-{ "execute": "transaction",
183
- "arguments": {
184
- "actions": [
185
- {"type": "block-dirty-bitmap-add",
186
- "data": {"node": "drive0", "name": "bitmap0"} },
187
- {"type": "drive-backup",
188
- "data": {"device": "drive0", "target": "/path/to/full_backup.img",
189
- "sync": "full", "format": "qcow2"} }
190
- ]
191
- }
192
-}
193
-```
194
-
195
-### Example: New Incremental Backup Anchor Point
196
-
197
-Maybe we just want to create a new full backup with an existing bitmap and
198
-want to reset the bitmap to track the new chain.
199
-
200
-```json
201
-{ "execute": "transaction",
202
- "arguments": {
203
- "actions": [
204
- {"type": "block-dirty-bitmap-clear",
205
- "data": {"node": "drive0", "name": "bitmap0"} },
206
- {"type": "drive-backup",
207
- "data": {"device": "drive0", "target": "/path/to/new_full_backup.img",
208
- "sync": "full", "format": "qcow2"} }
209
- ]
210
- }
211
-}
212
-```
213
-
214
-## Incremental Backups
215
-
216
-The star of the show.
217
-
218
-**Nota Bene!** Only incremental backups of entire drives are supported for now.
219
-So despite the fact that you can attach a bitmap to any arbitrary node, they are
220
-only currently useful when attached to the root node. This is because
221
-drive-backup only supports drives/devices instead of arbitrary nodes.
222
-
223
-### Example: First Incremental Backup
224
-
225
-1. Create a full backup and sync it to the dirty bitmap, as in the transactional
226
-examples above; or with the VM offline, manually create a full copy and then
227
-create a new bitmap before the VM begins execution.
228
-
229
- * Let's assume the full backup is named 'full_backup.img'.
230
- * Let's assume the bitmap you created is 'bitmap0' attached to 'drive0'.
231
-
232
-2. Create a destination image for the incremental backup that utilizes the
233
-full backup as a backing image.
234
-
235
- * Let's assume it is named 'incremental.0.img'.
236
-
237
- ```sh
238
- # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
239
- ```
240
-
241
-3. Issue the incremental backup command:
242
-
243
- ```json
244
- { "execute": "drive-backup",
245
- "arguments": {
246
- "device": "drive0",
247
- "bitmap": "bitmap0",
248
- "target": "incremental.0.img",
249
- "format": "qcow2",
250
- "sync": "incremental",
251
- "mode": "existing"
252
- }
253
- }
254
- ```
255
-
256
-### Example: Second Incremental Backup
257
-
258
-1. Create a new destination image for the incremental backup that points to the
259
- previous one, e.g.: 'incremental.1.img'
260
-
261
- ```sh
262
- # qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2
263
- ```
264
-
265
-2. Issue a new incremental backup command. The only difference here is that we
266
- have changed the target image below.
267
-
268
- ```json
269
- { "execute": "drive-backup",
270
- "arguments": {
271
- "device": "drive0",
272
- "bitmap": "bitmap0",
273
- "target": "incremental.1.img",
274
- "format": "qcow2",
275
- "sync": "incremental",
276
- "mode": "existing"
277
- }
278
- }
279
- ```
280
-
281
-## Errors
282
-
283
-* In the event of an error that occurs after a backup job is successfully
284
- launched, either by a direct QMP command or a QMP transaction, the user
285
- will receive a BLOCK_JOB_COMPLETE event with a failure message, accompanied
286
- by a BLOCK_JOB_ERROR event.
287
-
288
-* In the case of an event being cancelled, the user will receive a
289
- BLOCK_JOB_CANCELLED event instead of a pair of COMPLETE and ERROR events.
290
-
291
-* In either case, the incremental backup data contained within the bitmap is
292
- safely rolled back, and the data within the bitmap is not lost. The image
293
- file created for the failed attempt can be safely deleted.
294
-
295
-* Once the underlying problem is fixed (e.g. more storage space is freed up),
296
- you can simply retry the incremental backup command with the same bitmap.
297
-
298
-### Example
299
-
300
-1. Create a target image:
301
-
302
- ```sh
303
- # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
304
- ```
305
-
306
-2. Attempt to create an incremental backup via QMP:
307
-
308
- ```json
309
- { "execute": "drive-backup",
310
- "arguments": {
311
- "device": "drive0",
312
- "bitmap": "bitmap0",
313
- "target": "incremental.0.img",
314
- "format": "qcow2",
315
- "sync": "incremental",
316
- "mode": "existing"
317
- }
318
- }
319
- ```
320
-
321
-3. Receive an event notifying us of failure:
322
-
323
- ```json
324
- { "timestamp": { "seconds": 1424709442, "microseconds": 844524 },
325
- "data": { "speed": 0, "offset": 0, "len": 67108864,
326
- "error": "No space left on device",
327
- "device": "drive1", "type": "backup" },
328
- "event": "BLOCK_JOB_COMPLETED" }
329
- ```
330
-
331
-4. Delete the failed incremental, and re-create the image.
332
-
333
- ```sh
334
- # rm incremental.0.img
335
- # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
336
- ```
337
-
338
-5. Retry the command after fixing the underlying problem,
339
- such as freeing up space on the backup volume:
340
-
341
- ```json
342
- { "execute": "drive-backup",
343
- "arguments": {
344
- "device": "drive0",
345
- "bitmap": "bitmap0",
346
- "target": "incremental.0.img",
347
- "format": "qcow2",
348
- "sync": "incremental",
349
- "mode": "existing"
350
- }
351
- }
352
- ```
353
-
354
-6. Receive confirmation that the job completed successfully:
355
-
356
- ```json
357
- { "timestamp": { "seconds": 1424709668, "microseconds": 526525 },
358
- "data": { "device": "drive1", "type": "backup",
359
- "speed": 0, "len": 67108864, "offset": 67108864},
360
- "event": "BLOCK_JOB_COMPLETED" }
361
- ```
362
-
363
-### Partial Transactional Failures
364
-
365
-* Sometimes, a transaction will succeed in launching and return success,
366
- but then later the backup jobs themselves may fail. It is possible that
367
- a management application may have to deal with a partial backup failure
368
- after a successful transaction.
369
-
370
-* If multiple backup jobs are specified in a single transaction, when one of
371
- them fails, it will not interact with the other backup jobs in any way.
372
-
373
-* The job(s) that succeeded will clear the dirty bitmap associated with the
374
- operation, but the job(s) that failed will not. It is not "safe" to delete
375
- any incremental backups that were created successfully in this scenario,
376
- even though others failed.
377
-
378
-#### Example
379
-
380
-* QMP example highlighting two backup jobs:
381
-
382
- ```json
383
- { "execute": "transaction",
384
- "arguments": {
385
- "actions": [
386
- { "type": "drive-backup",
387
- "data": { "device": "drive0", "bitmap": "bitmap0",
388
- "format": "qcow2", "mode": "existing",
389
- "sync": "incremental", "target": "d0-incr-1.qcow2" } },
390
- { "type": "drive-backup",
391
- "data": { "device": "drive1", "bitmap": "bitmap1",
392
- "format": "qcow2", "mode": "existing",
393
- "sync": "incremental", "target": "d1-incr-1.qcow2" } },
394
- ]
395
- }
396
- }
397
- ```
398
-
399
-* QMP example response, highlighting one success and one failure:
400
- * Acknowledgement that the Transaction was accepted and jobs were launched:
401
- ```json
402
- { "return": {} }
403
- ```
404
-
405
- * Later, QEMU sends notice that the first job was completed:
406
- ```json
407
- { "timestamp": { "seconds": 1447192343, "microseconds": 615698 },
408
- "data": { "device": "drive0", "type": "backup",
409
- "speed": 0, "len": 67108864, "offset": 67108864 },
410
- "event": "BLOCK_JOB_COMPLETED"
411
- }
412
- ```
413
-
414
- * Later yet, QEMU sends notice that the second job has failed:
415
- ```json
416
- { "timestamp": { "seconds": 1447192399, "microseconds": 683015 },
417
- "data": { "device": "drive1", "action": "report",
418
- "operation": "read" },
419
- "event": "BLOCK_JOB_ERROR" }
420
- ```
421
-
422
- ```json
423
- { "timestamp": { "seconds": 1447192399, "microseconds": 685853 },
424
- "data": { "speed": 0, "offset": 0, "len": 67108864,
425
- "error": "Input/output error",
426
- "device": "drive1", "type": "backup" },
427
- "event": "BLOCK_JOB_COMPLETED" }
428
-
429
-* In the above example, "d0-incr-1.qcow2" is valid and must be kept,
430
- but "d1-incr-1.qcow2" is invalid and should be deleted. If a VM-wide
431
- incremental backup of all drives at a point-in-time is to be made,
432
- new backups for both drives will need to be made, taking into account
433
- that a new incremental backup for drive0 needs to be based on top of
434
- "d0-incr-1.qcow2."
435
-
436
-### Grouped Completion Mode
437
-
438
-* While jobs launched by transactions normally complete or fail on their own,
439
- it is possible to instruct them to complete or fail together as a group.
440
-
441
-* QMP transactions take an optional properties structure that can affect
442
- the semantics of the transaction.
443
-
444
-* The "completion-mode" transaction property can be either "individual"
445
- which is the default, legacy behavior described above, or "grouped,"
446
- a new behavior detailed below.
447
-
448
-* Delayed Completion: In grouped completion mode, no jobs will report
449
- success until all jobs are ready to report success.
450
-
451
-* Grouped failure: If any job fails in grouped completion mode, all remaining
452
- jobs will be cancelled. Any incremental backups will restore their dirty
453
- bitmap objects as if no backup command was ever issued.
454
-
455
- * Regardless of if QEMU reports a particular incremental backup job as
456
- CANCELLED or as an ERROR, the in-memory bitmap will be restored.
457
-
458
-#### Example
459
-
460
-* Here's the same example scenario from above with the new property:
461
-
462
- ```json
463
- { "execute": "transaction",
464
- "arguments": {
465
- "actions": [
466
- { "type": "drive-backup",
467
- "data": { "device": "drive0", "bitmap": "bitmap0",
468
- "format": "qcow2", "mode": "existing",
469
- "sync": "incremental", "target": "d0-incr-1.qcow2" } },
470
- { "type": "drive-backup",
471
- "data": { "device": "drive1", "bitmap": "bitmap1",
472
- "format": "qcow2", "mode": "existing",
473
- "sync": "incremental", "target": "d1-incr-1.qcow2" } },
474
- ],
475
- "properties": {
476
- "completion-mode": "grouped"
477
- }
478
- }
479
- }
480
- ```
481
-
482
-* QMP example response, highlighting a failure for drive2:
483
- * Acknowledgement that the Transaction was accepted and jobs were launched:
484
- ```json
485
- { "return": {} }
486
- ```
487
-
488
- * Later, QEMU sends notice that the second job has errored out,
489
- but that the first job was also cancelled:
490
- ```json
491
- { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
492
- "data": { "device": "drive1", "action": "report",
493
- "operation": "read" },
494
- "event": "BLOCK_JOB_ERROR" }
495
- ```
496
-
497
- ```json
498
- { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
499
- "data": { "speed": 0, "offset": 0, "len": 67108864,
500
- "error": "Input/output error",
501
- "device": "drive1", "type": "backup" },
502
- "event": "BLOCK_JOB_COMPLETED" }
503
- ```
504
-
505
- ```json
506
- { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
507
- "data": { "device": "drive0", "type": "backup", "speed": 0,
508
- "len": 67108864, "offset": 16777216 },
509
- "event": "BLOCK_JOB_CANCELLED" }
510
- ```
511
-
512
-<!--
513
-The FreeBSD Documentation License
514
-
515
-Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML,
516
-PDF, PostScript, RTF and so forth) with or without modification, are permitted
517
-provided that the following conditions are met:
518
-
519
-Redistributions of source code (Markdown) must retain the above copyright
520
-notice, this list of conditions and the following disclaimer of this file
521
-unmodified.
522
-
523
-Redistributions in compiled form (transformed to other DTDs, converted to PDF,
524
-PostScript, RTF and other formats) must reproduce the above copyright notice,
525
-this list of conditions and the following disclaimer in the documentation and/or
526
-other materials provided with the distribution.
527
-
528
-THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
529
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
530
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
531
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
532
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
533
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
534
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
535
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
536
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
537
-THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
538
--->
539
diff --git a/docs/interop/bitmaps.rst b/docs/interop/bitmaps.rst
540
new file mode 100644
541
index XXXXXXX..XXXXXXX
542
--- /dev/null
543
+++ b/docs/interop/bitmaps.rst
544
@@ -XXX,XX +XXX,XX @@
545
+..
546
+ Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc.
547
+ All rights reserved.
548
+
53
+
549
+ This file is licensed via The FreeBSD Documentation License, the full
54
+ ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
550
+ text of which is included at the end of this document.
55
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
551
+
56
+ return ret;
552
+====================================
553
+Dirty Bitmaps and Incremental Backup
554
+====================================
555
+
556
+- Dirty Bitmaps are objects that track which data needs to be backed up
557
+ for the next incremental backup.
558
+
559
+- Dirty bitmaps can be created at any time and attached to any node
560
+ (not just complete drives).
561
+
562
+.. contents::
563
+
564
+Dirty Bitmap Names
565
+------------------
566
+
567
+- A dirty bitmap's name is unique to the node, but bitmaps attached to
568
+ different nodes can share the same name.
569
+
570
+- Dirty bitmaps created for internal use by QEMU may be anonymous and
571
+ have no name, but any user-created bitmaps must have a name. There
572
+ can be any number of anonymous bitmaps per node.
573
+
574
+- The name of a user-created bitmap must not be empty ("").
575
+
576
+Bitmap Modes
577
+------------
578
+
579
+- A bitmap can be "frozen," which means that it is currently in-use by
580
+ a backup operation and cannot be deleted, renamed, written to, reset,
581
+ etc.
582
+
583
+- The normal operating mode for a bitmap is "active."
584
+
585
+Basic QMP Usage
586
+---------------
587
+
588
+Supported Commands
589
+~~~~~~~~~~~~~~~~~~
590
+
591
+- ``block-dirty-bitmap-add``
592
+- ``block-dirty-bitmap-remove``
593
+- ``block-dirty-bitmap-clear``
594
+
595
+Creation
596
+~~~~~~~~
597
+
598
+- To create a new bitmap, enabled, on the drive with id=drive0:
599
+
600
+.. code:: json
601
+
602
+ { "execute": "block-dirty-bitmap-add",
603
+ "arguments": {
604
+ "node": "drive0",
605
+ "name": "bitmap0"
606
+ }
607
+ }
57
+ }
608
+
58
+
609
+- This bitmap will have a default granularity that matches the cluster
59
+ if (ret & BDRV_BLOCK_EOF) {
610
+ size of its associated drive, if available, clamped to between [4KiB,
60
+ eof = offset + *pnum;
611
+ 64KiB]. The current default for qcow2 is 64KiB.
612
+
613
+- To create a new bitmap that tracks changes in 32KiB segments:
614
+
615
+.. code:: json
616
+
617
+ { "execute": "block-dirty-bitmap-add",
618
+ "arguments": {
619
+ "node": "drive0",
620
+ "name": "bitmap0",
621
+ "granularity": 32768
622
+ }
623
+ }
61
+ }
624
+
62
+
625
+Deletion
63
+ assert(*pnum <= bytes);
626
+~~~~~~~~
64
+ bytes = *pnum;
627
+
65
+
628
+- Bitmaps that are frozen cannot be deleted.
66
+ for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ p = bdrv_filter_or_cow_bs(p))
68
+ {
69
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
70
file);
71
if (ret < 0) {
72
- break;
73
+ return ret;
74
}
75
- if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
76
+ if (*pnum == 0) {
77
/*
78
- * Reading beyond the end of the file continues to read
79
- * zeroes, but we can only widen the result to the
80
- * unallocated length we learned from an earlier
81
- * iteration.
82
+ * The top layer deferred to this layer, and because this layer is
83
+ * short, any zeroes that we synthesize beyond EOF behave as if they
84
+ * were allocated at this layer.
85
+ *
86
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
87
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
88
+ * below.
89
*/
90
+ assert(ret & BDRV_BLOCK_EOF);
91
*pnum = bytes;
92
+ if (file) {
93
+ *file = p;
94
+ }
95
+ ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
96
+ break;
97
}
98
- if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
99
+ if (ret & BDRV_BLOCK_ALLOCATED) {
100
+ /*
101
+ * We've found the node and the status, we must break.
102
+ *
103
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
104
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
105
+ * below.
106
+ */
107
+ ret &= ~BDRV_BLOCK_EOF;
108
break;
109
}
110
- /* [offset, pnum] unallocated on this layer, which could be only
111
- * the first part of [offset, bytes]. */
112
- bytes = MIN(bytes, *pnum);
113
- first = false;
629
+
114
+
630
+- Deleting the bitmap does not impact any other bitmaps attached to the
115
+ /*
631
+ same node, nor does it affect any backups already created from this
116
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
632
+ node.
117
+ * let's continue the diving.
633
+
118
+ */
634
+- Because bitmaps are only unique to the node to which they are
119
+ assert(*pnum <= bytes);
635
+ attached, you must specify the node/drive name here, too.
120
+ bytes = *pnum;
636
+
637
+.. code:: json
638
+
639
+ { "execute": "block-dirty-bitmap-remove",
640
+ "arguments": {
641
+ "node": "drive0",
642
+ "name": "bitmap0"
643
+ }
644
+ }
121
+ }
645
+
122
+
646
+Resetting
123
+ if (offset + *pnum == eof) {
647
+~~~~~~~~~
124
+ ret |= BDRV_BLOCK_EOF;
125
}
648
+
126
+
649
+- Resetting a bitmap will clear all information it holds.
127
return ret;
128
}
129
130
diff --git a/block/qcow2.c b/block/qcow2.c
131
index XXXXXXX..XXXXXXX 100644
132
--- a/block/qcow2.c
133
+++ b/block/qcow2.c
134
@@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
135
if (!bytes) {
136
return true;
137
}
138
- res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
139
- return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
650
+
140
+
651
+- An incremental backup created from an empty bitmap will copy no data,
141
+ /*
652
+ as if nothing has changed.
142
+ * bdrv_block_status_above doesn't merge different types of zeros, for
143
+ * example, zeros which come from the region which is unallocated in
144
+ * the whole backing chain, and zeros which come because of a short
145
+ * backing file. So, we need a loop.
146
+ */
147
+ do {
148
+ res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
149
+ offset += nr;
150
+ bytes -= nr;
151
+ } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);
653
+
152
+
654
+.. code:: json
153
+ return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0;
655
+
154
}
656
+ { "execute": "block-dirty-bitmap-clear",
155
657
+ "arguments": {
156
static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
658
+ "node": "drive0",
659
+ "name": "bitmap0"
660
+ }
661
+ }
662
+
663
+Transactions
664
+------------
665
+
666
+Justification
667
+~~~~~~~~~~~~~
668
+
669
+Bitmaps can be safely modified when the VM is paused or halted by using
670
+the basic QMP commands. For instance, you might perform the following
671
+actions:
672
+
673
+1. Boot the VM in a paused state.
674
+2. Create a full drive backup of drive0.
675
+3. Create a new bitmap attached to drive0.
676
+4. Resume execution of the VM.
677
+5. Incremental backups are ready to be created.
678
+
679
+At this point, the bitmap and drive backup would be correctly in sync,
680
+and incremental backups made from this point forward would be correctly
681
+aligned to the full drive backup.
682
+
683
+This is not particularly useful if we decide we want to start
684
+incremental backups after the VM has been running for a while, for which
685
+we will need to perform actions such as the following:
686
+
687
+1. Boot the VM and begin execution.
688
+2. Using a single transaction, perform the following operations:
689
+
690
+ - Create ``bitmap0``.
691
+ - Create a full drive backup of ``drive0``.
692
+
693
+3. Incremental backups are now ready to be created.
694
+
695
+Supported Bitmap Transactions
696
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
697
+
698
+- ``block-dirty-bitmap-add``
699
+- ``block-dirty-bitmap-clear``
700
+
701
+The usages are identical to their respective QMP commands, but see below
702
+for examples.
703
+
704
+Example: New Incremental Backup
705
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
706
+
707
+As outlined in the justification, perhaps we want to create a new
708
+incremental backup chain attached to a drive.
709
+
710
+.. code:: json
711
+
712
+ { "execute": "transaction",
713
+ "arguments": {
714
+ "actions": [
715
+ {"type": "block-dirty-bitmap-add",
716
+ "data": {"node": "drive0", "name": "bitmap0"} },
717
+ {"type": "drive-backup",
718
+ "data": {"device": "drive0", "target": "/path/to/full_backup.img",
719
+ "sync": "full", "format": "qcow2"} }
720
+ ]
721
+ }
722
+ }
723
+
724
+Example: New Incremental Backup Anchor Point
725
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
726
+
727
+Maybe we just want to create a new full backup with an existing bitmap
728
+and want to reset the bitmap to track the new chain.
729
+
730
+.. code:: json
731
+
732
+ { "execute": "transaction",
733
+ "arguments": {
734
+ "actions": [
735
+ {"type": "block-dirty-bitmap-clear",
736
+ "data": {"node": "drive0", "name": "bitmap0"} },
737
+ {"type": "drive-backup",
738
+ "data": {"device": "drive0", "target": "/path/to/new_full_backup.img",
739
+ "sync": "full", "format": "qcow2"} }
740
+ ]
741
+ }
742
+ }
743
+
744
+Incremental Backups
745
+-------------------
746
+
747
+The star of the show.
748
+
749
+**Nota Bene!** Only incremental backups of entire drives are supported
750
+for now. So despite the fact that you can attach a bitmap to any
751
+arbitrary node, they are only currently useful when attached to the root
752
+node. This is because drive-backup only supports drives/devices instead
753
+of arbitrary nodes.
754
+
755
+Example: First Incremental Backup
756
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
757
+
758
+1. Create a full backup and sync it to the dirty bitmap, as in the
759
+ transactional examples above; or with the VM offline, manually create
760
+ a full copy and then create a new bitmap before the VM begins
761
+ execution.
762
+
763
+ - Let's assume the full backup is named ``full_backup.img``.
764
+ - Let's assume the bitmap you created is ``bitmap0`` attached to
765
+ ``drive0``.
766
+
767
+2. Create a destination image for the incremental backup that utilizes
768
+ the full backup as a backing image.
769
+
770
+ - Let's assume the new incremental image is named
771
+ ``incremental.0.img``.
772
+
773
+ .. code:: bash
774
+
775
+ $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
776
+
777
+3. Issue the incremental backup command:
778
+
779
+ .. code:: json
780
+
781
+ { "execute": "drive-backup",
782
+ "arguments": {
783
+ "device": "drive0",
784
+ "bitmap": "bitmap0",
785
+ "target": "incremental.0.img",
786
+ "format": "qcow2",
787
+ "sync": "incremental",
788
+ "mode": "existing"
789
+ }
790
+ }
791
+
792
+Example: Second Incremental Backup
793
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
794
+
795
+1. Create a new destination image for the incremental backup that points
796
+ to the previous one, e.g.: ``incremental.1.img``
797
+
798
+ .. code:: bash
799
+
800
+ $ qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2
801
+
802
+2. Issue a new incremental backup command. The only difference here is
803
+ that we have changed the target image below.
804
+
805
+ .. code:: json
806
+
807
+ { "execute": "drive-backup",
808
+ "arguments": {
809
+ "device": "drive0",
810
+ "bitmap": "bitmap0",
811
+ "target": "incremental.1.img",
812
+ "format": "qcow2",
813
+ "sync": "incremental",
814
+ "mode": "existing"
815
+ }
816
+ }
817
+
818
+Errors
819
+------
820
+
821
+- In the event of an error that occurs after a backup job is
822
+ successfully launched, either by a direct QMP command or a QMP
823
+ transaction, the user will receive a ``BLOCK_JOB_COMPLETED`` event with
824
+ a failure message, accompanied by a ``BLOCK_JOB_ERROR`` event.
825
+
826
+- In the case of a job being cancelled, the user will receive a
827
+ ``BLOCK_JOB_CANCELLED`` event instead of a pair of COMPLETED and ERROR
828
+ events.
829
+
830
+- In either case, the incremental backup data contained within the
831
+ bitmap is safely rolled back, and the data within the bitmap is not
832
+ lost. The image file created for the failed attempt can be safely
833
+ deleted.
834
+
835
+- Once the underlying problem is fixed (e.g. more storage space is
836
+ freed up), you can simply retry the incremental backup command with
837
+ the same bitmap.
838
+
839
+Example
840
+~~~~~~~
841
+
842
+1. Create a target image:
843
+
844
+ .. code:: bash
845
+
846
+ $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
847
+
848
+2. Attempt to create an incremental backup via QMP:
849
+
850
+ .. code:: json
851
+
852
+ { "execute": "drive-backup",
853
+ "arguments": {
854
+ "device": "drive0",
855
+ "bitmap": "bitmap0",
856
+ "target": "incremental.0.img",
857
+ "format": "qcow2",
858
+ "sync": "incremental",
859
+ "mode": "existing"
860
+ }
861
+ }
862
+
863
+3. Receive an event notifying us of failure:
864
+
865
+ .. code:: json
866
+
867
+ { "timestamp": { "seconds": 1424709442, "microseconds": 844524 },
868
+ "data": { "speed": 0, "offset": 0, "len": 67108864,
869
+ "error": "No space left on device",
870
+ "device": "drive1", "type": "backup" },
871
+ "event": "BLOCK_JOB_COMPLETED" }
872
+
873
+4. Delete the failed incremental, and re-create the image.
874
+
875
+ .. code:: bash
876
+
877
+ $ rm incremental.0.img
878
+ $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2
879
+
880
+5. Retry the command after fixing the underlying problem, such as
881
+ freeing up space on the backup volume:
882
+
883
+ .. code:: json
884
+
885
+ { "execute": "drive-backup",
886
+ "arguments": {
887
+ "device": "drive0",
888
+ "bitmap": "bitmap0",
889
+ "target": "incremental.0.img",
890
+ "format": "qcow2",
891
+ "sync": "incremental",
892
+ "mode": "existing"
893
+ }
894
+ }
895
+
896
+6. Receive confirmation that the job completed successfully:
897
+
898
+ .. code:: json
899
+
900
+ { "timestamp": { "seconds": 1424709668, "microseconds": 526525 },
901
+ "data": { "device": "drive1", "type": "backup",
902
+ "speed": 0, "len": 67108864, "offset": 67108864},
903
+ "event": "BLOCK_JOB_COMPLETED" }
904
+
905
+Partial Transactional Failures
906
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
907
+
908
+- Sometimes a transaction will launch successfully and return success,
909
+ but the backup jobs themselves may fail later. A management
910
+ application may therefore have to deal with a partial backup failure
911
+ after a successful transaction.
912
+
913
+- If multiple backup jobs are specified in a single transaction, when
914
+ one of them fails, it will not interact with the other backup jobs in
915
+ any way.
916
+
917
+- The job(s) that succeeded will clear the dirty bitmap associated with
918
+ the operation, but the job(s) that failed will not. Because the
919
+ successful jobs have already cleared their bitmaps, it is not "safe"
920
+ to delete the incremental backups they created, even though other
+ jobs failed.
921
+
922
+Example
923
+^^^^^^^
924
+
925
+- QMP example highlighting two backup jobs:
926
+
927
+ .. code:: json
928
+
929
+ { "execute": "transaction",
930
+ "arguments": {
931
+ "actions": [
932
+ { "type": "drive-backup",
933
+ "data": { "device": "drive0", "bitmap": "bitmap0",
934
+ "format": "qcow2", "mode": "existing",
935
+ "sync": "incremental", "target": "d0-incr-1.qcow2" } },
936
+ { "type": "drive-backup",
937
+ "data": { "device": "drive1", "bitmap": "bitmap1",
938
+ "format": "qcow2", "mode": "existing",
939
+ "sync": "incremental", "target": "d1-incr-1.qcow2" } },
940
+ ]
941
+ }
942
+ }
943
+
944
+- QMP example response, highlighting one success and one failure:
945
+
946
+ - Acknowledgement that the Transaction was accepted and jobs were
947
+ launched:
948
+
949
+ .. code:: json
950
+
951
+ { "return": {} }
952
+
953
+ - Later, QEMU sends notice that the first job was completed:
954
+
955
+ .. code:: json
956
+
957
+ { "timestamp": { "seconds": 1447192343, "microseconds": 615698 },
958
+ "data": { "device": "drive0", "type": "backup",
959
+ "speed": 0, "len": 67108864, "offset": 67108864 },
960
+ "event": "BLOCK_JOB_COMPLETED"
961
+ }
962
+
963
+ - Later yet, QEMU sends notice that the second job has failed:
964
+
965
+ .. code:: json
966
+
967
+ { "timestamp": { "seconds": 1447192399, "microseconds": 683015 },
968
+ "data": { "device": "drive1", "action": "report",
969
+ "operation": "read" },
970
+ "event": "BLOCK_JOB_ERROR" }
971
+
972
+ .. code:: json
973
+
974
+ { "timestamp": { "seconds": 1447192399, "microseconds":
975
+ 685853 }, "data": { "speed": 0, "offset": 0, "len": 67108864,
976
+ "error": "Input/output error", "device": "drive1", "type":
977
+ "backup" }, "event": "BLOCK_JOB_COMPLETED" }
978
+
979
+- In the above example, ``d0-incr-1.qcow2`` is valid and must be kept,
980
+ but ``d1-incr-1.qcow2`` is invalid and should be deleted. If a VM-wide
981
+ incremental backup of all drives at a single point in time is to be
982
+ made, new backups for both drives will need to be created, taking into
983
+ account that a new incremental backup for drive0 needs to be based on
984
+ top of ``d0-incr-1.qcow2``; a sketch of this recovery follows below.
985
+
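+As a sketch of the recovery described above (the image names
+``d0-full.qcow2``, ``d1-full.qcow2`` and ``d0-incr-2.qcow2`` are
+illustrative and do not appear in the example itself):
+
+.. code:: bash
+
+    # drive1's job failed, so bitmap1 was preserved; delete the invalid
+    # target and recreate it on top of drive1's previous backup:
+    $ rm d1-incr-1.qcow2
+    $ qemu-img create -f qcow2 d1-incr-1.qcow2 -b d1-full.qcow2 -F qcow2
+
+    # drive0's job succeeded and cleared bitmap0, so its next
+    # incremental backup must be based on top of d0-incr-1.qcow2:
+    $ qemu-img create -f qcow2 d0-incr-2.qcow2 -b d0-incr-1.qcow2 -F qcow2
+
+Both ``drive-backup`` actions can then be re-issued in a new
+transaction (or, as described next, with grouped completion mode).
+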
986
+Grouped Completion Mode
987
+~~~~~~~~~~~~~~~~~~~~~~~
988
+
989
+- While jobs launched by transactions normally complete or fail on
990
+ their own, it is possible to instruct them to complete or fail
991
+ together as a group.
992
+
993
+- QMP transactions take an optional properties structure that can
994
+ affect the semantics of the transaction.
995
+
996
+- The "completion-mode" transaction property can be either "individual"
997
+ which is the default, legacy behavior described above, or "grouped,"
998
+ a new behavior detailed below.
999
+
1000
+- Delayed Completion: In grouped completion mode, no jobs will report
1001
+ success until all jobs are ready to report success.
1002
+
1003
+- Grouped failure: If any job fails in grouped completion mode, all
1004
+ remaining jobs will be cancelled. Any incremental backup jobs will
1005
+ restore their dirty bitmaps as if no backup command had ever been
1006
+ issued.
1007
+
1008
+ - Regardless of whether QEMU reports a particular incremental backup job
1009
+ as CANCELLED or as an ERROR, the in-memory bitmap will be
1010
+ restored.
1011
+
1012
+Example
1013
+^^^^^^^
1014
+
1015
+- Here's the same example scenario from above with the new property:
1016
+
1017
+ .. code:: json
1018
+
1019
+ { "execute": "transaction",
1020
+ "arguments": {
1021
+ "actions": [
1022
+ { "type": "drive-backup",
1023
+ "data": { "device": "drive0", "bitmap": "bitmap0",
1024
+ "format": "qcow2", "mode": "existing",
1025
+ "sync": "incremental", "target": "d0-incr-1.qcow2" } },
1026
+ { "type": "drive-backup",
1027
+ "data": { "device": "drive1", "bitmap": "bitmap1",
1028
+ "format": "qcow2", "mode": "existing",
1029
+ "sync": "incremental", "target": "d1-incr-1.qcow2" } },
1030
+ ],
1031
+ "properties": {
1032
+ "completion-mode": "grouped"
1033
+ }
1034
+ }
1035
+ }
1036
+
1037
+- QMP example response, highlighting a failure for ``drive1``:
1038
+
1039
+ - Acknowledgement that the Transaction was accepted and jobs were
1040
+ launched:
1041
+
1042
+ .. code:: json
1043
+
1044
+ { "return": {} }
1045
+
1046
+ - Later, QEMU sends notice that the second job has errored out, and
1047
+ that the first job was cancelled as a result:
1048
+
1049
+ .. code:: json
1050
+
1051
+ { "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
1052
+ "data": { "device": "drive1", "action": "report",
1053
+ "operation": "read" },
1054
+ "event": "BLOCK_JOB_ERROR" }
1055
+
1056
+ .. code:: json
1057
+
1058
+ { "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
1059
+ "data": { "speed": 0, "offset": 0, "len": 67108864,
1060
+ "error": "Input/output error",
1061
+ "device": "drive1", "type": "backup" },
1062
+ "event": "BLOCK_JOB_COMPLETED" }
1063
+
1064
+ .. code:: json
1065
+
1066
+ { "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
1067
+ "data": { "device": "drive0", "type": "backup", "speed": 0,
1068
+ "len": 67108864, "offset": 16777216 },
1069
+ "event": "BLOCK_JOB_CANCELLED" }
1070
+
1071
+.. raw:: html
1072
+
1073
+ <!--
1074
+ The FreeBSD Documentation License
1075
+
1076
+ Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML,
1077
+ PDF, PostScript, RTF and so forth) with or without modification, are permitted
1078
+ provided that the following conditions are met:
1079
+
1080
+ Redistributions of source code (Markdown) must retain the above copyright
1081
+ notice, this list of conditions and the following disclaimer of this file
1082
+ unmodified.
1083
+
1084
+ Redistributions in compiled form (transformed to other DTDs, converted to PDF,
1085
+ PostScript, RTF and other formats) must reproduce the above copyright notice,
1086
+ this list of conditions and the following disclaimer in the documentation and/or
1087
+ other materials provided with the distribution.
1088
+
1089
+ THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
1090
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
1091
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
1092
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
1093
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
1094
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
1095
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
1096
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
1097
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
1098
+ THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1099
+ -->
1100
--
157
--
1101
2.9.4
158
2.26.2
1102
159
1103
New patch
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
2
3
In order to reuse bdrv_common_block_status_above in
4
bdrv_is_allocated_above, let's support an include_base parameter.
5
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
Reviewed-by: Eric Blake <eblake@redhat.com>
9
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
block/coroutines.h | 2 ++
13
block/io.c | 21 ++++++++++++++-------
14
2 files changed, 16 insertions(+), 7 deletions(-)
15
16
diff --git a/block/coroutines.h b/block/coroutines.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/coroutines.h
19
+++ b/block/coroutines.h
20
@@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
21
int coroutine_fn
22
bdrv_co_common_block_status_above(BlockDriverState *bs,
23
BlockDriverState *base,
24
+ bool include_base,
25
bool want_zero,
26
int64_t offset,
27
int64_t bytes,
28
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
29
int generated_co_wrapper
30
bdrv_common_block_status_above(BlockDriverState *bs,
31
BlockDriverState *base,
32
+ bool include_base,
33
bool want_zero,
34
int64_t offset,
35
int64_t bytes,
36
diff --git a/block/io.c b/block/io.c
37
index XXXXXXX..XXXXXXX 100644
38
--- a/block/io.c
39
+++ b/block/io.c
40
@@ -XXX,XX +XXX,XX @@ early_out:
41
int coroutine_fn
42
bdrv_co_common_block_status_above(BlockDriverState *bs,
43
BlockDriverState *base,
44
+ bool include_base,
45
bool want_zero,
46
int64_t offset,
47
int64_t bytes,
48
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
49
BlockDriverState *p;
50
int64_t eof = 0;
51
52
- assert(bs != base);
53
+ assert(include_base || bs != base);
54
+ assert(!include_base || base); /* Can't include NULL base */
55
56
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
57
- if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
58
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
59
return ret;
60
}
61
62
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
63
assert(*pnum <= bytes);
64
bytes = *pnum;
65
66
- for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
68
p = bdrv_filter_or_cow_bs(p))
69
{
70
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
71
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
72
break;
73
}
74
75
+ if (p == base) {
76
+ assert(include_base);
77
+ break;
78
+ }
79
+
80
/*
81
* OK, [offset, offset + *pnum) region is unallocated on this layer,
82
* let's continue the diving.
83
@@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
84
int64_t offset, int64_t bytes, int64_t *pnum,
85
int64_t *map, BlockDriverState **file)
86
{
87
- return bdrv_common_block_status_above(bs, base, true, offset, bytes,
88
+ return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
89
pnum, map, file);
90
}
91
92
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
93
int ret;
94
int64_t dummy;
95
96
- ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
97
- offset, bytes, pnum ? pnum : &dummy,
98
- NULL, NULL);
99
+ ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
100
+ bytes, pnum ? pnum : &dummy, NULL,
101
+ NULL);
102
if (ret < 0) {
103
return ret;
104
}
105
--
106
2.26.2
107
New patch
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
2
3
We are going to reuse bdrv_common_block_status_above in
4
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
5
include_base == false and still bs == base (e.g. from img_rebase()).
6
7
So, support this corner case.
8
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
10
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
11
Reviewed-by: Eric Blake <eblake@redhat.com>
12
Reviewed-by: Alberto Garcia <berto@igalia.com>
13
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
16
block/io.c | 6 +++++-
17
1 file changed, 5 insertions(+), 1 deletion(-)
18
19
diff --git a/block/io.c b/block/io.c
20
index XXXXXXX..XXXXXXX 100644
21
--- a/block/io.c
22
+++ b/block/io.c
23
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
24
BlockDriverState *p;
25
int64_t eof = 0;
26
27
- assert(include_base || bs != base);
28
assert(!include_base || base); /* Can't include NULL base */
29
30
+ if (!include_base && bs == base) {
31
+ *pnum = bytes;
32
+ return 0;
33
+ }
34
+
35
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
36
if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
37
return ret;
38
--
39
2.26.2
40
New patch
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
2
3
bdrv_is_allocated_above wrongly handles short backing files: it reports
4
after-EOF space as UNALLOCATED which is wrong, as on read the data is
5
generated at the level of the short backing file (if all overlays have
6
unallocated areas at that place).
7
8
Reusing bdrv_common_block_status_above fixes the issue and unifies the
9
code paths.
10
11
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
12
Reviewed-by: Eric Blake <eblake@redhat.com>
13
Reviewed-by: Alberto Garcia <berto@igalia.com>
14
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
15
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
block/io.c | 43 +++++--------------------------------------
20
1 file changed, 5 insertions(+), 38 deletions(-)
21
22
diff --git a/block/io.c b/block/io.c
23
index XXXXXXX..XXXXXXX 100644
24
--- a/block/io.c
25
+++ b/block/io.c
26
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
27
* at 'offset + *pnum' may return the same allocation status (in other
28
* words, the result is not necessarily the maximum possible range);
29
* but 'pnum' will only be 0 when end of file is reached.
30
- *
31
*/
32
int bdrv_is_allocated_above(BlockDriverState *top,
33
BlockDriverState *base,
34
bool include_base, int64_t offset,
35
int64_t bytes, int64_t *pnum)
36
{
37
- BlockDriverState *intermediate;
38
- int ret;
39
- int64_t n = bytes;
40
-
41
- assert(base || !include_base);
42
-
43
- intermediate = top;
44
- while (include_base || intermediate != base) {
45
- int64_t pnum_inter;
46
- int64_t size_inter;
47
-
48
- assert(intermediate);
49
- ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
50
- if (ret < 0) {
51
- return ret;
52
- }
53
- if (ret) {
54
- *pnum = pnum_inter;
55
- return 1;
56
- }
57
-
58
- size_inter = bdrv_getlength(intermediate);
59
- if (size_inter < 0) {
60
- return size_inter;
61
- }
62
- if (n > pnum_inter &&
63
- (intermediate == top || offset + pnum_inter < size_inter)) {
64
- n = pnum_inter;
65
- }
66
-
67
- if (intermediate == base) {
68
- break;
69
- }
70
-
71
- intermediate = bdrv_filter_or_cow_bs(intermediate);
72
+ int ret = bdrv_common_block_status_above(top, base, include_base, false,
73
+ offset, bytes, pnum, NULL, NULL);
74
+ if (ret < 0) {
75
+ return ret;
76
}
77
78
- *pnum = n;
79
- return 0;
80
+ return !!(ret & BDRV_BLOCK_ALLOCATED);
81
}
82
83
int coroutine_fn
84
--
85
2.26.2
86
New patch
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
2
3
These cases are fixed by previous patches around block_status and
4
is_allocated.
5
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
9
Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
tests/qemu-iotests/274 | 20 +++++++++++
13
tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++
14
2 files changed, 88 insertions(+)
15
16
diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/274
19
+++ b/tests/qemu-iotests/274
20
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \
21
iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
22
iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
23
24
+ iotests.log('=== Testing qemu-img commit (top -> base) ===')
25
+
26
+ create_chain()
27
+ iotests.qemu_img_log('commit', '-b', base, top)
28
+ iotests.img_info_log(base)
29
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
30
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
31
+
32
+ iotests.log('=== Testing QMP active commit (top -> base) ===')
33
+
34
+ create_chain()
35
+ with create_vm() as vm:
36
+ vm.launch()
37
+ vm.qmp_log('block-commit', device='top', base_node='base',
38
+ job_id='job0', auto_dismiss=False)
39
+ vm.run_job('job0', wait=5)
40
+
41
+ iotests.img_info_log(mid)
42
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
43
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
44
45
iotests.log('== Resize tests ==')
46
47
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
48
index XXXXXXX..XXXXXXX 100644
49
--- a/tests/qemu-iotests/274.out
50
+++ b/tests/qemu-iotests/274.out
51
@@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0
52
read 1048576/1048576 bytes at offset 1048576
53
1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
54
55
+=== Testing qemu-img commit (top -> base) ===
56
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
57
+
58
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
59
+
60
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
61
+
62
+wrote 2097152/2097152 bytes at offset 0
63
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
64
+
65
+Image committed.
66
+
67
+image: TEST_IMG
68
+file format: IMGFMT
69
+virtual size: 2 MiB (2097152 bytes)
70
+cluster_size: 65536
71
+Format specific information:
72
+ compat: 1.1
73
+ compression type: zlib
74
+ lazy refcounts: false
75
+ refcount bits: 16
76
+ corrupt: false
77
+ extended l2: false
78
+
79
+read 1048576/1048576 bytes at offset 0
80
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
81
+
82
+read 1048576/1048576 bytes at offset 1048576
83
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
84
+
85
+=== Testing QMP active commit (top -> base) ===
86
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
87
+
88
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
89
+
90
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
91
+
92
+wrote 2097152/2097152 bytes at offset 0
93
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
94
+
95
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
96
+{"return": {}}
97
+{"execute": "job-complete", "arguments": {"id": "job0"}}
98
+{"return": {}}
99
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
100
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
101
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
102
+{"return": {}}
103
+image: TEST_IMG
104
+file format: IMGFMT
105
+virtual size: 1 MiB (1048576 bytes)
106
+cluster_size: 65536
107
+backing file: TEST_DIR/PID-base
108
+backing file format: IMGFMT
109
+Format specific information:
110
+ compat: 1.1
111
+ compression type: zlib
112
+ lazy refcounts: false
113
+ refcount bits: 16
114
+ corrupt: false
115
+ extended l2: false
116
+
117
+read 1048576/1048576 bytes at offset 0
118
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
119
+
120
+read 1048576/1048576 bytes at offset 1048576
121
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
122
+
123
== Resize tests ==
124
=== preallocation=off ===
125
Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16
126
--
127
2.26.2
128