The following changes since commit ac793156f650ae2d77834932d72224175ee69086:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c:

  iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100)

----------------------------------------------------------------
Pull request

v2:
 * Fix format string issues on 32-bit hosts [Peter]
 * Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric]
 * Fix missing eventfd.h header on macOS [Peter]
 * Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter]

This pull request contains the vhost-user-blk server by Coiby Xu along with my
additions, block/nvme.c alignment and hardware error statistics by Philippe
Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir
Sementsov-Ogievskiy.

----------------------------------------------------------------

Coiby Xu (6):
  libvhost-user: Allow vu_message_read to be replaced
  libvhost-user: remove watch for kick_fd when de-initialize vu-dev
  util/vhost-user-server: generic vhost user server
  block: move logical block size check function to a common utility
    function
  block/export: vhost-user block device backend server
  MAINTAINERS: Add vhost-user block device backend server maintainer

Philippe Mathieu-Daudé (1):
  block/nvme: Add driver statistics for access alignment and hw errors

Stefan Hajnoczi (16):
  util/vhost-user-server: s/fileds/fields/ typo fix
  util/vhost-user-server: drop unnecessary QOM cast
  util/vhost-user-server: drop unnecessary watch deletion
  block/export: consolidate request structs into VuBlockReq
  util/vhost-user-server: drop unused DevicePanicNotifier
  util/vhost-user-server: fix memory leak in vu_message_read()
  util/vhost-user-server: check EOF when reading payload
  util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
  block/export: report flush errors
  block/export: convert vhost-user-blk server to block export API
  util/vhost-user-server: move header to include/
  util/vhost-user-server: use static library in meson.build
  qemu-storage-daemon: avoid compiling blockdev_ss twice
  block: move block exports to libblockdev
  block/export: add iothread and fixed-iothread options
  block/export: add vhost-user-blk multi-queue support

Vladimir Sementsov-Ogievskiy (5):
  block/io: fix bdrv_co_block_status_above
  block/io: bdrv_common_block_status_above: support include_base
  block/io: bdrv_common_block_status_above: support bs == base
  block/io: fix bdrv_is_allocated_above
  iotests: add commit top->base cases to 274

 MAINTAINERS                                |   9 +
 qapi/block-core.json                       |  24 +-
 qapi/block-export.json                     |  36 +-
 block/coroutines.h                         |   2 +
 block/export/vhost-user-blk-server.h       |  19 +
 contrib/libvhost-user/libvhost-user.h      |  21 +
 include/qemu/vhost-user-server.h           |  65 +++
 util/block-helpers.h                       |  19 +
 block/export/export.c                      |  37 +-
 block/export/vhost-user-blk-server.c       | 431 ++++++++++++++++++++
 block/io.c                                 | 132 +++---
 block/nvme.c                               |  27 ++
 block/qcow2.c                              |  16 +-
 contrib/libvhost-user/libvhost-user-glib.c |   2 +-
 contrib/libvhost-user/libvhost-user.c      |  15 +-
 hw/core/qdev-properties-system.c           |  31 +-
 nbd/server.c                               |   2 -
 qemu-nbd.c                                 |  21 +-
 softmmu/vl.c                               |   4 +
 stubs/blk-exp-close-all.c                  |   7 +
 tests/vhost-user-bridge.c                  |   2 +
 tools/virtiofsd/fuse_virtio.c              |   4 +-
 util/block-helpers.c                       |  46 +++
 util/vhost-user-server.c                   | 446 +++++++++++++++++++++
 block/export/meson.build                   |   3 +-
 contrib/libvhost-user/meson.build          |   1 +
 meson.build                                |  22 +-
 nbd/meson.build                            |   2 +
 storage-daemon/meson.build                 |   3 +-
 stubs/meson.build                          |   1 +
 tests/qemu-iotests/274                     |  20 +
 tests/qemu-iotests/274.out                 |  68 ++++
 util/meson.build                           |   4 +
 33 files changed, 1420 insertions(+), 122 deletions(-)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 include/qemu/vhost-user-server.h
 create mode 100644 util/block-helpers.h
 create mode 100644 block/export/vhost-user-blk-server.c
 create mode 100644 stubs/blk-exp-close-all.c
 create mode 100644 util/block-helpers.c
 create mode 100644 util/vhost-user-server.c

--
2.26.2
From: Philippe Mathieu-Daudé <philmd@redhat.com>

Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406,
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 24 +++++++++++++++++++++-
 block/nvme.c         | 27 +++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -XXX,XX +XXX,XX @@
     'discard-nb-failed': 'uint64',
     'discard-bytes-ok': 'uint64' } }

+##
+# @BlockStatsSpecificNvme:
+#
+# NVMe driver statistics
+#
+# @completion-errors: The number of completion errors.
+#
+# @aligned-accesses: The number of aligned accesses performed by
+#                    the driver.
+#
+# @unaligned-accesses: The number of unaligned accesses performed by
+#                      the driver.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockStatsSpecificNvme',
+  'data': {
+    'completion-errors': 'uint64',
+    'aligned-accesses': 'uint64',
+    'unaligned-accesses': 'uint64' } }
+
 ##
 # @BlockStatsSpecific:
 #
@@ -XXX,XX +XXX,XX @@
   'discriminator': 'driver',
   'data': {
       'file': 'BlockStatsSpecificFile',
-      'host_device': 'BlockStatsSpecificFile' } }
+      'host_device': 'BlockStatsSpecificFile',
+      'nvme': 'BlockStatsSpecificNvme' } }

 ##
 # @BlockStats:
diff --git a/block/nvme.c b/block/nvme.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {

     /* PCI address (required for nvme_refresh_filename()) */
     char *device;
+
+    struct {
+        uint64_t completion_errors;
+        uint64_t aligned_accesses;
+        uint64_t unaligned_accesses;
+    } stats;
 };

 #define NVME_BLOCK_OPT_DEVICE "device"
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
             break;
         }
         ret = nvme_translate_error(c);
+        if (ret) {
+            s->stats.completion_errors++;
+        }
         q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
         if (!q->cq.head) {
             q->cq_phase = !q->cq_phase;
@@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     assert(QEMU_IS_ALIGNED(bytes, s->page_size));
     assert(bytes <= s->max_transfer);
     if (nvme_qiov_aligned(bs, qiov)) {
+        s->stats.aligned_accesses++;
         return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
     }
+    s->stats.unaligned_accesses++;
     trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
     buf = qemu_try_memalign(s->page_size, bytes);

@@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host)
     qemu_vfio_dma_unmap(s->vfio, host);
 }

+static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs)
+{
+    BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
+    BDRVNVMeState *s = bs->opaque;
+
+    stats->driver = BLOCKDEV_DRIVER_NVME;
+    stats->u.nvme = (BlockStatsSpecificNvme) {
+        .completion_errors = s->stats.completion_errors,
+        .aligned_accesses = s->stats.aligned_accesses,
+        .unaligned_accesses = s->stats.unaligned_accesses,
+    };
+
+    return stats;
+}
+
 static const char *const nvme_strong_runtime_opts[] = {
     NVME_BLOCK_OPT_DEVICE,
     NVME_BLOCK_OPT_NAMESPACE,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
     .bdrv_refresh_filename    = nvme_refresh_filename,
     .bdrv_refresh_limits      = nvme_refresh_limits,
     .strong_runtime_opts      = nvme_strong_runtime_opts,
+    .bdrv_get_specific_stats  = nvme_get_specific_stats,

     .bdrv_detach_aio_context  = nvme_detach_aio_context,
     .bdrv_attach_aio_context  = nvme_attach_aio_context,
--
2.26.2
From: Coiby Xu <coiby.xu@gmail.com>

Allow vu_message_read to be replaced by one which will make use of the
QIOChannel functions. Thus reading a vhost-user message won't stall the
guest. For the slave channel, we still use the default vu_message_read.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h      | 21 +++++++++++++++++++++
 contrib/libvhost-user/libvhost-user-glib.c |  2 +-
 contrib/libvhost-user/libvhost-user.c      | 14 +++++++-------
 tests/vhost-user-bridge.c                  |  2 ++
 tools/virtiofsd/fuse_virtio.c              |  4 ++--
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -XXX,XX +XXX,XX @@
  */
 #define VHOST_USER_MAX_RAM_SLOTS 32

+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
+
 typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MASTER = 0,
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
@@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
 typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
 typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
                                   int *do_reply);
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
 typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
 typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
 typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
@@ -XXX,XX +XXX,XX @@ struct VuDev {
     bool broken;
     uint16_t max_queues;

+    /* @read_msg: custom method to read vhost-user message
+     *
+     * Read data from vhost_user socket fd and fill up
+     * the passed VhostUserMsg *vmsg struct.
+     *
+     * If reading fails, it should close the received set of file
+     * descriptors as socket message's auxiliary data.
+     *
+     * For the details, please refer to vu_message_read in libvhost-user.c
+     * which will be used by default if not custom method is provided when
+     * calling vu_init
+     *
+     * Returns: true if vhost-user message successfully received,
+     *          otherwise return false.
+     *
+     */
+    vu_read_msg_cb read_msg;
     /* @set_watch: add or update the given fd to the watch set,
      * call cb when condition is met */
     vu_set_watch_cb set_watch;
@@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev,
              uint16_t max_queues,
              int socket,
              vu_panic_cb panic,
+             vu_read_msg_cb read_msg,
              vu_set_watch_cb set_watch,
              vu_remove_watch_cb remove_watch,
              const VuDevIface *iface);
diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user-glib.c
+++ b/contrib/libvhost-user/libvhost-user-glib.c
@@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket,
     g_assert(dev);
     g_assert(iface);

-    if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
+    if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch,
                  remove_watch, iface)) {
         return false;
     }
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@
 /* The version of inflight buffer */
 #define INFLIGHT_VERSION 1

-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
-
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION 1
 #define LIBVHOST_USER_DEBUG 0
@@ -XXX,XX +XXX,XX @@ have_userfault(void)
 }

 static bool
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
+vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
     char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
     struct iovec iov = {
@@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
         goto out;
     }

-    if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
+    if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) {
         goto out;
     }

@@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
     /* Wait for QEMU to confirm that it's registered the handler for the
      * faults.
      */
-    if (!vu_message_read(dev, dev->sock, vmsg) ||
+    if (!dev->read_msg(dev, dev->sock, vmsg) ||
         vmsg->size != sizeof(vmsg->payload.u64) ||
         vmsg->payload.u64 != 0) {
         vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
@@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev)
     int reply_requested;
     bool need_reply, success = false;

-    if (!vu_message_read(dev, dev->sock, &vmsg)) {
+    if (!dev->read_msg(dev, dev->sock, &vmsg)) {
         goto end;
     }

@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
         uint16_t max_queues,
         int socket,
         vu_panic_cb panic,
+        vu_read_msg_cb read_msg,
         vu_set_watch_cb set_watch,
         vu_remove_watch_cb remove_watch,
         const VuDevIface *iface)
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,

     dev->sock = socket;
     dev->panic = panic;
+    dev->read_msg = read_msg ? read_msg : vu_message_read_default;
     dev->set_watch = set_watch;
     dev->remove_watch = remove_watch;
     dev->iface = iface;
@@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)

         vu_message_write(dev, dev->slave_fd, &vmsg);
         if (ack) {
-            vu_message_read(dev, dev->slave_fd, &vmsg);
+            vu_message_read_default(dev, dev->slave_fd, &vmsg);
         }
         return;
     }
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx)
                  VHOST_USER_BRIDGE_MAX_QUEUES,
                  conn_fd,
                  vubr_panic,
+                 NULL,
                  vubr_set_watch,
                  vubr_remove_watch,
                  &vuiface)) {
@@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client)
                  VHOST_USER_BRIDGE_MAX_QUEUES,
                  dev->sock,
                  vubr_panic,
+                 NULL,
                  vubr_set_watch,
                  vubr_remove_watch,
                  &vuiface)) {
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se)
     se->vu_socketfd = data_sock;
     se->virtio_dev->se = se;
     pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
-    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
-            fv_remove_watch, &fv_iface);
+    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
+            fv_set_watch, fv_remove_watch, &fv_iface);

     return 0;
 }
--
2.26.2
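[Editor's illustration, not part of the series: a minimal sketch of how a
caller might plug its own read callback into the new vu_init() parameter
introduced above, using the VHOST_USER_HDR_SIZE definition the patch moves
into libvhost-user.h. The names my_read_msg, my_panic, my_set_watch,
my_remove_watch and my_iface are hypothetical, and the sketch skips the
SCM_RIGHTS file-descriptor handling that a real callback must perform as
described in the @read_msg documentation.]

#include <sys/socket.h>
#include "contrib/libvhost-user/libvhost-user.h"

/* Blocking reader: fetch the fixed-size header, then the payload. */
static bool my_read_msg(VuDev *dev, int sock, VhostUserMsg *vmsg)
{
    ssize_t n = recv(sock, vmsg, VHOST_USER_HDR_SIZE, MSG_WAITALL);

    if (n != (ssize_t)VHOST_USER_HDR_SIZE) {
        return false;
    }
    if (vmsg->size > sizeof(vmsg->payload)) {
        return false; /* malformed: payload larger than the union */
    }
    vmsg->fd_num = 0; /* this sketch does not accept passed fds */
    if (vmsg->size) {
        n = recv(sock, &vmsg->payload, vmsg->size, MSG_WAITALL);
        if (n != (ssize_t)vmsg->size) {
            return false;
        }
    }
    return true;
}

static bool my_setup(VuDev *dev, int socket_fd)
{
    /* Passing NULL for read_msg would keep vu_message_read_default. */
    return vu_init(dev, 1, socket_fd, my_panic, my_read_msg,
                   my_set_watch, my_remove_watch, &my_iface);
}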
From: Coiby Xu <coiby.xu@gmail.com>

When the client is running in gdb and the quit command is run in gdb,
QEMU will still dispatch the event, which causes a segmentation fault in
the callback function.

Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
         }

         if (vq->kick_fd != -1) {
+            dev->remove_watch(dev, vq->kick_fd);
             close(vq->kick_fd);
             vq->kick_fd = -1;
         }
--
2.26.2
From: Coiby Xu <coiby.xu@gmail.com>

Sharing QEMU devices via the vhost-user protocol.

Only one vhost-user client can connect to the server at a time.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
[Fixed size_t %lu -> %zu format string compiler error.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.h |  65 ++++++
 util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++
 util/meson.build         |   1 +
 3 files changed, 494 insertions(+)
 create mode 100644 util/vhost-user-server.h
 create mode 100644 util/vhost-user-server.c

diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/vhost-user-server.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * Sharing QEMU devices via vhost-user protocol
+ *
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef VHOST_USER_SERVER_H
+#define VHOST_USER_SERVER_H
+
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "io/channel-socket.h"
+#include "io/channel-file.h"
+#include "io/net-listener.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+typedef struct VuFdWatch {
+    VuDev *vu_dev;
+    int fd; /*kick fd*/
+    void *pvt;
+    vu_watch_cb cb;
+    bool processing;
+    QTAILQ_ENTRY(VuFdWatch) next;
+} VuFdWatch;
+
+typedef struct VuServer VuServer;
+typedef void DevicePanicNotifierFn(VuServer *server);
+
+struct VuServer {
+    QIONetListener *listener;
+    AioContext *ctx;
+    DevicePanicNotifierFn *device_panic_notifier;
+    int max_queues;
+    const VuDevIface *vu_iface;
+    VuDev vu_dev;
+    QIOChannel *ioc; /* The I/O channel with the client */
+    QIOChannelSocket *sioc; /* The underlying data channel with the client */
+    /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
+    QIOChannel *ioc_slave;
+    QIOChannelSocket *sioc_slave;
+    Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
+    QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
+    /* restart coroutine co_trip if AIOContext is changed */
+    bool aio_context_changed;
+    bool processing_msg;
+};
+
+bool vhost_user_server_start(VuServer *server,
+                             SocketAddress *unix_socket,
+                             AioContext *ctx,
+                             uint16_t max_queues,
+                             DevicePanicNotifierFn *device_panic_notifier,
+                             const VuDevIface *vu_iface,
+                             Error **errp);
+
+void vhost_user_server_stop(VuServer *server);
+
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
+
+#endif /* VHOST_USER_SERVER_H */
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/vhost-user-server.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Sharing QEMU devices via vhost-user protocol
+ *
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "vhost-user-server.h"
+
+static void vmsg_close_fds(VhostUserMsg *vmsg)
+{
+    int i;
+    for (i = 0; i < vmsg->fd_num; i++) {
+        close(vmsg->fds[i]);
+    }
+}
+
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
+{
+    int i;
+    for (i = 0; i < vmsg->fd_num; i++) {
+        qemu_set_nonblock(vmsg->fds[i]);
+    }
+}
+
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
+                      gpointer opaque);
+
+static void close_client(VuServer *server)
+{
+    /*
+     * Before closing the client
+     *
+     * 1. Let vu_client_trip stop processing new vhost-user msg
+     *
+     * 2. remove kick_handler
+     *
+     * 3. wait for the kick handler to be finished
+     *
+     * 4. wait for the current vhost-user msg to be finished processing
+     */
+
+    QIOChannelSocket *sioc = server->sioc;
+    /* When this is set vu_client_trip will stop new processing vhost-user message */
+    server->sioc = NULL;
+
+    VuFdWatch *vu_fd_watch, *next;
+    QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
+        aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
+                           NULL, NULL, NULL);
+    }
+
+    while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
+        QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
+            if (!vu_fd_watch->processing) {
+                QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
+                g_free(vu_fd_watch);
+            }
+        }
+    }
+
+    while (server->processing_msg) {
+        if (server->ioc->read_coroutine) {
+            server->ioc->read_coroutine = NULL;
+            qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
+                                           NULL, server->ioc);
+            server->processing_msg = false;
+        }
+    }
+
+    vu_deinit(&server->vu_dev);
+    object_unref(OBJECT(sioc));
+    object_unref(OBJECT(server->ioc));
+}
+
+static void panic_cb(VuDev *vu_dev, const char *buf)
+{
+    VuServer *server = container_of(vu_dev, VuServer, vu_dev);
+
+    /* avoid while loop in close_client */
+    server->processing_msg = false;
+
+    if (buf) {
+        error_report("vu_panic: %s", buf);
+    }
+
+    if (server->sioc) {
+        close_client(server);
+    }
+
+    if (server->device_panic_notifier) {
+        server->device_panic_notifier(server);
+    }
+
+    /*
+     * Set the callback function for network listener so another
+     * vhost-user client can connect to this server
+     */
+    qio_net_listener_set_client_func(server->listener,
+                                     vu_accept,
+                                     server,
+                                     NULL);
+}
+
+static bool coroutine_fn
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
+{
+    struct iovec iov = {
+        .iov_base = (char *)vmsg,
+        .iov_len = VHOST_USER_HDR_SIZE,
+    };
+    int rc, read_bytes = 0;
+    Error *local_err = NULL;
+    /*
+     * Store fds/nfds returned from qio_channel_readv_full into
+     * temporary variables.
+     *
+     * VhostUserMsg is a packed structure, gcc will complain about passing
+     * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
+     * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
+     * thus two temporary variables nfds and fds are used here.
+     */
+    size_t nfds = 0, nfds_t = 0;
+    const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
+    int *fds_t = NULL;
+    VuServer *server = container_of(vu_dev, VuServer, vu_dev);
+    QIOChannel *ioc = server->ioc;
+
+    if (!ioc) {
+        error_report_err(local_err);
+        goto fail;
+    }
+
+    assert(qemu_in_coroutine());
+    do {
+        /*
+         * qio_channel_readv_full may have short reads, keeping calling it
+         * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
+         */
+        rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
+        if (rc < 0) {
+            if (rc == QIO_CHANNEL_ERR_BLOCK) {
+                qio_channel_yield(ioc, G_IO_IN);
+                continue;
+            } else {
+                error_report_err(local_err);
+                return false;
+            }
+        }
+        read_bytes += rc;
+        if (nfds_t > 0) {
+            if (nfds + nfds_t > max_fds) {
+                error_report("A maximum of %zu fds are allowed, "
+                             "however got %zu fds now",
+                             max_fds, nfds + nfds_t);
+                goto fail;
+            }
+            memcpy(vmsg->fds + nfds, fds_t,
+                   nfds_t * sizeof(vmsg->fds[0]));
+            nfds += nfds_t;
+            g_free(fds_t);
+        }
+        if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
+            break;
+        }
+        iov.iov_base = (char *)vmsg + read_bytes;
+        iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
+    } while (true);
+
+    vmsg->fd_num = nfds;
+    /* qio_channel_readv_full will make socket fds blocking, unblock them */
+    vmsg_unblock_fds(vmsg);
+    if (vmsg->size > sizeof(vmsg->payload)) {
+        error_report("Error: too big message request: %d, "
+                     "size: vmsg->size: %u, "
+                     "while sizeof(vmsg->payload) = %zu",
+                     vmsg->request, vmsg->size, sizeof(vmsg->payload));
+        goto fail;
+    }
+
+    struct iovec iov_payload = {
+        .iov_base = (char *)&vmsg->payload,
+        .iov_len = vmsg->size,
+    };
+    if (vmsg->size) {
+        rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
+        if (rc == -1) {
+            error_report_err(local_err);
+            goto fail;
+        }
+    }
+
+    return true;
+
+fail:
+    vmsg_close_fds(vmsg);
+
+    return false;
+}
+
+
+static void vu_client_start(VuServer *server);
+static coroutine_fn void vu_client_trip(void *opaque)
+{
+    VuServer *server = opaque;
+
+    while (!server->aio_context_changed && server->sioc) {
+        server->processing_msg = true;
+        vu_dispatch(&server->vu_dev);
+        server->processing_msg = false;
+    }
+
+    if (server->aio_context_changed && server->sioc) {
+        server->aio_context_changed = false;
+        vu_client_start(server);
+    }
+}
+
+static void vu_client_start(VuServer *server)
+{
+    server->co_trip = qemu_coroutine_create(vu_client_trip, server);
+    aio_co_enter(server->ctx, server->co_trip);
+}
+
+/*
+ * a wrapper for vu_kick_cb
+ *
+ * since aio_dispatch can only pass one user data pointer to the
+ * callback function, pack VuDev and pvt into a struct. Then unpack it
+ * and pass them to vu_kick_cb
+ */
+static void kick_handler(void *opaque)
+{
+    VuFdWatch *vu_fd_watch = opaque;
+    vu_fd_watch->processing = true;
+    vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
+    vu_fd_watch->processing = false;
+}
+
+
+static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
+{
+
+    VuFdWatch *vu_fd_watch, *next;
+    QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
+        if (vu_fd_watch->fd == fd) {
+            return vu_fd_watch;
+        }
+    }
+    return NULL;
+}
+
+static void
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
+          vu_watch_cb cb, void *pvt)
+{
+
+    VuServer *server = container_of(vu_dev, VuServer, vu_dev);
+    g_assert(vu_dev);
+    g_assert(fd >= 0);
+    g_assert(cb);
+
+    VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
+
+    if (!vu_fd_watch) {
+        VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1);
+
+        QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next);
+
+        vu_fd_watch->fd = fd;
+        vu_fd_watch->cb = cb;
+        qemu_set_nonblock(fd);
+        aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
+                           NULL, NULL, vu_fd_watch);
+        vu_fd_watch->vu_dev = vu_dev;
+        vu_fd_watch->pvt = pvt;
+    }
+}
+
+
+static void remove_watch(VuDev *vu_dev, int fd)
+{
+    VuServer *server;
+    g_assert(vu_dev);
+    g_assert(fd >= 0);
+
+    server = container_of(vu_dev, VuServer, vu_dev);
+
+    VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
+
+    if (!vu_fd_watch) {
+        return;
+    }
+    aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL);
+
+    QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
+    g_free(vu_fd_watch);
+}
+
+
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
+                      gpointer opaque)
+{
+    VuServer *server = opaque;
+
+    if (server->sioc) {
+        warn_report("Only one vhost-user client is allowed to "
+                    "connect the server one time");
+        return;
+    }
+
+    if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb,
+                 vu_message_read, set_watch, remove_watch, server->vu_iface)) {
+        error_report("Failed to initialize libvhost-user");
+        return;
+    }
+
+    /*
+     * Unset the callback function for network listener to make another
+     * vhost-user client keeping waiting until this client disconnects
+     */
+    qio_net_listener_set_client_func(server->listener,
+                                     NULL,
+                                     NULL,
+                                     NULL);
+    server->sioc = sioc;
+    /*
+     * Increase the object reference, so sioc will not freed by
+     * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
+     */
+    object_ref(OBJECT(server->sioc));
+    qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
+    server->ioc = QIO_CHANNEL(sioc);
+    object_ref(OBJECT(server->ioc));
+    qio_channel_attach_aio_context(server->ioc, server->ctx);
+    qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
+    vu_client_start(server);
+}
+
+
+void vhost_user_server_stop(VuServer *server)
+{
+    if (server->sioc) {
+        close_client(server);
+    }
+
+    if (server->listener) {
+        qio_net_listener_disconnect(server->listener);
+        object_unref(OBJECT(server->listener));
+    }
+
+}
+
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
+{
+    VuFdWatch *vu_fd_watch, *next;
+    void *opaque = NULL;
+    IOHandler *io_read = NULL;
+    bool attach;
+
+    server->ctx = ctx ? ctx : qemu_get_aio_context();
+
+    if (!server->sioc) {
+        /* not yet serving any client*/
+        return;
+    }
+
+    if (ctx) {
+        qio_channel_attach_aio_context(server->ioc, ctx);
+        server->aio_context_changed = true;
+        io_read = kick_handler;
+        attach = true;
+    } else {
+        qio_channel_detach_aio_context(server->ioc);
+        /* server->ioc->ctx keeps the old AioConext */
+        ctx = server->ioc->ctx;
+        attach = false;
+    }
+
+    QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
+        if (vu_fd_watch->cb) {
+            opaque = attach ? vu_fd_watch : NULL;
+            aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
+                               io_read, NULL, NULL,
+                               opaque);
+        }
+    }
+}
+
+
+bool vhost_user_server_start(VuServer *server,
+                             SocketAddress *socket_addr,
+                             AioContext *ctx,
+                             uint16_t max_queues,
+                             DevicePanicNotifierFn *device_panic_notifier,
+                             const VuDevIface *vu_iface,
+                             Error **errp)
+{
+    QIONetListener *listener = qio_net_listener_new();
+    if (qio_net_listener_open_sync(listener, socket_addr, 1,
+                                   errp) < 0) {
+        object_unref(OBJECT(listener));
+        return false;
+    }
+
+    /* zero out unspecified fileds */
+    *server = (VuServer) {
+        .listener = listener,
+        .vu_iface = vu_iface,
+        .max_queues = max_queues,
+        .ctx = ctx,
+        .device_panic_notifier = device_panic_notifier,
+    };
+
+    qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
+
+    qio_net_listener_set_client_func(server->listener,
+                                     vu_accept,
+                                     server,
+                                     NULL);
+
+    QTAILQ_INIT(&server->vu_fd_watches);
+    return true;
+}
diff --git a/util/meson.build b/util/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -XXX,XX +XXX,XX @@ if have_block
   util_ss.add(files('main-loop.c'))
   util_ss.add(files('nvdimm-utils.c'))
   util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
+  util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
   util_ss.add(files('qemu-coroutine-sleep.c'))
   util_ss.add(files('qemu-co-shared-resource.c'))
   util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
--
2.26.2
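[Editor's illustration, not part of the series: a hypothetical user of the
new VuServer API above, to show how the pieces fit together. my_server,
my_iface and my_export_start are invented names, and the VuDevIface
callbacks are assumed to be implemented elsewhere.]

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "util/vhost-user-server.h"

static VuServer my_server;
static const VuDevIface my_iface = {
    /* .get_features, .process_msg, .queue_set_started, ... */
};

/* Listen on a UNIX socket and serve one vhost-user client at a time in
 * the main loop's AioContext. */
static bool my_export_start(SocketAddress *unix_socket, Error **errp)
{
    return vhost_user_server_start(&my_server, unix_socket,
                                   qemu_get_aio_context(),
                                   1 /* max_queues */,
                                   NULL /* device_panic_notifier */,
                                   &my_iface, errp);
}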
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
From: Coiby Xu <coiby.xu@gmail.com>
2
2
3
AioContext is fairly self contained, the only dependency is QEMUTimer but
3
Move the constants from hw/core/qdev-properties.c to
4
that in turn doesn't need anything else. So move them out of block-obj-y
4
util/block-helpers.h so that knowledge of the min/max values is
5
to avoid introducing a dependency from io/ to block-obj-y.
6
5
7
main-loop and its dependency iohandler also need to be moved, because
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
later in this series io/ will call iohandler_get_aio_context.
7
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
9
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
[Changed copyright "the QEMU team" to "other QEMU contributors" as
9
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
11
suggested by Daniel Berrange and agreed by Paolo.
10
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
12
--Stefan]
11
Message-id: 20200918080912.321299-5-coiby.xu@gmail.com
13
14
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
15
Reviewed-by: Fam Zheng <famz@redhat.com>
16
Message-id: 20170213135235.12274-2-pbonzini@redhat.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
13
---
19
Makefile.objs | 4 ---
14
util/block-helpers.h | 19 +++++++++++++
20
stubs/Makefile.objs | 1 +
15
hw/core/qdev-properties-system.c | 31 ++++-----------------
21
tests/Makefile.include | 11 ++++----
16
util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++
22
util/Makefile.objs | 6 +++-
17
util/meson.build | 1 +
23
block/io.c | 29 -------------------
18
4 files changed, 71 insertions(+), 26 deletions(-)
24
stubs/linux-aio.c | 32 +++++++++++++++++++++
19
create mode 100644 util/block-helpers.h
25
stubs/set-fd-handler.c | 11 --------
20
create mode 100644 util/block-helpers.c
26
aio-posix.c => util/aio-posix.c | 2 +-
27
aio-win32.c => util/aio-win32.c | 0
28
util/aiocb.c | 55 +++++++++++++++++++++++++++++++++++++
29
async.c => util/async.c | 3 +-
30
iohandler.c => util/iohandler.c | 0
31
main-loop.c => util/main-loop.c | 0
32
qemu-timer.c => util/qemu-timer.c | 0
33
thread-pool.c => util/thread-pool.c | 2 +-
34
trace-events | 11 --------
35
util/trace-events | 11 ++++++++
36
17 files changed, 114 insertions(+), 64 deletions(-)
37
create mode 100644 stubs/linux-aio.c
38
rename aio-posix.c => util/aio-posix.c (99%)
39
rename aio-win32.c => util/aio-win32.c (100%)
40
create mode 100644 util/aiocb.c
41
rename async.c => util/async.c (99%)
42
rename iohandler.c => util/iohandler.c (100%)
43
rename main-loop.c => util/main-loop.c (100%)
44
rename qemu-timer.c => util/qemu-timer.c (100%)
45
rename thread-pool.c => util/thread-pool.c (99%)
46
21
47
diff --git a/Makefile.objs b/Makefile.objs
22
diff --git a/util/block-helpers.h b/util/block-helpers.h
48
index XXXXXXX..XXXXXXX 100644
49
--- a/Makefile.objs
50
+++ b/Makefile.objs
51
@@ -XXX,XX +XXX,XX @@ chardev-obj-y = chardev/
52
#######################################################################
53
# block-obj-y is code used by both qemu system emulation and qemu-img
54
55
-block-obj-y = async.o thread-pool.o
56
block-obj-y += nbd/
57
block-obj-y += block.o blockjob.o
58
-block-obj-y += main-loop.o iohandler.o qemu-timer.o
59
-block-obj-$(CONFIG_POSIX) += aio-posix.o
60
-block-obj-$(CONFIG_WIN32) += aio-win32.o
61
block-obj-y += block/
62
block-obj-y += qemu-io-cmds.o
63
block-obj-$(CONFIG_REPLICATION) += replication.o
64
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
65
index XXXXXXX..XXXXXXX 100644
66
--- a/stubs/Makefile.objs
67
+++ b/stubs/Makefile.objs
68
@@ -XXX,XX +XXX,XX @@ stub-obj-y += get-vm-name.o
69
stub-obj-y += iothread.o
70
stub-obj-y += iothread-lock.o
71
stub-obj-y += is-daemonized.o
72
+stub-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
73
stub-obj-y += machine-init-done.o
74
stub-obj-y += migr-blocker.o
75
stub-obj-y += monitor.o
76
diff --git a/tests/Makefile.include b/tests/Makefile.include
77
index XXXXXXX..XXXXXXX 100644
78
--- a/tests/Makefile.include
79
+++ b/tests/Makefile.include
80
@@ -XXX,XX +XXX,XX @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
81
check-unit-y += tests/test-iov$(EXESUF)
82
gcov-files-test-iov-y = util/iov.c
83
check-unit-y += tests/test-aio$(EXESUF)
84
+gcov-files-test-aio-y = util/async.c util/qemu-timer.o
85
+gcov-files-test-aio-$(CONFIG_WIN32) += util/aio-win32.c
86
+gcov-files-test-aio-$(CONFIG_POSIX) += util/aio-posix.c
87
check-unit-y += tests/test-throttle$(EXESUF)
88
gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
89
gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
90
@@ -XXX,XX +XXX,XX @@ tests/check-qjson$(EXESUF): tests/check-qjson.o $(test-util-obj-y)
91
tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(test-qom-obj-y)
92
tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y)
93
94
-tests/test-char$(EXESUF): tests/test-char.o qemu-timer.o \
95
-    $(test-util-obj-y) $(qtest-obj-y) $(test-block-obj-y) $(chardev-obj-y)
96
+tests/test-char$(EXESUF): tests/test-char.o $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(chardev-obj-y)
97
tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y)
98
tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y)
99
tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
100
@@ -XXX,XX +XXX,XX @@ tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
101
    migration/vmstate.o migration/qemu-file.o \
102
migration/qemu-file-channel.o migration/qjson.o \
103
    $(test-io-obj-y)
104
-tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \
105
-    $(test-util-obj-y)
106
+tests/test-timed-average$(EXESUF): tests/test-timed-average.o $(test-util-obj-y)
107
tests/test-base64$(EXESUF): tests/test-base64.o \
108
    libqemuutil.a libqemustub.a
109
tests/ptimer-test$(EXESUF): tests/ptimer-test.o tests/ptimer-test-stubs.o hw/core/ptimer.o libqemustub.a
110
@@ -XXX,XX +XXX,XX @@ tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-usb-obj-y)
111
tests/usb-hcd-xhci-test$(EXESUF): tests/usb-hcd-xhci-test.o $(libqos-usb-obj-y)
112
tests/pc-cpu-test$(EXESUF): tests/pc-cpu-test.o
113
tests/postcopy-test$(EXESUF): tests/postcopy-test.o
114
-tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-timer.o \
115
+tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o $(test-util-obj-y) \
116
    $(qtest-obj-y) $(test-io-obj-y) $(libqos-virtio-obj-y) $(libqos-pc-obj-y) \
117
    $(chardev-obj-y)
118
tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
119
diff --git a/util/Makefile.objs b/util/Makefile.objs
120
index XXXXXXX..XXXXXXX 100644
121
--- a/util/Makefile.objs
122
+++ b/util/Makefile.objs
123
@@ -XXX,XX +XXX,XX @@
124
util-obj-y = osdep.o cutils.o unicode.o qemu-timer-common.o
125
util-obj-y += bufferiszero.o
126
util-obj-y += lockcnt.o
127
+util-obj-y += aiocb.o async.o thread-pool.o qemu-timer.o
128
+util-obj-y += main-loop.o iohandler.o
129
+util-obj-$(CONFIG_POSIX) += aio-posix.o
130
util-obj-$(CONFIG_POSIX) += compatfd.o
131
util-obj-$(CONFIG_POSIX) += event_notifier-posix.o
132
util-obj-$(CONFIG_POSIX) += mmap-alloc.o
133
util-obj-$(CONFIG_POSIX) += oslib-posix.o
134
util-obj-$(CONFIG_POSIX) += qemu-openpty.o
135
util-obj-$(CONFIG_POSIX) += qemu-thread-posix.o
136
-util-obj-$(CONFIG_WIN32) += event_notifier-win32.o
137
util-obj-$(CONFIG_POSIX) += memfd.o
138
+util-obj-$(CONFIG_WIN32) += aio-win32.o
139
+util-obj-$(CONFIG_WIN32) += event_notifier-win32.o
140
util-obj-$(CONFIG_WIN32) += oslib-win32.o
141
util-obj-$(CONFIG_WIN32) += qemu-thread-win32.o
142
util-obj-y += envlist.o path.o module.o
143
diff --git a/block/io.c b/block/io.c
144
index XXXXXXX..XXXXXXX 100644
145
--- a/block/io.c
146
+++ b/block/io.c
147
@@ -XXX,XX +XXX,XX @@ BlockAIOCB *bdrv_aio_flush(BlockDriverState *bs,
148
return &acb->common;
149
}
150
151
-void *qemu_aio_get(const AIOCBInfo *aiocb_info, BlockDriverState *bs,
152
- BlockCompletionFunc *cb, void *opaque)
153
-{
154
- BlockAIOCB *acb;
155
-
156
- acb = g_malloc(aiocb_info->aiocb_size);
157
- acb->aiocb_info = aiocb_info;
158
- acb->bs = bs;
159
- acb->cb = cb;
160
- acb->opaque = opaque;
161
- acb->refcnt = 1;
162
- return acb;
163
-}
164
-
165
-void qemu_aio_ref(void *p)
166
-{
167
- BlockAIOCB *acb = p;
168
- acb->refcnt++;
169
-}
170
-
171
-void qemu_aio_unref(void *p)
172
-{
173
- BlockAIOCB *acb = p;
174
- assert(acb->refcnt > 0);
175
- if (--acb->refcnt == 0) {
176
- g_free(acb);
177
- }
178
-}
179
-
180
/**************************************************************/
181
/* Coroutine block device emulation */
182
183
diff --git a/stubs/linux-aio.c b/stubs/linux-aio.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/stubs/linux-aio.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Linux native AIO support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/raw-aio.h"
+
+void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
+{
+    abort();
+}
+
+void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
+{
+    abort();
+}
+
+LinuxAioState *laio_init(void)
+{
+    abort();
+}
+
+void laio_cleanup(LinuxAioState *s)
+{
+    abort();
+}
diff --git a/util/block-helpers.h b/util/block-helpers.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/block-helpers.h
@@ -XXX,XX +XXX,XX @@
+#ifndef BLOCK_HELPERS_H
+#define BLOCK_HELPERS_H
+
+#include "qemu/units.h"
+
+/* lower limit is sector size */
+#define MIN_BLOCK_SIZE          INT64_C(512)
+#define MIN_BLOCK_SIZE_STR      "512 B"
+/*
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
+ * matches qcow2 cluster size limit
+ */
+#define MAX_BLOCK_SIZE          (2 * MiB)
+#define MAX_BLOCK_SIZE_STR      "2 MiB"
+
+void check_block_size(const char *id, const char *name, int64_t value,
+                      Error **errp);
+
+#endif /* BLOCK_HELPERS_H */
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/blockdev.h"
 #include "net/net.h"
 #include "hw/pci/pci.h"
+#include "util/block-helpers.h"

 static bool check_prop_still_unset(DeviceState *dev, const char *name,
                                    const void *old_val, const char *new_val,
@@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = {

 /* --- blocksize --- */

-/* lower limit is sector size */
-#define MIN_BLOCK_SIZE          512
-#define MIN_BLOCK_SIZE_STR      "512 B"
-/*
- * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
- * matches qcow2 cluster size limit
- */
-#define MAX_BLOCK_SIZE          (2 * MiB)
-#define MAX_BLOCK_SIZE_STR      "2 MiB"
-
 static void set_blocksize(Object *obj, Visitor *v, const char *name,
                           void *opaque, Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
     Property *prop = opaque;
     uint32_t *ptr = qdev_get_prop_ptr(dev, prop);
     uint64_t value;
+    Error *local_err = NULL;

     if (dev->realized) {
         qdev_prop_set_after_realize(dev, name, errp);
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
     if (!visit_type_size(v, name, &value, errp)) {
         return;
     }
-    /* value of 0 means "unset" */
-    if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
-        error_setg(errp,
-                   "Property %s.%s doesn't take value %" PRIu64
-                   " (minimum: " MIN_BLOCK_SIZE_STR
-                   ", maximum: " MAX_BLOCK_SIZE_STR ")",
-                   dev->id ? : "", name, value);
+    check_block_size(dev->id ? : "", name, value, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
         return;
     }
-
-    /* We rely on power-of-2 blocksizes for bitmasks */
-    if ((value & (value - 1)) != 0) {
-        error_setg(errp,
-                   "Property %s.%s doesn't take value '%" PRId64 "', "
-                   "it's not a power of 2", dev->id ?: "", name, (int64_t)value);
-        return;
-    }
-
     *ptr = value;
 }

diff --git a/util/block-helpers.c b/util/block-helpers.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/block-helpers.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Block utility functions
+ *
+ * Copyright IBM, Corp. 2011
+ * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "block-helpers.h"
+
+/**
+ * check_block_size:
+ * @id: The unique ID of the object
+ * @name: The name of the property being validated
+ * @value: The block size in bytes
+ * @errp: A pointer to an area to store an error
+ *
+ * This function checks that the block size meets the following conditions:
+ * 1. At least MIN_BLOCK_SIZE
+ * 2. No larger than MAX_BLOCK_SIZE
+ * 3. A power of 2
+ */
+void check_block_size(const char *id, const char *name, int64_t value,
+                      Error **errp)
+{
+    /* value of 0 means "unset" */
+    if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
+        error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
+                   id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE);
+        return;
+    }
+
+    /* We rely on power-of-2 blocksizes for bitmasks */
+    if ((value & (value - 1)) != 0) {
+        error_setg(errp,
+                   "Property %s.%s doesn't take value '%" PRId64
+                   "', it's not a power of 2",
+                   id, name, value);
+        return;
+    }
+}
diff --git a/stubs/set-fd-handler.c b/stubs/set-fd-handler.c
index XXXXXXX..XXXXXXX 100644
--- a/stubs/set-fd-handler.c
+++ b/stubs/set-fd-handler.c
@@ -XXX,XX +XXX,XX @@ void qemu_set_fd_handler(int fd,
 {
     abort();
 }
-
-void aio_set_fd_handler(AioContext *ctx,
-                        int fd,
-                        bool is_external,
-                        IOHandler *io_read,
-                        IOHandler *io_write,
-                        AioPollFn *io_poll,
-                        void *opaque)
-{
-    abort();
-}
diff --git a/aio-posix.c b/util/aio-posix.c
similarity index 99%
rename from aio-posix.c
rename to util/aio-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/aio-posix.c
+++ b/util/aio-posix.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/rcu_queue.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
-#include "trace-root.h"
+#include "trace.h"
 #ifdef CONFIG_EPOLL_CREATE1
 #include <sys/epoll.h>
 #endif
diff --git a/aio-win32.c b/util/aio-win32.c
similarity index 100%
rename from aio-win32.c
rename to util/aio-win32.c
diff --git a/util/aiocb.c b/util/aiocb.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/util/aiocb.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * BlockAIOCB allocation
+ *
+ * Copyright (c) 2003-2017 Fabrice Bellard and other QEMU contributors
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/aio.h"
+
+void *qemu_aio_get(const AIOCBInfo *aiocb_info, BlockDriverState *bs,
+                   BlockCompletionFunc *cb, void *opaque)
+{
+    BlockAIOCB *acb;
+
+    acb = g_malloc(aiocb_info->aiocb_size);
+    acb->aiocb_info = aiocb_info;
+    acb->bs = bs;
+    acb->cb = cb;
+    acb->opaque = opaque;
+    acb->refcnt = 1;
+    return acb;
+}
+
+void qemu_aio_ref(void *p)
+{
+    BlockAIOCB *acb = p;
+    acb->refcnt++;
+}
+
+void qemu_aio_unref(void *p)
+{
+    BlockAIOCB *acb = p;
+    assert(acb->refcnt > 0);
+    if (--acb->refcnt == 0) {
+        g_free(acb);
+    }
+}
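For context, a hedged sketch of how a property setter consumes the check_block_size() helper factored out earlier in this view; the device type and field names here are invented for illustration and follow the qdev set_blocksize() pattern:

/* Illustrative caller of check_block_size(); MyDev is hypothetical. */
static void demo_set_logical_block_size(MyDev *dev, int64_t value,
                                        Error **errp)
{
    Error *local_err = NULL;

    check_block_size("mydev0" /* id */, "logical-block-size", value,
                     &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }
    /* 0 means "unset"; otherwise a power of 2 between 512 B and 2 MiB. */
    dev->blk_size = value;
}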
diff --git a/async.c b/util/async.c
similarity index 99%
rename from async.c
rename to util/async.c
index XXXXXXX..XXXXXXX 100644
--- a/async.c
+++ b/util/async.c
@@ -XXX,XX +XXX,XX @@
 /*
- * QEMU System Emulator
+ * Data plane event loop
 *
 * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2009-2017 QEMU contributors
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
diff --git a/util/meson.build b/util/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -XXX,XX +XXX,XX @@ if have_block
   util_ss.add(files('nvdimm-utils.c'))
   util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
   util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
+  util_ss.add(files('block-helpers.c'))
   util_ss.add(files('qemu-coroutine-sleep.c'))
   util_ss.add(files('qemu-co-shared-resource.c'))
   util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
diff --git a/iohandler.c b/util/iohandler.c
similarity index 100%
rename from iohandler.c
rename to util/iohandler.c
diff --git a/main-loop.c b/util/main-loop.c
similarity index 100%
rename from main-loop.c
rename to util/main-loop.c
diff --git a/qemu-timer.c b/util/qemu-timer.c
similarity index 100%
rename from qemu-timer.c
rename to util/qemu-timer.c
diff --git a/thread-pool.c b/util/thread-pool.c
similarity index 99%
rename from thread-pool.c
rename to util/thread-pool.c
index XXXXXXX..XXXXXXX 100644
--- a/thread-pool.c
+++ b/util/thread-pool.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine.h"
-#include "trace-root.h"
+#include "trace.h"
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"

diff --git a/trace-events b/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/trace-events
+++ b/trace-events
@@ -XXX,XX +XXX,XX @@
 #
 # The <format-string> should be a sprintf()-compatible format string.

-# aio-posix.c
-run_poll_handlers_begin(void *ctx, int64_t max_ns) "ctx %p max_ns %"PRId64
-run_poll_handlers_end(void *ctx, bool progress) "ctx %p progress %d"
-poll_shrink(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
-poll_grow(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
-
-# thread-pool.c
-thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
-thread_pool_complete(void *pool, void *req, void *opaque, int ret) "pool %p req %p opaque %p ret %d"
-thread_pool_cancel(void *req, void *opaque) "req %p opaque %p"
-
 # ioport.c
 cpu_in(unsigned int addr, char size, unsigned int val) "addr %#x(%c) value %u"
 cpu_out(unsigned int addr, char size, unsigned int val) "addr %#x(%c) value %u"
diff --git a/util/trace-events b/util/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -XXX,XX +XXX,XX @@
 # See docs/tracing.txt for syntax documentation.

+# util/aio-posix.c
+run_poll_handlers_begin(void *ctx, int64_t max_ns) "ctx %p max_ns %"PRId64
+run_poll_handlers_end(void *ctx, bool progress) "ctx %p progress %d"
+poll_shrink(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
+poll_grow(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
+
+# util/thread-pool.c
+thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
+thread_pool_complete(void *pool, void *req, void *opaque, int ret) "pool %p req %p opaque %p ret %d"
+thread_pool_cancel(void *req, void *opaque) "req %p opaque %p"
+
 # util/buffer.c
 buffer_resize(const char *buf, size_t olen, size_t len) "%s: old %zd, new %zd"
 buffer_move_empty(const char *buf, size_t len, const char *from) "%s: %zd bytes from %s"
--
2.9.3

--
2.26.2
diff view generated by jsdifflib
From: Paolo Bonzini <pbonzini@redhat.com>

aio_co_wake provides the infrastructure to start a coroutine on a "home"
AioContext. It will be used by CoMutex and CoQueue, so that coroutines
don't jump from one context to another when they go to sleep on a
mutex or waitqueue. However, it can also be used as a more efficient
alternative to one-shot bottom halves, and saves the effort of tracking
which AioContext a coroutine is running on.

aio_co_schedule is the part of aio_co_wake that starts a coroutine
on a remote AioContext, but it is also useful to implement e.g.
bdrv_set_aio_context callbacks.

The implementation of aio_co_schedule is based on a lock-free
multiple-producer, single-consumer queue. The multiple producers use
cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom
half) grabs all items added so far, inverts the list to make it FIFO,
and goes through it one item at a time until it's empty. The data
structure was inspired by OSv, which uses it in the very code we'll
"port" to QEMU for the thread-safe CoMutex.

Most of the new code is really tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/Makefile.include       |   8 +-
 include/block/aio.h          |  32 +++++++
 include/qemu/coroutine_int.h |  11 ++-
 tests/iothread.h             |  25 +++++
 tests/iothread.c             |  91 ++++++++++++++++++
 tests/test-aio-multithread.c | 213 +++++++++++++++++++++++++++++++++++++++
 util/async.c                 |  65 +++++++++++++
 util/qemu-coroutine.c        |   8 ++
 util/trace-events            |   4 +
 9 files changed, 453 insertions(+), 4 deletions(-)
 create mode 100644 tests/iothread.h
 create mode 100644 tests/iothread.c
 create mode 100644 tests/test-aio-multithread.c

From: Coiby Xu <coiby.xu@gmail.com>

By making use of libvhost-user, a block drive can be shared with
the connected vhost-user client. Only one client can connect to the
server at a time.

Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
following compiler warning:
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
and fix "Invalid size %ld ..." ssize_t format string arguments for
32-bit hosts.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.h |  36 ++
 block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++
 softmmu/vl.c                         |   4 +
 block/meson.build                    |   1 +
 4 files changed, 702 insertions(+)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 block/export/vhost-user-blk-server.c
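As an aside for readers, the lock-free scheme described in the first message can be sketched outside QEMU in a few lines of C11. This is an illustrative model only; the names and the node type are invented here, not taken from the patch:

#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for Coroutine; only the intrusive link matters. */
struct node {
    struct node *next;
};

static _Atomic(struct node *) sched_head;   /* shared LIFO stack */

/* Producers (any thread): push with a cmpxchg loop. */
static void mpsc_push(struct node *n)
{
    struct node *old = atomic_load(&sched_head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&sched_head, &old, n));
}

/* Single consumer (the bottom half): take everything at once and
 * reverse the LIFO chain so items run in FIFO order. */
static struct node *mpsc_take_all_fifo(void)
{
    struct node *lifo = atomic_exchange(&sched_head, NULL);
    struct node *fifo = NULL;

    while (lifo) {
        struct node *next = lifo->next;
        lifo->next = fifo;
        fifo = lifo;
        lifo = next;
    }
    return fifo;
}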
diff --git a/tests/Makefile.include b/tests/Makefile.include
32
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
44
index XXXXXXX..XXXXXXX 100644
45
--- a/tests/Makefile.include
46
+++ b/tests/Makefile.include
47
@@ -XXX,XX +XXX,XX @@ check-unit-y += tests/test-aio$(EXESUF)
48
gcov-files-test-aio-y = util/async.c util/qemu-timer.o
49
gcov-files-test-aio-$(CONFIG_WIN32) += util/aio-win32.c
50
gcov-files-test-aio-$(CONFIG_POSIX) += util/aio-posix.c
51
+check-unit-y += tests/test-aio-multithread$(EXESUF)
52
+gcov-files-test-aio-multithread-y = $(gcov-files-test-aio-y)
53
+gcov-files-test-aio-multithread-y += util/qemu-coroutine.c tests/iothread.c
54
check-unit-y += tests/test-throttle$(EXESUF)
55
-gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
56
-gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
57
check-unit-y += tests/test-thread-pool$(EXESUF)
58
gcov-files-test-thread-pool-y = thread-pool.c
59
gcov-files-test-hbitmap-y = util/hbitmap.c
60
@@ -XXX,XX +XXX,XX @@ test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
61
    $(test-qom-obj-y)
62
test-crypto-obj-y = $(crypto-obj-y) $(test-qom-obj-y)
63
test-io-obj-y = $(io-obj-y) $(test-crypto-obj-y)
64
-test-block-obj-y = $(block-obj-y) $(test-io-obj-y)
65
+test-block-obj-y = $(block-obj-y) $(test-io-obj-y) tests/iothread.o
66
67
tests/check-qint$(EXESUF): tests/check-qint.o $(test-util-obj-y)
68
tests/check-qstring$(EXESUF): tests/check-qstring.o $(test-util-obj-y)
69
@@ -XXX,XX +XXX,XX @@ tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y)
70
tests/test-char$(EXESUF): tests/test-char.o $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(chardev-obj-y)
71
tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(test-block-obj-y)
72
tests/test-aio$(EXESUF): tests/test-aio.o $(test-block-obj-y)
73
+tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o $(test-block-obj-y)
74
tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
75
tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y)
76
tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
77
diff --git a/include/block/aio.h b/include/block/aio.h
78
index XXXXXXX..XXXXXXX 100644
79
--- a/include/block/aio.h
80
+++ b/include/block/aio.h
81
@@ -XXX,XX +XXX,XX @@ typedef void QEMUBHFunc(void *opaque);
82
typedef bool AioPollFn(void *opaque);
83
typedef void IOHandler(void *opaque);
84
85
+struct Coroutine;
86
struct ThreadPool;
87
struct LinuxAioState;
88
89
@@ -XXX,XX +XXX,XX @@ struct AioContext {
90
bool notified;
91
EventNotifier notifier;
92
93
+ QSLIST_HEAD(, Coroutine) scheduled_coroutines;
94
+ QEMUBH *co_schedule_bh;
95
+
96
/* Thread pool for performing work and receiving completion callbacks.
97
* Has its own locking.
98
*/
99
@@ -XXX,XX +XXX,XX @@ static inline bool aio_node_check(AioContext *ctx, bool is_external)
100
}
101
102
/**
103
+ * aio_co_schedule:
104
+ * @ctx: the aio context
105
+ * @co: the coroutine
106
+ *
107
+ * Start a coroutine on a remote AioContext.
108
+ *
109
+ * The coroutine must not be entered by anyone else while aio_co_schedule()
110
+ * is active. In addition the coroutine must have yielded unless ctx
111
+ * is the context in which the coroutine is running (i.e. the value of
112
+ * qemu_get_current_aio_context() from the coroutine itself).
113
+ */
114
+void aio_co_schedule(AioContext *ctx, struct Coroutine *co);
115
+
116
+/**
117
+ * aio_co_wake:
118
+ * @co: the coroutine
119
+ *
120
+ * Restart a coroutine on the AioContext where it was running last, thus
121
+ * preventing coroutines from jumping from one context to another when they
122
+ * go to sleep.
123
+ *
124
+ * aio_co_wake may be executed either in coroutine or non-coroutine
125
+ * context. The coroutine must not be entered by anyone else while
126
+ * aio_co_wake() is active.
127
+ */
128
+void aio_co_wake(struct Coroutine *co);
129
+
130
+/**
131
* Return the AioContext whose event loop runs in the current thread.
132
*
133
* If called from an IOThread this will be the IOThread's AioContext. If
134
diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
135
index XXXXXXX..XXXXXXX 100644
136
--- a/include/qemu/coroutine_int.h
137
+++ b/include/qemu/coroutine_int.h
138
@@ -XXX,XX +XXX,XX @@ struct Coroutine {
139
CoroutineEntry *entry;
140
void *entry_arg;
141
Coroutine *caller;
142
+
143
+ /* Only used when the coroutine has terminated. */
144
QSLIST_ENTRY(Coroutine) pool_next;
145
+
146
size_t locks_held;
147
148
- /* Coroutines that should be woken up when we yield or terminate */
149
+ /* Coroutines that should be woken up when we yield or terminate.
150
+ * Only used when the coroutine is running.
151
+ */
152
QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;
153
+
154
+ /* Only used when the coroutine has yielded. */
155
+ AioContext *ctx;
156
QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
157
+ QSLIST_ENTRY(Coroutine) co_scheduled_next;
158
};
159
160
Coroutine *qemu_coroutine_new(void);
161
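To make the aio_co_schedule()/aio_co_wake() API above concrete, here is a hedged sketch of the intended wake-up pattern. It is not part of the patch, the type and function names are invented, and proper synchronization around data_ready is elided for brevity:

/* A coroutine parks itself; any thread later restarts it on its home
 * AioContext with aio_co_wake().  Illustrative only. */
typedef struct {
    Coroutine *co;      /* set by the coroutine before yielding */
    bool data_ready;
} DemoWaiter;

static void coroutine_fn demo_wait_for_data(void *opaque)
{
    DemoWaiter *w = opaque;

    w->co = qemu_coroutine_self();
    while (!w->data_ready) {
        qemu_coroutine_yield();   /* resumed by aio_co_wake() below */
    }
    /* ...consume the data on the coroutine's home AioContext... */
}

static void demo_notify(DemoWaiter *w)
{
    w->data_ready = true;
    aio_co_wake(w->co);   /* reschedules w->co where it last ran */
}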
diff --git a/tests/iothread.h b/tests/iothread.h
162
new file mode 100644
33
new file mode 100644
163
index XXXXXXX..XXXXXXX
34
index XXXXXXX..XXXXXXX
164
--- /dev/null
35
--- /dev/null
165
+++ b/tests/iothread.h
36
+++ b/block/export/vhost-user-blk-server.h
166
@@ -XXX,XX +XXX,XX @@
37
@@ -XXX,XX +XXX,XX @@
167
+/*
38
+/*
168
+ * Event loop thread implementation for unit tests
39
+ * Sharing QEMU block devices via vhost-user protocol
169
+ *
40
+ *
170
+ * Copyright Red Hat Inc., 2013, 2016
41
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
42
+ * Copyright (c) 2020 Red Hat, Inc.
171
+ *
43
+ *
172
+ * Authors:
44
+ * This work is licensed under the terms of the GNU GPL, version 2 or
173
+ * Stefan Hajnoczi <stefanha@redhat.com>
45
+ * later. See the COPYING file in the top-level directory.
174
+ * Paolo Bonzini <pbonzini@redhat.com>
175
+ *
176
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
177
+ * See the COPYING file in the top-level directory.
178
+ */
46
+ */
179
+#ifndef TEST_IOTHREAD_H
47
+
180
+#define TEST_IOTHREAD_H
48
+#ifndef VHOST_USER_BLK_SERVER_H
181
+
49
+#define VHOST_USER_BLK_SERVER_H
182
+#include "block/aio.h"
50
+#include "util/vhost-user-server.h"
183
+#include "qemu/thread.h"
51
+
184
+
52
+typedef struct VuBlockDev VuBlockDev;
185
+typedef struct IOThread IOThread;
53
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
186
+
54
+#define VHOST_USER_BLK_SERVER(obj) \
187
+IOThread *iothread_new(void);
55
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
188
+void iothread_join(IOThread *iothread);
56
+
189
+AioContext *iothread_get_aio_context(IOThread *iothread);
57
+/* vhost user block device */
190
+
58
+struct VuBlockDev {
191
+#endif
59
+ Object parent_obj;
192
diff --git a/tests/iothread.c b/tests/iothread.c
60
+ char *node_name;
61
+ SocketAddress *addr;
62
+ AioContext *ctx;
63
+ VuServer vu_server;
64
+ bool running;
65
+ uint32_t blk_size;
66
+ BlockBackend *backend;
67
+ QIOChannelSocket *sioc;
68
+ QTAILQ_ENTRY(VuBlockDev) next;
69
+ struct virtio_blk_config blkcfg;
70
+ bool writable;
71
+};
72
+
73
+#endif /* VHOST_USER_BLK_SERVER_H */
74
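As a rough illustration of how the user-creatable object declared above might be instantiated from C (in practice it is created with "-object vhost-user-blk-server,..."): this snippet is hypothetical, the node name, id and socket path are invented, and error handling is abbreviated. Note that object_new_with_props() also runs the UserCreatable complete() hook, which is what starts the server in this patch:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"

static void demo_create_vu_blk_server(Error **errp)
{
    Object *obj;

    /* Creates the object, sets its properties and completes it. */
    obj = object_new_with_props(TYPE_VHOST_USER_BLK_SERVER,
                                object_get_objects_root(), "vu-disk0", errp,
                                "node-name", "disk0",
                                "unix-socket", "/tmp/vu-blk.sock",
                                "writable", "on",
                                NULL);
    if (obj) {
        object_unref(obj);   /* the objects root keeps its own reference */
    }
}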
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
193
new file mode 100644
75
new file mode 100644
194
index XXXXXXX..XXXXXXX
76
index XXXXXXX..XXXXXXX
195
--- /dev/null
77
--- /dev/null
196
+++ b/tests/iothread.c
78
+++ b/block/export/vhost-user-blk-server.c
197
@@ -XXX,XX +XXX,XX @@
79
@@ -XXX,XX +XXX,XX @@
198
+/*
80
+/*
199
+ * Event loop thread implementation for unit tests
81
+ * Sharing QEMU block devices via vhost-user protocol
200
+ *
82
+ *
201
+ * Copyright Red Hat Inc., 2013, 2016
83
+ * Parts of the code based on nbd/server.c.
202
+ *
84
+ *
203
+ * Authors:
85
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
204
+ * Stefan Hajnoczi <stefanha@redhat.com>
86
+ * Copyright (c) 2020 Red Hat, Inc.
205
+ * Paolo Bonzini <pbonzini@redhat.com>
206
+ *
87
+ *
207
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
88
+ * This work is licensed under the terms of the GNU GPL, version 2 or
208
+ * See the COPYING file in the top-level directory.
89
+ * later. See the COPYING file in the top-level directory.
90
+ */
91
+#include "qemu/osdep.h"
92
+#include "block/block.h"
93
+#include "vhost-user-blk-server.h"
94
+#include "qapi/error.h"
95
+#include "qom/object_interfaces.h"
96
+#include "sysemu/block-backend.h"
97
+#include "util/block-helpers.h"
98
+
99
+enum {
100
+ VHOST_USER_BLK_MAX_QUEUES = 1,
101
+};
102
+struct virtio_blk_inhdr {
103
+ unsigned char status;
104
+};
105
+
106
+typedef struct VuBlockReq {
107
+ VuVirtqElement *elem;
108
+ int64_t sector_num;
109
+ size_t size;
110
+ struct virtio_blk_inhdr *in;
111
+ struct virtio_blk_outhdr out;
112
+ VuServer *server;
113
+ struct VuVirtq *vq;
114
+} VuBlockReq;
115
+
116
+static void vu_block_req_complete(VuBlockReq *req)
117
+{
118
+ VuDev *vu_dev = &req->server->vu_dev;
119
+
120
+ /* IO size with 1 extra status byte */
121
+ vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
122
+ vu_queue_notify(vu_dev, req->vq);
123
+
124
+ if (req->elem) {
125
+ free(req->elem);
126
+ }
127
+
128
+ g_free(req);
129
+}
130
+
131
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
132
+{
133
+ return container_of(server, VuBlockDev, vu_server);
134
+}
135
+
136
+static int coroutine_fn
137
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
138
+ uint32_t iovcnt, uint32_t type)
139
+{
140
+ struct virtio_blk_discard_write_zeroes desc;
141
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
142
+ if (unlikely(size != sizeof(desc))) {
143
+ error_report("Invalid size %zd, expect %zu", size, sizeof(desc));
144
+ return -EINVAL;
145
+ }
146
+
147
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
148
+ uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
149
+ le32_to_cpu(desc.num_sectors) << 9 };
150
+ if (type == VIRTIO_BLK_T_DISCARD) {
151
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
152
+ return 0;
153
+ }
154
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
155
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
156
+ range[0], range[1], 0) == 0) {
157
+ return 0;
158
+ }
159
+ }
160
+
161
+ return -EINVAL;
162
+}
163
+
164
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
165
+{
166
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
167
+ BlockBackend *backend = vdev_blk->backend;
168
+ blk_co_flush(backend);
169
+}
170
+
171
+struct req_data {
172
+ VuServer *server;
173
+ VuVirtq *vq;
174
+ VuVirtqElement *elem;
175
+};
176
+
177
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
178
+{
179
+ struct req_data *data = opaque;
180
+ VuServer *server = data->server;
181
+ VuVirtq *vq = data->vq;
182
+ VuVirtqElement *elem = data->elem;
183
+ uint32_t type;
184
+ VuBlockReq *req;
185
+
186
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
187
+ BlockBackend *backend = vdev_blk->backend;
188
+
189
+ struct iovec *in_iov = elem->in_sg;
190
+ struct iovec *out_iov = elem->out_sg;
191
+ unsigned in_num = elem->in_num;
192
+ unsigned out_num = elem->out_num;
193
+ /* refer to hw/block/virtio_blk.c */
194
+ if (elem->out_num < 1 || elem->in_num < 1) {
195
+ error_report("virtio-blk request missing headers");
196
+ free(elem);
197
+ return;
198
+ }
199
+
200
+ req = g_new0(VuBlockReq, 1);
201
+ req->server = server;
202
+ req->vq = vq;
203
+ req->elem = elem;
204
+
205
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
206
+ sizeof(req->out)) != sizeof(req->out))) {
207
+ error_report("virtio-blk request outhdr too short");
208
+ goto err;
209
+ }
210
+
211
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
212
+
213
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
214
+ error_report("virtio-blk request inhdr too short");
215
+ goto err;
216
+ }
217
+
218
+ /* We always touch the last byte, so just see how big in_iov is. */
219
+ req->in = (void *)in_iov[in_num - 1].iov_base
220
+ + in_iov[in_num - 1].iov_len
221
+ - sizeof(struct virtio_blk_inhdr);
222
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
223
+
224
+ type = le32_to_cpu(req->out.type);
225
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
226
+ case VIRTIO_BLK_T_IN:
227
+ case VIRTIO_BLK_T_OUT: {
228
+ ssize_t ret = 0;
229
+ bool is_write = type & VIRTIO_BLK_T_OUT;
230
+ req->sector_num = le64_to_cpu(req->out.sector);
231
+
232
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
233
+ QEMUIOVector qiov;
234
+ if (is_write) {
235
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
236
+ ret = blk_co_pwritev(backend, offset, qiov.size,
237
+ &qiov, 0);
238
+ } else {
239
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
240
+ ret = blk_co_preadv(backend, offset, qiov.size,
241
+ &qiov, 0);
242
+ }
243
+ if (ret >= 0) {
244
+ req->in->status = VIRTIO_BLK_S_OK;
245
+ } else {
246
+ req->in->status = VIRTIO_BLK_S_IOERR;
247
+ }
248
+ break;
249
+ }
250
+ case VIRTIO_BLK_T_FLUSH:
251
+ vu_block_flush(req);
252
+ req->in->status = VIRTIO_BLK_S_OK;
253
+ break;
254
+ case VIRTIO_BLK_T_GET_ID: {
255
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
256
+ VIRTIO_BLK_ID_BYTES);
257
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
258
+ req->in->status = VIRTIO_BLK_S_OK;
259
+ req->size = elem->in_sg[0].iov_len;
260
+ break;
261
+ }
262
+ case VIRTIO_BLK_T_DISCARD:
263
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
264
+ int rc;
265
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
266
+ out_num, type);
267
+ if (rc == 0) {
268
+ req->in->status = VIRTIO_BLK_S_OK;
269
+ } else {
270
+ req->in->status = VIRTIO_BLK_S_IOERR;
271
+ }
272
+ break;
273
+ }
274
+ default:
275
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
276
+ break;
277
+ }
278
+
279
+ vu_block_req_complete(req);
280
+ return;
281
+
282
+err:
283
+ free(elem);
284
+ g_free(req);
285
+ return;
286
+}
287
+
288
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
289
+{
290
+ VuServer *server;
291
+ VuVirtq *vq;
292
+ struct req_data *req_data;
293
+
294
+ server = container_of(vu_dev, VuServer, vu_dev);
295
+ assert(server);
296
+
297
+ vq = vu_get_queue(vu_dev, idx);
298
+ assert(vq);
299
+ VuVirtqElement *elem;
300
+ while (1) {
301
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
302
+ sizeof(VuBlockReq));
303
+ if (elem) {
304
+ req_data = g_new0(struct req_data, 1);
305
+ req_data->server = server;
306
+ req_data->vq = vq;
307
+ req_data->elem = elem;
308
+ Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
309
+ req_data);
310
+ aio_co_enter(server->ioc->ctx, co);
311
+ } else {
312
+ break;
313
+ }
314
+ }
315
+}
316
+
317
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
318
+{
319
+ VuVirtq *vq;
320
+
321
+ assert(vu_dev);
322
+
323
+ vq = vu_get_queue(vu_dev, idx);
324
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
325
+}
326
+
327
+static uint64_t vu_block_get_features(VuDev *dev)
328
+{
329
+ uint64_t features;
330
+ VuServer *server = container_of(dev, VuServer, vu_dev);
331
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
332
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
333
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
334
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
335
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
336
+ 1ull << VIRTIO_BLK_F_FLUSH |
337
+ 1ull << VIRTIO_BLK_F_DISCARD |
338
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
339
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
340
+ 1ull << VIRTIO_F_VERSION_1 |
341
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
342
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
343
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
344
+
345
+ if (!vdev_blk->writable) {
346
+ features |= 1ull << VIRTIO_BLK_F_RO;
347
+ }
348
+
349
+ return features;
350
+}
351
+
352
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
353
+{
354
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
355
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
356
+}
357
+
358
+static int
359
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
360
+{
361
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
362
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
363
+ memcpy(config, &vdev_blk->blkcfg, len);
364
+
365
+ return 0;
366
+}
367
+
368
+static int
369
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
370
+ uint32_t offset, uint32_t size, uint32_t flags)
371
+{
372
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
373
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
374
+ uint8_t wce;
375
+
376
+ /* don't support live migration */
377
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
378
+ return -EINVAL;
379
+ }
380
+
381
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
382
+ size != 1) {
383
+ return -EINVAL;
384
+ }
385
+
386
+ wce = *data;
387
+ vdev_blk->blkcfg.wce = wce;
388
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
389
+ return 0;
390
+}
391
+
392
+/*
393
+ * When the client disconnects, it sends a VHOST_USER_NONE request
+ * and vu_process_message will simply call exit, which causes the VM
+ * to exit abruptly.
+ * To avoid this issue, process the VHOST_USER_NONE request ahead
+ * of vu_process_message.
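+ * (Returning true from the .process_msg callback below tells
+ * libvhost-user that the message was fully handled here, and
+ * dev->panic() marks the connection for teardown instead of
+ * exit() killing the whole process.)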
209
+ *
398
+ *
210
+ */
399
+ */
211
+
400
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
212
+#include "qemu/osdep.h"
401
+{
213
+#include "qapi/error.h"
402
+ if (vmsg->request == VHOST_USER_NONE) {
214
+#include "block/aio.h"
403
+ dev->panic(dev, "disconnect");
215
+#include "qemu/main-loop.h"
404
+ return true;
216
+#include "qemu/rcu.h"
405
+ }
217
+#include "iothread.h"
406
+ return false;
218
+
407
+}
219
+struct IOThread {
408
+
409
+static const VuDevIface vu_block_iface = {
410
+ .get_features = vu_block_get_features,
411
+ .queue_set_started = vu_block_queue_set_started,
412
+ .get_protocol_features = vu_block_get_protocol_features,
413
+ .get_config = vu_block_get_config,
414
+ .set_config = vu_block_set_config,
415
+ .process_msg = vu_block_process_msg,
416
+};
417
+
418
+static void blk_aio_attached(AioContext *ctx, void *opaque)
419
+{
420
+ VuBlockDev *vub_dev = opaque;
421
+ aio_context_acquire(ctx);
422
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
423
+ aio_context_release(ctx);
424
+}
425
+
426
+static void blk_aio_detach(void *opaque)
427
+{
428
+ VuBlockDev *vub_dev = opaque;
429
+ AioContext *ctx = vub_dev->vu_server.ctx;
430
+ aio_context_acquire(ctx);
431
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
432
+ aio_context_release(ctx);
433
+}
434
+
435
+static void
436
+vu_block_initialize_config(BlockDriverState *bs,
437
+ struct virtio_blk_config *config, uint32_t blk_size)
438
+{
439
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
440
+ config->blk_size = blk_size;
441
+ config->size_max = 0;
442
+ config->seg_max = 128 - 2;
443
+ config->min_io_size = 1;
444
+ config->opt_io_size = 1;
445
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
446
+ config->max_discard_sectors = 32768;
447
+ config->max_discard_seg = 1;
448
+ config->discard_sector_alignment = config->blk_size >> 9;
449
+ config->max_write_zeroes_sectors = 32768;
450
+ config->max_write_zeroes_seg = 1;
451
+}
452
+
453
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
454
+{
455
+
456
+ BlockBackend *blk;
457
+ Error *local_error = NULL;
458
+ const char *node_name = vu_block_device->node_name;
459
+ bool writable = vu_block_device->writable;
460
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
461
+ int ret;
462
+
220
+ AioContext *ctx;
463
+ AioContext *ctx;
221
+
464
+
222
+ QemuThread thread;
465
+ BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
223
+ QemuMutex init_done_lock;
466
+
224
+ QemuCond init_done_cond; /* is thread initialization done? */
467
+ if (!bs) {
225
+ bool stopping;
468
+ error_propagate(errp, local_error);
469
+ return NULL;
470
+ }
471
+
472
+ if (bdrv_is_read_only(bs)) {
473
+ writable = false;
474
+ }
475
+
476
+ if (writable) {
477
+ perm |= BLK_PERM_WRITE;
478
+ }
479
+
480
+ ctx = bdrv_get_aio_context(bs);
481
+ aio_context_acquire(ctx);
482
+ bdrv_invalidate_cache(bs, NULL);
483
+ aio_context_release(ctx);
484
+
485
+ /*
486
+ * Don't allow resize while the vhost user server is running,
487
+ * otherwise we don't care what happens with the node.
488
+ */
489
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
490
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
491
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
492
+ ret = blk_insert_bs(blk, bs, errp);
493
+
494
+ if (ret < 0) {
495
+ goto fail;
496
+ }
497
+
498
+ blk_set_enable_write_cache(blk, false);
499
+
500
+ blk_set_allow_aio_context_change(blk, true);
501
+
502
+ vu_block_device->blkcfg.wce = 0;
503
+ vu_block_device->backend = blk;
504
+ if (!vu_block_device->blk_size) {
505
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
506
+ }
507
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
508
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
509
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
510
+ vu_block_device->blk_size);
511
+ return vu_block_device;
512
+
513
+fail:
514
+ blk_unref(blk);
515
+ return NULL;
516
+}
517
+
518
+static void vu_block_deinit(VuBlockDev *vu_block_device)
519
+{
520
+ if (vu_block_device->backend) {
521
+ blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
522
+ blk_aio_detach, vu_block_device);
523
+ }
524
+
525
+ blk_unref(vu_block_device->backend);
526
+}
527
+
528
+static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
529
+{
530
+ vhost_user_server_stop(&vu_block_device->vu_server);
531
+ vu_block_deinit(vu_block_device);
532
+}
533
+
534
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
535
+ Error **errp)
536
+{
537
+ AioContext *ctx;
538
+ SocketAddress *addr = vu_block_device->addr;
539
+
540
+ if (!vu_block_init(vu_block_device, errp)) {
541
+ return;
542
+ }
543
+
544
+ ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
545
+
546
+ if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
547
+ VHOST_USER_BLK_MAX_QUEUES,
548
+ NULL, &vu_block_iface,
549
+ errp)) {
550
+ goto error;
551
+ }
552
+
553
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
554
+ blk_aio_detach, vu_block_device);
555
+ vu_block_device->running = true;
556
+ return;
557
+
558
+ error:
559
+ vu_block_deinit(vu_block_device);
560
+}
561
+
562
+static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
563
+{
564
+ if (vus->running) {
565
+ error_setg(errp, "The property can't be modified "
566
+ "while the server is running");
567
+ return false;
568
+ }
569
+ return true;
570
+}
571
+
572
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
573
+{
574
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
575
+
576
+ if (!vu_prop_modifiable(vus, errp)) {
577
+ return;
578
+ }
579
+
580
+ if (vus->node_name) {
581
+ g_free(vus->node_name);
582
+ }
583
+
584
+ vus->node_name = g_strdup(value);
585
+}
586
+
587
+static char *vu_get_node_name(Object *obj, Error **errp)
588
+{
589
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
590
+ return g_strdup(vus->node_name);
591
+}
592
+
593
+static void free_socket_addr(SocketAddress *addr)
594
+{
595
+ g_free(addr->u.q_unix.path);
596
+ g_free(addr);
597
+}
598
+
599
+static void vu_set_unix_socket(Object *obj, const char *value,
600
+ Error **errp)
601
+{
602
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
603
+
604
+ if (!vu_prop_modifiable(vus, errp)) {
605
+ return;
606
+ }
607
+
608
+ if (vus->addr) {
609
+ free_socket_addr(vus->addr);
610
+ }
611
+
612
+ SocketAddress *addr = g_new0(SocketAddress, 1);
613
+ addr->type = SOCKET_ADDRESS_TYPE_UNIX;
614
+ addr->u.q_unix.path = g_strdup(value);
615
+ vus->addr = addr;
616
+}
617
+
618
+static char *vu_get_unix_socket(Object *obj, Error **errp)
619
+{
620
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
621
+ return g_strdup(vus->addr->u.q_unix.path);
622
+}
623
+
624
+static bool vu_get_block_writable(Object *obj, Error **errp)
625
+{
626
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
627
+ return vus->writable;
628
+}
629
+
630
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
631
+{
632
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
633
+
634
+ if (!vu_prop_modifiable(vus, errp)) {
635
+ return;
636
+ }
637
+
638
+ vus->writable = value;
639
+}
640
+
641
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
642
+ void *opaque, Error **errp)
643
+{
644
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
645
+ uint32_t value = vus->blk_size;
646
+
647
+ visit_type_uint32(v, name, &value, errp);
648
+}
649
+
650
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
651
+ void *opaque, Error **errp)
652
+{
653
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
654
+
655
+ Error *local_err = NULL;
656
+ uint32_t value;
657
+
658
+ if (!vu_prop_modifiable(vus, errp)) {
659
+ return;
660
+ }
661
+
662
+ visit_type_uint32(v, name, &value, &local_err);
663
+ if (local_err) {
664
+ goto out;
665
+ }
666
+
667
+ check_block_size(object_get_typename(obj), name, value, &local_err);
668
+ if (local_err) {
669
+ goto out;
670
+ }
671
+
672
+ vus->blk_size = value;
673
+
674
+out:
675
+ error_propagate(errp, local_err);
676
+}
677
+
678
+static void vhost_user_blk_server_instance_finalize(Object *obj)
679
+{
680
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
681
+
682
+ vhost_user_blk_server_stop(vub);
683
+
684
+ /*
685
+ * Unlike object_property_add_str, object_class_property_add_str
686
+ * doesn't have a release method. Thus manual memory freeing is
687
+ * needed.
688
+ */
689
+ free_socket_addr(vub->addr);
690
+ g_free(vub->node_name);
691
+}
692
+
693
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
694
+{
695
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
696
+
697
+ vhost_user_blk_server_start(vub, errp);
698
+}
699
+
700
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
701
+ void *class_data)
702
+{
703
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
704
+ ucc->complete = vhost_user_blk_server_complete;
705
+
706
+ object_class_property_add_bool(klass, "writable",
707
+ vu_get_block_writable,
708
+ vu_set_block_writable);
709
+
710
+ object_class_property_add_str(klass, "node-name",
711
+ vu_get_node_name,
712
+ vu_set_node_name);
713
+
714
+ object_class_property_add_str(klass, "unix-socket",
715
+ vu_get_unix_socket,
716
+ vu_set_unix_socket);
717
+
718
+ object_class_property_add(klass, "logical-block-size", "uint32",
719
+ vu_get_blk_size, vu_set_blk_size,
720
+ NULL, NULL);
721
+}
722
+
723
+static const TypeInfo vhost_user_blk_server_info = {
724
+ .name = TYPE_VHOST_USER_BLK_SERVER,
725
+ .parent = TYPE_OBJECT,
726
+ .instance_size = sizeof(VuBlockDev),
727
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
728
+ .class_init = vhost_user_blk_server_class_init,
729
+ .interfaces = (InterfaceInfo[]) {
730
+ {TYPE_USER_CREATABLE},
731
+ {}
732
+ },
226
+};
733
+};
227
+
734
+
228
+static __thread IOThread *my_iothread;
735
+static void vhost_user_blk_server_register_types(void)
229
+
736
+{
230
+AioContext *qemu_get_current_aio_context(void)
737
+ type_register_static(&vhost_user_blk_server_info);
231
+{
738
+}
232
+ return my_iothread ? my_iothread->ctx : qemu_get_aio_context();
739
+
233
+}
740
+type_init(vhost_user_blk_server_register_types)
234
+
741
diff --git a/softmmu/vl.c b/softmmu/vl.c
235
+static void *iothread_run(void *opaque)
236
+{
237
+ IOThread *iothread = opaque;
238
+
239
+ rcu_register_thread();
240
+
241
+ my_iothread = iothread;
242
+ qemu_mutex_lock(&iothread->init_done_lock);
243
+ iothread->ctx = aio_context_new(&error_abort);
244
+ qemu_cond_signal(&iothread->init_done_cond);
245
+ qemu_mutex_unlock(&iothread->init_done_lock);
246
+
247
+ while (!atomic_read(&iothread->stopping)) {
248
+ aio_poll(iothread->ctx, true);
249
+ }
250
+
251
+ rcu_unregister_thread();
252
+ return NULL;
253
+}
254
+
255
+void iothread_join(IOThread *iothread)
256
+{
257
+ iothread->stopping = true;
258
+ aio_notify(iothread->ctx);
259
+ qemu_thread_join(&iothread->thread);
260
+ qemu_cond_destroy(&iothread->init_done_cond);
261
+ qemu_mutex_destroy(&iothread->init_done_lock);
262
+ aio_context_unref(iothread->ctx);
263
+ g_free(iothread);
264
+}
265
+
266
+IOThread *iothread_new(void)
267
+{
268
+ IOThread *iothread = g_new0(IOThread, 1);
269
+
270
+ qemu_mutex_init(&iothread->init_done_lock);
271
+ qemu_cond_init(&iothread->init_done_cond);
272
+ qemu_thread_create(&iothread->thread, NULL, iothread_run,
273
+ iothread, QEMU_THREAD_JOINABLE);
274
+
275
+ /* Wait for initialization to complete */
276
+ qemu_mutex_lock(&iothread->init_done_lock);
277
+ while (iothread->ctx == NULL) {
278
+ qemu_cond_wait(&iothread->init_done_cond,
279
+ &iothread->init_done_lock);
280
+ }
281
+ qemu_mutex_unlock(&iothread->init_done_lock);
282
+ return iothread;
283
+}
284
+
285
+AioContext *iothread_get_aio_context(IOThread *iothread)
286
+{
287
+ return iothread->ctx;
288
+}
289
diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
290
new file mode 100644
291
index XXXXXXX..XXXXXXX
292
--- /dev/null
293
+++ b/tests/test-aio-multithread.c
294
@@ -XXX,XX +XXX,XX @@
295
+/*
296
+ * AioContext multithreading tests
297
+ *
298
+ * Copyright Red Hat, Inc. 2016
299
+ *
300
+ * Authors:
301
+ * Paolo Bonzini <pbonzini@redhat.com>
302
+ *
303
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
304
+ * See the COPYING.LIB file in the top-level directory.
305
+ */
306
+
307
+#include "qemu/osdep.h"
308
+#include <glib.h>
309
+#include "block/aio.h"
310
+#include "qapi/error.h"
311
+#include "qemu/coroutine.h"
312
+#include "qemu/thread.h"
313
+#include "qemu/error-report.h"
314
+#include "iothread.h"
315
+
316
+/* AioContext management */
317
+
318
+#define NUM_CONTEXTS 5
319
+
320
+static IOThread *threads[NUM_CONTEXTS];
321
+static AioContext *ctx[NUM_CONTEXTS];
322
+static __thread int id = -1;
323
+
324
+static QemuEvent done_event;
325
+
326
+/* Run a function synchronously on a remote iothread. */
327
+
328
+typedef struct CtxRunData {
329
+ QEMUBHFunc *cb;
330
+ void *arg;
331
+} CtxRunData;
332
+
333
+static void ctx_run_bh_cb(void *opaque)
334
+{
335
+ CtxRunData *data = opaque;
336
+
337
+ data->cb(data->arg);
338
+ qemu_event_set(&done_event);
339
+}
340
+
341
+static void ctx_run(int i, QEMUBHFunc *cb, void *opaque)
342
+{
343
+ CtxRunData data = {
344
+ .cb = cb,
345
+ .arg = opaque
346
+ };
347
+
348
+ qemu_event_reset(&done_event);
349
+ aio_bh_schedule_oneshot(ctx[i], ctx_run_bh_cb, &data);
350
+ qemu_event_wait(&done_event);
351
+}
352
+
353
+/* Starting the iothreads. */
354
+
355
+static void set_id_cb(void *opaque)
356
+{
357
+ int *i = opaque;
358
+
359
+ id = *i;
360
+}
361
+
362
+static void create_aio_contexts(void)
363
+{
364
+ int i;
365
+
366
+ for (i = 0; i < NUM_CONTEXTS; i++) {
367
+ threads[i] = iothread_new();
368
+ ctx[i] = iothread_get_aio_context(threads[i]);
369
+ }
370
+
371
+ qemu_event_init(&done_event, false);
372
+ for (i = 0; i < NUM_CONTEXTS; i++) {
373
+ ctx_run(i, set_id_cb, &i);
374
+ }
375
+}
376
+
377
+/* Stopping the iothreads. */
378
+
379
+static void join_aio_contexts(void)
380
+{
381
+ int i;
382
+
383
+ for (i = 0; i < NUM_CONTEXTS; i++) {
384
+ aio_context_ref(ctx[i]);
385
+ }
386
+ for (i = 0; i < NUM_CONTEXTS; i++) {
387
+ iothread_join(threads[i]);
388
+ }
389
+ for (i = 0; i < NUM_CONTEXTS; i++) {
390
+ aio_context_unref(ctx[i]);
391
+ }
392
+ qemu_event_destroy(&done_event);
393
+}
394
+
395
+/* Basic test for the stuff above. */
396
+
397
+static void test_lifecycle(void)
398
+{
399
+ create_aio_contexts();
400
+ join_aio_contexts();
401
+}
402
+
403
+/* aio_co_schedule test. */
404
+
405
+static Coroutine *to_schedule[NUM_CONTEXTS];
406
+
407
+static bool now_stopping;
408
+
409
+static int count_retry;
410
+static int count_here;
411
+static int count_other;
412
+
413
+static bool schedule_next(int n)
414
+{
415
+ Coroutine *co;
416
+
417
+ co = atomic_xchg(&to_schedule[n], NULL);
418
+ if (!co) {
419
+ atomic_inc(&count_retry);
420
+ return false;
421
+ }
422
+
423
+ if (n == id) {
424
+ atomic_inc(&count_here);
425
+ } else {
426
+ atomic_inc(&count_other);
427
+ }
428
+
429
+ aio_co_schedule(ctx[n], co);
430
+ return true;
431
+}
432
+
433
+static void finish_cb(void *opaque)
434
+{
435
+ schedule_next(id);
436
+}
437
+
438
+static coroutine_fn void test_multi_co_schedule_entry(void *opaque)
439
+{
440
+ g_assert(to_schedule[id] == NULL);
441
+ atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
442
+
443
+ while (!atomic_mb_read(&now_stopping)) {
444
+ int n;
445
+
446
+ n = g_test_rand_int_range(0, NUM_CONTEXTS);
447
+ schedule_next(n);
448
+ qemu_coroutine_yield();
449
+
450
+ g_assert(to_schedule[id] == NULL);
451
+ atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
452
+ }
453
+}
454
+
455
+
456
+static void test_multi_co_schedule(int seconds)
457
+{
458
+ int i;
459
+
460
+ count_here = count_other = count_retry = 0;
461
+ now_stopping = false;
462
+
463
+ create_aio_contexts();
464
+ for (i = 0; i < NUM_CONTEXTS; i++) {
465
+ Coroutine *co1 = qemu_coroutine_create(test_multi_co_schedule_entry, NULL);
466
+ aio_co_schedule(ctx[i], co1);
467
+ }
468
+
469
+ g_usleep(seconds * 1000000);
470
+
471
+ atomic_mb_set(&now_stopping, true);
472
+ for (i = 0; i < NUM_CONTEXTS; i++) {
473
+ ctx_run(i, finish_cb, NULL);
474
+ to_schedule[i] = NULL;
475
+ }
476
+
477
+ join_aio_contexts();
478
+ g_test_message("scheduled %d, queued %d, retry %d, total %d\n",
479
+ count_other, count_here, count_retry,
480
+ count_here + count_other + count_retry);
481
+}
482
+
483
+static void test_multi_co_schedule_1(void)
484
+{
485
+ test_multi_co_schedule(1);
486
+}
487
+
488
+static void test_multi_co_schedule_10(void)
489
+{
490
+ test_multi_co_schedule(10);
491
+}
492
+
493
+/* End of tests. */
494
+
495
+int main(int argc, char **argv)
496
+{
497
+ init_clocks();
498
+
499
+ g_test_init(&argc, &argv, NULL);
500
+ g_test_add_func("/aio/multi/lifecycle", test_lifecycle);
501
+ if (g_test_quick()) {
502
+ g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_1);
503
+ } else {
504
+ g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_10);
505
+ }
506
+ return g_test_run();
507
+}
508
diff --git a/util/async.c b/util/async.c
509
index XXXXXXX..XXXXXXX 100644
742
index XXXXXXX..XXXXXXX 100644
510
--- a/util/async.c
743
--- a/softmmu/vl.c
511
+++ b/util/async.c
744
+++ b/softmmu/vl.c
512
@@ -XXX,XX +XXX,XX @@
745
@@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts)
513
#include "qemu/main-loop.h"
514
#include "qemu/atomic.h"
515
#include "block/raw-aio.h"
516
+#include "qemu/coroutine_int.h"
517
+#include "trace.h"
518
519
/***********************************************************/
520
/* bottom halves (can be seen as timers which expire ASAP) */
521
@@ -XXX,XX +XXX,XX @@ aio_ctx_finalize(GSource *source)
522
}
746
}
523
#endif
747
#endif
524
748
525
+ assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
749
+ /* Reason: vhost-user-blk-server property "node-name" */
526
+ qemu_bh_delete(ctx->co_schedule_bh);
750
+ if (g_str_equal(type, "vhost-user-blk-server")) {
527
+
751
+ return false;
528
qemu_lockcnt_lock(&ctx->list_lock);
752
+ }
529
assert(!qemu_lockcnt_count(&ctx->list_lock));
753
/*
530
while (ctx->first_bh) {
754
* Reason: filter-* property "netdev" etc.
531
@@ -XXX,XX +XXX,XX @@ static bool event_notifier_poll(void *opaque)
755
*/
532
return atomic_read(&ctx->notified);
756
diff --git a/block/meson.build b/block/meson.build
533
}
534
535
+static void co_schedule_bh_cb(void *opaque)
536
+{
537
+ AioContext *ctx = opaque;
538
+ QSLIST_HEAD(, Coroutine) straight, reversed;
539
+
540
+ QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
541
+ QSLIST_INIT(&straight);
542
+
543
+ while (!QSLIST_EMPTY(&reversed)) {
544
+ Coroutine *co = QSLIST_FIRST(&reversed);
545
+ QSLIST_REMOVE_HEAD(&reversed, co_scheduled_next);
546
+ QSLIST_INSERT_HEAD(&straight, co, co_scheduled_next);
547
+ }
548
+
549
+ while (!QSLIST_EMPTY(&straight)) {
550
+ Coroutine *co = QSLIST_FIRST(&straight);
551
+ QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
552
+ trace_aio_co_schedule_bh_cb(ctx, co);
553
+ qemu_coroutine_enter(co);
554
+ }
555
+}
556
+
557
AioContext *aio_context_new(Error **errp)
558
{
559
int ret;
560
@@ -XXX,XX +XXX,XX @@ AioContext *aio_context_new(Error **errp)
561
}
562
g_source_set_can_recurse(&ctx->source, true);
563
qemu_lockcnt_init(&ctx->list_lock);
564
+
565
+ ctx->co_schedule_bh = aio_bh_new(ctx, co_schedule_bh_cb, ctx);
566
+ QSLIST_INIT(&ctx->scheduled_coroutines);
567
+
568
aio_set_event_notifier(ctx, &ctx->notifier,
569
false,
570
(EventNotifierHandler *)
571
@@ -XXX,XX +XXX,XX @@ fail:
572
return NULL;
573
}
574
575
+void aio_co_schedule(AioContext *ctx, Coroutine *co)
576
+{
577
+ trace_aio_co_schedule(ctx, co);
578
+ QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
579
+ co, co_scheduled_next);
580
+ qemu_bh_schedule(ctx->co_schedule_bh);
581
+}
582
+
583
+void aio_co_wake(struct Coroutine *co)
584
+{
585
+ AioContext *ctx;
586
+
587
+ /* Read coroutine before co->ctx. Matches smp_wmb in
588
+ * qemu_coroutine_enter.
589
+ */
590
+ smp_read_barrier_depends();
591
+ ctx = atomic_read(&co->ctx);
592
+
593
+ if (ctx != qemu_get_current_aio_context()) {
594
+ aio_co_schedule(ctx, co);
595
+ return;
596
+ }
597
+
598
+ if (qemu_in_coroutine()) {
599
+ Coroutine *self = qemu_coroutine_self();
600
+ assert(self != co);
601
+ QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
602
+ } else {
603
+ aio_context_acquire(ctx);
604
+ qemu_coroutine_enter(co);
605
+ aio_context_release(ctx);
606
+ }
607
+}
608
+
609
void aio_context_ref(AioContext *ctx)
610
{
611
g_source_ref(&ctx->source);
612
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
613
index XXXXXXX..XXXXXXX 100644
757
index XXXXXXX..XXXXXXX 100644
614
--- a/util/qemu-coroutine.c
758
--- a/block/meson.build
615
+++ b/util/qemu-coroutine.c
759
+++ b/block/meson.build
616
@@ -XXX,XX +XXX,XX @@
760
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
617
#include "qemu/atomic.h"
761
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
618
#include "qemu/coroutine.h"
762
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
619
#include "qemu/coroutine_int.h"
763
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
620
+#include "block/aio.h"
764
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
621
765
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
622
enum {
766
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
623
POOL_BATCH_SIZE = 64,
767
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
624
@@ -XXX,XX +XXX,XX @@ void qemu_coroutine_enter(Coroutine *co)
625
}
626
627
co->caller = self;
628
+ co->ctx = qemu_get_current_aio_context();
629
+
630
+ /* Store co->ctx before anything that stores co. Matches
631
+ * barrier in aio_co_wake.
632
+ */
633
+ smp_wmb();
634
+
635
ret = qemu_coroutine_switch(self, co, COROUTINE_ENTER);
636
637
qemu_co_queue_run_restart(co);
638
diff --git a/util/trace-events b/util/trace-events
639
index XXXXXXX..XXXXXXX 100644
640
--- a/util/trace-events
641
+++ b/util/trace-events
642
@@ -XXX,XX +XXX,XX @@ run_poll_handlers_end(void *ctx, bool progress) "ctx %p progress %d"
643
poll_shrink(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
644
poll_grow(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
645
646
+# util/async.c
647
+aio_co_schedule(void *ctx, void *co) "ctx %p co %p"
648
+aio_co_schedule_bh_cb(void *ctx, void *co) "ctx %p co %p"
649
+
650
# util/thread-pool.c
651
thread_pool_submit(void *pool, void *req, void *opaque) "pool %p req %p opaque %p"
652
thread_pool_complete(void *pool, void *req, void *opaque, int ret) "pool %p req %p opaque %p ret %d"
653
--
768
--
654
2.9.3
769
2.26.2
655
770
656
diff view generated by jsdifflib
New patch

From: Coiby Xu <coiby.xu@gmail.com>

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-8-coiby.xu@gmail.com
[Removed reference to vhost-user-blk-test.c, it will be sent in a
separate pull request.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org
 S: Supported
 F: tests/image-fuzzer/

+Vhost-user block device backend server
+M: Coiby Xu <Coiby.Xu@gmail.com>
+S: Maintained
+F: block/export/vhost-user-blk-server.c
+F: util/vhost-user-server.c
+F: tests/qtest/libqos/vhost-user-blk.c
+
 Replication
 M: Wen Congyang <wencongyang2@huawei.com>
 M: Xie Changlong <xiechanglong.d@gmail.com>
--
2.26.2
diff view generated by jsdifflib
From: Paolo Bonzini <pbonzini@redhat.com>

The AioContext data structures are now protected by list_lock and/or
they are walked with FOREACH_RCU primitives. There is no need anymore
to acquire the AioContext for the entire duration of aio_dispatch.
Instead, just acquire it before and after invoking the callbacks.
The next step is then to push it further down.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-12-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/aio-posix.c | 25 +++++++++++--------------
 util/aio-win32.c | 15 +++++++--------
 util/async.c     |  2 ++
 3 files changed, 20 insertions(+), 22 deletions(-)

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
21
diff --git a/util/aio-posix.c b/util/aio-posix.c
8
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
22
index XXXXXXX..XXXXXXX 100644
9
index XXXXXXX..XXXXXXX 100644
23
--- a/util/aio-posix.c
10
--- a/util/vhost-user-server.c
24
+++ b/util/aio-posix.c
11
+++ b/util/vhost-user-server.c
25
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
12
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
26
(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
13
return false;
27
aio_node_check(ctx, node->is_external) &&
28
node->io_read) {
29
+ aio_context_acquire(ctx);
30
node->io_read(node->opaque);
31
+ aio_context_release(ctx);
32
33
/* aio_notify() does not count as progress */
34
if (node->opaque != &ctx->notifier) {
35
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
36
(revents & (G_IO_OUT | G_IO_ERR)) &&
37
aio_node_check(ctx, node->is_external) &&
38
node->io_write) {
39
+ aio_context_acquire(ctx);
40
node->io_write(node->opaque);
41
+ aio_context_release(ctx);
42
progress = true;
43
}
44
45
@@ -XXX,XX +XXX,XX @@ bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
46
}
14
}
47
15
48
/* Run our timers */
16
- /* zero out unspecified fileds */
49
+ aio_context_acquire(ctx);
17
+ /* zero out unspecified fields */
50
progress |= timerlistgroup_run_timers(&ctx->tlg);
18
*server = (VuServer) {
51
+ aio_context_release(ctx);
19
.listener = listener,
52
20
.vu_iface = vu_iface,
53
return progress;
54
}
55
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
56
int64_t timeout;
57
int64_t start = 0;
58
59
- aio_context_acquire(ctx);
60
- progress = false;
61
-
62
/* aio_notify can avoid the expensive event_notifier_set if
63
* everything (file descriptors, bottom halves, timers) will
64
* be re-evaluated before the next blocking poll(). This is
65
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
66
start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
67
}
68
69
- if (try_poll_mode(ctx, blocking)) {
70
- progress = true;
71
- } else {
72
+ aio_context_acquire(ctx);
73
+ progress = try_poll_mode(ctx, blocking);
74
+ aio_context_release(ctx);
75
+
76
+ if (!progress) {
77
assert(npfd == 0);
78
79
/* fill pollfds */
80
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
81
timeout = blocking ? aio_compute_timeout(ctx) : 0;
82
83
/* wait until next event */
84
- if (timeout) {
85
- aio_context_release(ctx);
86
- }
87
if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
88
AioHandler epoll_handler;
89
90
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
91
} else {
92
ret = qemu_poll_ns(pollfds, npfd, timeout);
93
}
94
- if (timeout) {
95
- aio_context_acquire(ctx);
96
- }
97
}
98
99
if (blocking) {
100
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
101
progress = true;
102
}
103
104
- aio_context_release(ctx);
105
-
106
return progress;
107
}
108
109
diff --git a/util/aio-win32.c b/util/aio-win32.c
110
index XXXXXXX..XXXXXXX 100644
111
--- a/util/aio-win32.c
112
+++ b/util/aio-win32.c
113
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
114
(revents || event_notifier_get_handle(node->e) == event) &&
115
node->io_notify) {
116
node->pfd.revents = 0;
117
+ aio_context_acquire(ctx);
118
node->io_notify(node->e);
119
+ aio_context_release(ctx);
120
121
/* aio_notify() does not count as progress */
122
if (node->e != &ctx->notifier) {
123
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
124
(node->io_read || node->io_write)) {
125
node->pfd.revents = 0;
126
if ((revents & G_IO_IN) && node->io_read) {
127
+ aio_context_acquire(ctx);
128
node->io_read(node->opaque);
129
+ aio_context_release(ctx);
130
progress = true;
131
}
132
if ((revents & G_IO_OUT) && node->io_write) {
133
+ aio_context_acquire(ctx);
134
node->io_write(node->opaque);
135
+ aio_context_release(ctx);
136
progress = true;
137
}
138
139
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
140
int count;
141
int timeout;
142
143
- aio_context_acquire(ctx);
144
progress = false;
145
146
/* aio_notify can avoid the expensive event_notifier_set if
147
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
148
149
timeout = blocking && !have_select_revents
150
? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;
151
- if (timeout) {
152
- aio_context_release(ctx);
153
- }
154
ret = WaitForMultipleObjects(count, events, FALSE, timeout);
155
if (blocking) {
156
assert(first);
157
atomic_sub(&ctx->notify_me, 2);
158
}
159
- if (timeout) {
160
- aio_context_acquire(ctx);
161
- }
162
163
if (first) {
164
aio_notify_accept(ctx);
165
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
166
progress |= aio_dispatch_handlers(ctx, event);
167
} while (count > 0);
168
169
+ aio_context_acquire(ctx);
170
progress |= timerlistgroup_run_timers(&ctx->tlg);
171
-
172
aio_context_release(ctx);
173
return progress;
174
}
175
diff --git a/util/async.c b/util/async.c
176
index XXXXXXX..XXXXXXX 100644
177
--- a/util/async.c
178
+++ b/util/async.c
179
@@ -XXX,XX +XXX,XX @@ int aio_bh_poll(AioContext *ctx)
180
ret = 1;
181
}
182
bh->idle = 0;
183
+ aio_context_acquire(ctx);
184
aio_bh_call(bh);
185
+ aio_context_release(ctx);
186
}
187
if (bh->deleted) {
188
deleted = true;
189
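The push-down described above shrinks the locked region from the whole
dispatch loop to each individual callback, so the lock is free while the
loop polls or walks the handler list.  A minimal sketch of that shape,
using a plain pthread mutex for the AioContext lock and invented handler
types (this is not the QEMU API):

    #include <pthread.h>
    #include <stddef.h>

    typedef struct { pthread_mutex_t lock; } Ctx;   /* AioContext stand-in */

    typedef struct Handler {
        void (*io_read)(void *opaque);
        void *opaque;
        struct Handler *next;
    } Handler;

    /* Acquire only around the callback, as in aio_dispatch_handlers()
     * after this patch; the walk itself runs unlocked (RCU in QEMU). */
    void dispatch(Ctx *ctx, Handler *list)
    {
        for (Handler *h = list; h; h = h->next) {
            if (h->io_read) {
                pthread_mutex_lock(&ctx->lock);     /* aio_context_acquire */
                h->io_read(h->opaque);
                pthread_mutex_unlock(&ctx->lock);   /* aio_context_release */
            }
        }
    }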
From: Paolo Bonzini <pbonzini@redhat.com>

Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep.  What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails.  With CoMutex there is no such latency but you still want to
avoid wait and wakeup.  So introduce it artificially.

This only works with one waiter; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/coroutine.h   |  5 +++++
 util/qemu-coroutine-lock.c | 51 ++++++++++++++++++++++++++++++++++++++++------
 util/qemu-coroutine.c      |  2 +-
 3 files changed, 51 insertions(+), 7 deletions(-)

We already have access to the value with the correct type (ioc and sioc
are the same QIOChannel).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
11
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
24
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
25
--- a/include/qemu/coroutine.h
13
--- a/util/vhost-user-server.c
26
+++ b/include/qemu/coroutine.h
14
+++ b/util/vhost-user-server.c
27
@@ -XXX,XX +XXX,XX @@ typedef struct CoMutex {
15
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
28
*/
16
server->ioc = QIO_CHANNEL(sioc);
29
unsigned locked;
17
object_ref(OBJECT(server->ioc));
30
18
qio_channel_attach_aio_context(server->ioc, server->ctx);
31
+ /* Context that is holding the lock. Useful to avoid spinning
19
- qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
32
+ * when two coroutines on the same AioContext try to get the lock. :)
20
+ qio_channel_set_blocking(server->ioc, false, NULL);
33
+ */
21
vu_client_start(server);
34
+ AioContext *ctx;
35
+
36
/* A queue of waiters. Elements are added atomically in front of
37
* from_push. to_pop is only populated, and popped from, by whoever
38
* is in charge of the next wakeup. This can be an unlocker or,
39
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/util/qemu-coroutine-lock.c
42
+++ b/util/qemu-coroutine-lock.c
43
@@ -XXX,XX +XXX,XX @@
44
#include "qemu-common.h"
45
#include "qemu/coroutine.h"
46
#include "qemu/coroutine_int.h"
47
+#include "qemu/processor.h"
48
#include "qemu/queue.h"
49
#include "block/aio.h"
50
#include "trace.h"
51
@@ -XXX,XX +XXX,XX @@ void qemu_co_mutex_init(CoMutex *mutex)
52
memset(mutex, 0, sizeof(*mutex));
53
}
22
}
54
23
55
-static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
56
+static void coroutine_fn qemu_co_mutex_wake(CoMutex *mutex, Coroutine *co)
57
+{
58
+ /* Read co before co->ctx; pairs with smp_wmb() in
59
+ * qemu_coroutine_enter().
60
+ */
61
+ smp_read_barrier_depends();
62
+ mutex->ctx = co->ctx;
63
+ aio_co_wake(co);
64
+}
65
+
66
+static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
67
+ CoMutex *mutex)
68
{
69
Coroutine *self = qemu_coroutine_self();
70
CoWaitRecord w;
71
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
72
if (co == self) {
73
/* We got the lock ourselves! */
74
assert(to_wake == &w);
75
+ mutex->ctx = ctx;
76
return;
77
}
78
79
- aio_co_wake(co);
80
+ qemu_co_mutex_wake(mutex, co);
81
}
82
83
qemu_coroutine_yield();
84
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
85
86
void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
87
{
88
+ AioContext *ctx = qemu_get_current_aio_context();
89
Coroutine *self = qemu_coroutine_self();
90
+ int waiters, i;
91
92
- if (atomic_fetch_inc(&mutex->locked) == 0) {
93
+ /* Running a very small critical section on pthread_mutex_t and CoMutex
94
+ * shows that pthread_mutex_t is much faster because it doesn't actually
95
+ * go to sleep. What happens is that the critical section is shorter
96
+ * than the latency of entering the kernel and thus FUTEX_WAIT always
97
+ * fails. With CoMutex there is no such latency but you still want to
98
+ * avoid wait and wakeup. So introduce it artificially.
99
+ */
100
+ i = 0;
101
+retry_fast_path:
102
+ waiters = atomic_cmpxchg(&mutex->locked, 0, 1);
103
+ if (waiters != 0) {
104
+ while (waiters == 1 && ++i < 1000) {
105
+ if (atomic_read(&mutex->ctx) == ctx) {
106
+ break;
107
+ }
108
+ if (atomic_read(&mutex->locked) == 0) {
109
+ goto retry_fast_path;
110
+ }
111
+ cpu_relax();
112
+ }
113
+ waiters = atomic_fetch_inc(&mutex->locked);
114
+ }
115
+
116
+ if (waiters == 0) {
117
/* Uncontended. */
118
trace_qemu_co_mutex_lock_uncontended(mutex, self);
119
+ mutex->ctx = ctx;
120
} else {
121
- qemu_co_mutex_lock_slowpath(mutex);
122
+ qemu_co_mutex_lock_slowpath(ctx, mutex);
123
}
124
mutex->holder = self;
125
self->locks_held++;
126
@@ -XXX,XX +XXX,XX @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
127
assert(mutex->holder == self);
128
assert(qemu_in_coroutine());
129
130
+ mutex->ctx = NULL;
131
mutex->holder = NULL;
132
self->locks_held--;
133
if (atomic_fetch_dec(&mutex->locked) == 1) {
134
@@ -XXX,XX +XXX,XX @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
135
unsigned our_handoff;
136
137
if (to_wake) {
138
- Coroutine *co = to_wake->co;
139
- aio_co_wake(co);
140
+ qemu_co_mutex_wake(mutex, to_wake->co);
141
break;
142
}
143
144
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
145
index XXXXXXX..XXXXXXX 100644
146
--- a/util/qemu-coroutine.c
147
+++ b/util/qemu-coroutine.c
148
@@ -XXX,XX +XXX,XX @@ void qemu_coroutine_enter(Coroutine *co)
149
co->ctx = qemu_get_current_aio_context();
150
151
/* Store co->ctx before anything that stores co. Matches
152
- * barrier in aio_co_wake.
153
+ * barrier in aio_co_wake and qemu_co_mutex_wake.
154
*/
155
smp_wmb();
156
157
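The artificial spin this patch introduces can be sketched with C11
atomics: try a compare-and-swap, and if exactly one holder owns the lock,
spin a bounded number of times before giving up and queueing on the slow
path (elided here; returning false tells the caller to take it).  All
names are illustrative stand-ins, not the CoMutex API:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sched.h>

    typedef struct {
        atomic_uint locked;          /* 0 free; 1 held; >1 held with waiters */
        _Atomic(void *) holder_ctx;  /* context of the current holder */
    } SpinMutex;

    /* Returns true if the lock was taken on the fast path; false means
     * the caller must queue itself on the slow path (not shown). */
    bool mutex_trylock_spinning(SpinMutex *m, void *my_ctx)
    {
        unsigned seen;
        int i = 0;

    retry_fast_path:
        seen = 0;
        if (atomic_compare_exchange_strong(&m->locked, &seen, 1)) {
            atomic_store(&m->holder_ctx, my_ctx);   /* uncontended */
            return true;
        }

        /* Spin only while there is exactly one holder and no other
         * waiter; once someone queues, fairness demands we queue too. */
        while (seen == 1 && ++i < 1000) {
            if (atomic_load(&m->holder_ctx) == my_ctx) {
                break;   /* holder runs on our context: spinning can't help */
            }
            if (atomic_load(&m->locked) == 0) {
                goto retry_fast_path;   /* released while we were spinning */
            }
            sched_yield();              /* portable stand-in for cpu_relax() */
            seen = atomic_load(&m->locked);
        }
        return false;                   /* fall back to the queueing slow path */
    }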
From: Paolo Bonzini <pbonzini@redhat.com>

Pull the increment/decrement pair out of aio_bh_poll and into the
callers.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-18-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/aio-posix.c |  8 +++-----
 util/aio-win32.c |  8 ++++----
 util/async.c     | 12 ++++++------
 3 files changed, 13 insertions(+), 15 deletions(-)

Explicitly deleting watches is not necessary since libvhost-user calls
remove_watch() during vu_deinit().  Add an assertion to check this
though.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
12
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
19
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
20
--- a/util/aio-posix.c
14
--- a/util/vhost-user-server.c
21
+++ b/util/aio-posix.c
15
+++ b/util/vhost-user-server.c
22
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
16
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
23
17
/* When this is set vu_client_trip will stop new processing vhost-user message */
24
void aio_dispatch(AioContext *ctx)
18
server->sioc = NULL;
25
{
19
26
+ qemu_lockcnt_inc(&ctx->list_lock);
20
- VuFdWatch *vu_fd_watch, *next;
27
aio_bh_poll(ctx);
21
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
22
- aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
23
- NULL, NULL, NULL);
24
- }
28
-
25
-
29
- qemu_lockcnt_inc(&ctx->list_lock);
26
- while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
30
aio_dispatch_handlers(ctx);
27
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
31
qemu_lockcnt_dec(&ctx->list_lock);
28
- if (!vu_fd_watch->processing) {
32
29
- QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
33
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
30
- g_free(vu_fd_watch);
31
- }
32
- }
33
- }
34
-
35
while (server->processing_msg) {
36
if (server->ioc->read_coroutine) {
37
server->ioc->read_coroutine = NULL;
38
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
34
}
39
}
35
40
36
npfd = 0;
41
vu_deinit(&server->vu_dev);
37
- qemu_lockcnt_dec(&ctx->list_lock);
38
39
progress |= aio_bh_poll(ctx);
40
41
if (ret > 0) {
42
- qemu_lockcnt_inc(&ctx->list_lock);
43
progress |= aio_dispatch_handlers(ctx);
44
- qemu_lockcnt_dec(&ctx->list_lock);
45
}
46
47
+ qemu_lockcnt_dec(&ctx->list_lock);
48
+
42
+
49
progress |= timerlistgroup_run_timers(&ctx->tlg);
43
+ /* vu_deinit() should have called remove_watch() */
50
44
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
51
return progress;
52
diff --git a/util/aio-win32.c b/util/aio-win32.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/util/aio-win32.c
55
+++ b/util/aio-win32.c
56
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
57
bool progress = false;
58
AioHandler *tmp;
59
60
- qemu_lockcnt_inc(&ctx->list_lock);
61
-
62
/*
63
* We have to walk very carefully in case aio_set_fd_handler is
64
* called while we're walking.
65
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
66
}
67
}
68
69
- qemu_lockcnt_dec(&ctx->list_lock);
70
return progress;
71
}
72
73
void aio_dispatch(AioContext *ctx)
74
{
75
+ qemu_lockcnt_inc(&ctx->list_lock);
76
aio_bh_poll(ctx);
77
aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
78
+ qemu_lockcnt_dec(&ctx->list_lock);
79
timerlistgroup_run_timers(&ctx->tlg);
80
}
81
82
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
83
}
84
}
85
86
- qemu_lockcnt_dec(&ctx->list_lock);
87
first = true;
88
89
/* ctx->notifier is always registered. */
90
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
91
progress |= aio_dispatch_handlers(ctx, event);
92
} while (count > 0);
93
94
+ qemu_lockcnt_dec(&ctx->list_lock);
95
+
45
+
96
progress |= timerlistgroup_run_timers(&ctx->tlg);
46
object_unref(OBJECT(sioc));
97
return progress;
47
object_unref(OBJECT(server->ioc));
98
}
99
diff --git a/util/async.c b/util/async.c
100
index XXXXXXX..XXXXXXX 100644
101
--- a/util/async.c
102
+++ b/util/async.c
103
@@ -XXX,XX +XXX,XX @@ void aio_bh_call(QEMUBH *bh)
104
bh->cb(bh->opaque);
105
}
106
107
-/* Multiple occurrences of aio_bh_poll cannot be called concurrently */
108
+/* Multiple occurrences of aio_bh_poll cannot be called concurrently.
109
+ * The count in ctx->list_lock is incremented before the call, and is
110
+ * not affected by the call.
111
+ */
112
int aio_bh_poll(AioContext *ctx)
113
{
114
QEMUBH *bh, **bhp, *next;
115
int ret;
116
bool deleted = false;
117
118
- qemu_lockcnt_inc(&ctx->list_lock);
119
-
120
ret = 0;
121
for (bh = atomic_rcu_read(&ctx->first_bh); bh; bh = next) {
122
next = atomic_rcu_read(&bh->next);
123
@@ -XXX,XX +XXX,XX @@ int aio_bh_poll(AioContext *ctx)
124
125
/* remove deleted bhs */
126
if (!deleted) {
127
- qemu_lockcnt_dec(&ctx->list_lock);
128
return ret;
129
}
130
131
- if (qemu_lockcnt_dec_and_lock(&ctx->list_lock)) {
132
+ if (qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
133
bhp = &ctx->first_bh;
134
while (*bhp) {
135
bh = *bhp;
136
@@ -XXX,XX +XXX,XX @@ int aio_bh_poll(AioContext *ctx)
137
bhp = &bh->next;
138
}
139
}
140
- qemu_lockcnt_unlock(&ctx->list_lock);
141
+ qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
142
}
143
return ret;
144
}
48
}
145
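The inc/dec pair being hoisted here implements a reader count: walkers
count themselves in and out, and nodes deleted during a walk may only be
reclaimed once the count drops to zero, so a single pair can now cover
both aio_bh_poll() and the fd-handler dispatch.  A toy C11 model of just
that counting contract (the real qemu_lockcnt also embeds a mutex for the
reclaim path):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct { atomic_uint visitors; } LockCnt;

    void visit_begin(LockCnt *lc) { atomic_fetch_add(&lc->visitors, 1); }
    void visit_end(LockCnt *lc)   { atomic_fetch_sub(&lc->visitors, 1); }

    /* Reclaimers free deleted nodes only when no walk is in progress. */
    bool can_reclaim(LockCnt *lc) { return atomic_load(&lc->visitors) == 0; }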
From: Paolo Bonzini <pbonzini@redhat.com>

This patch prepares for the removal of unnecessary lockcnt inc/dec pairs.
Extract the dispatching loop for file descriptor handlers into a new
function aio_dispatch_handlers, and then inline aio_dispatch into
aio_poll.

aio_dispatch can now become void.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-17-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h |  6 +-----
 util/aio-posix.c    | 44 ++++++++++++++------------------------------
 util/aio-win32.c    | 13 ++++---------
 util/async.c        |  2 +-
 4 files changed, 20 insertions(+), 45 deletions(-)

Only one struct is needed per request.  Drop req_data and the separate
VuBlockReq instance.  Instead let vu_queue_pop() allocate everything at
once.

This fixes the req_data memory leak in vu_block_virtio_process_req().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.c | 68 +++++++++-------------
 1 file changed, 21 insertions(+), 47 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
14
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
24
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
25
--- a/include/block/aio.h
16
--- a/block/export/vhost-user-blk-server.c
26
+++ b/include/block/aio.h
17
+++ b/block/export/vhost-user-blk-server.c
27
@@ -XXX,XX +XXX,XX @@ bool aio_pending(AioContext *ctx);
18
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
28
/* Dispatch any pending callbacks from the GSource attached to the AioContext.
19
};
29
*
20
30
* This is used internally in the implementation of the GSource.
21
typedef struct VuBlockReq {
31
- *
22
- VuVirtqElement *elem;
32
- * @dispatch_fds: true to process fds, false to skip them
23
+ VuVirtqElement elem;
33
- * (can be used as an optimization by callers that know there
24
int64_t sector_num;
34
- * are no fds ready)
25
size_t size;
35
*/
26
struct virtio_blk_inhdr *in;
36
-bool aio_dispatch(AioContext *ctx, bool dispatch_fds);
27
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
37
+void aio_dispatch(AioContext *ctx);
28
VuDev *vu_dev = &req->server->vu_dev;
38
29
39
/* Progress in completing AIO work to occur. This can issue new pending
30
/* IO size with 1 extra status byte */
40
* aio as a result of executing I/O completion or bh callbacks.
31
- vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
41
diff --git a/util/aio-posix.c b/util/aio-posix.c
32
+ vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
42
index XXXXXXX..XXXXXXX 100644
33
vu_queue_notify(vu_dev, req->vq);
43
--- a/util/aio-posix.c
34
44
+++ b/util/aio-posix.c
35
- if (req->elem) {
45
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
36
- free(req->elem);
46
AioHandler *node, *tmp;
47
bool progress = false;
48
49
- /*
50
- * We have to walk very carefully in case aio_set_fd_handler is
51
- * called while we're walking.
52
- */
53
- qemu_lockcnt_inc(&ctx->list_lock);
54
-
55
QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
56
int revents;
57
58
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
59
}
60
}
61
62
- qemu_lockcnt_dec(&ctx->list_lock);
63
return progress;
64
}
65
66
-/*
67
- * Note that dispatch_fds == false has the side-effect of post-poning the
68
- * freeing of deleted handlers.
69
- */
70
-bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
71
+void aio_dispatch(AioContext *ctx)
72
{
73
- bool progress;
74
+ aio_bh_poll(ctx);
75
76
- /*
77
- * If there are callbacks left that have been queued, we need to call them.
78
- * Do not call select in this case, because it is possible that the caller
79
- * does not need a complete flush (as is the case for aio_poll loops).
80
- */
81
- progress = aio_bh_poll(ctx);
82
+ qemu_lockcnt_inc(&ctx->list_lock);
83
+ aio_dispatch_handlers(ctx);
84
+ qemu_lockcnt_dec(&ctx->list_lock);
85
86
- if (dispatch_fds) {
87
- progress |= aio_dispatch_handlers(ctx);
88
- }
37
- }
89
-
38
-
90
- /* Run our timers */
39
- g_free(req);
91
- progress |= timerlistgroup_run_timers(&ctx->tlg);
40
+ free(req);
41
}
42
43
static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
44
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
45
blk_co_flush(backend);
46
}
47
48
-struct req_data {
49
- VuServer *server;
50
- VuVirtq *vq;
51
- VuVirtqElement *elem;
52
-};
92
-
53
-
93
- return progress;
54
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
94
+ timerlistgroup_run_timers(&ctx->tlg);
55
{
56
- struct req_data *data = opaque;
57
- VuServer *server = data->server;
58
- VuVirtq *vq = data->vq;
59
- VuVirtqElement *elem = data->elem;
60
+ VuBlockReq *req = opaque;
61
+ VuServer *server = req->server;
62
+ VuVirtqElement *elem = &req->elem;
63
uint32_t type;
64
- VuBlockReq *req;
65
66
VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
67
BlockBackend *backend = vdev_blk->backend;
68
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
69
struct iovec *out_iov = elem->out_sg;
70
unsigned in_num = elem->in_num;
71
unsigned out_num = elem->out_num;
72
+
73
/* refer to hw/block/virtio_blk.c */
74
if (elem->out_num < 1 || elem->in_num < 1) {
75
error_report("virtio-blk request missing headers");
76
- free(elem);
77
- return;
78
+ goto err;
79
}
80
81
- req = g_new0(VuBlockReq, 1);
82
- req->server = server;
83
- req->vq = vq;
84
- req->elem = elem;
85
-
86
if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
87
sizeof(req->out)) != sizeof(req->out))) {
88
error_report("virtio-blk request outhdr too short");
89
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
90
91
err:
92
free(elem);
93
- g_free(req);
94
- return;
95
}
95
}
96
96
97
/* These thread-local variables are used only in a small part of aio_poll
97
static void vu_block_process_vq(VuDev *vu_dev, int idx)
98
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
98
{
99
npfd = 0;
99
- VuServer *server;
100
qemu_lockcnt_dec(&ctx->list_lock);
100
- VuVirtq *vq;
101
101
- struct req_data *req_data;
102
- /* Run dispatch even if there were no readable fds to run timers */
102
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
103
- if (aio_dispatch(ctx, ret > 0)) {
103
+ VuVirtq *vq = vu_get_queue(vu_dev, idx);
104
- progress = true;
104
105
+ progress |= aio_bh_poll(ctx);
105
- server = container_of(vu_dev, VuServer, vu_dev);
106
- assert(server);
107
-
108
- vq = vu_get_queue(vu_dev, idx);
109
- assert(vq);
110
- VuVirtqElement *elem;
111
while (1) {
112
- elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
113
- sizeof(VuBlockReq));
114
- if (elem) {
115
- req_data = g_new0(struct req_data, 1);
116
- req_data->server = server;
117
- req_data->vq = vq;
118
- req_data->elem = elem;
119
- Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
120
- req_data);
121
- aio_co_enter(server->ioc->ctx, co);
122
- } else {
123
+ VuBlockReq *req;
106
+
124
+
107
+ if (ret > 0) {
125
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
108
+ qemu_lockcnt_inc(&ctx->list_lock);
126
+ if (!req) {
109
+ progress |= aio_dispatch_handlers(ctx);
127
break;
110
+ qemu_lockcnt_dec(&ctx->list_lock);
128
}
129
+
130
+ req->server = server;
131
+ req->vq = vq;
132
+
133
+ Coroutine *co =
134
+ qemu_coroutine_create(vu_block_virtio_process_req, req);
135
+ qemu_coroutine_enter(co);
111
}
136
}
112
113
+ progress |= timerlistgroup_run_timers(&ctx->tlg);
114
+
115
return progress;
116
}
137
}
117
138
118
diff --git a/util/aio-win32.c b/util/aio-win32.c
119
index XXXXXXX..XXXXXXX 100644
120
--- a/util/aio-win32.c
121
+++ b/util/aio-win32.c
122
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
123
return progress;
124
}
125
126
-bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
127
+void aio_dispatch(AioContext *ctx)
128
{
129
- bool progress;
130
-
131
- progress = aio_bh_poll(ctx);
132
- if (dispatch_fds) {
133
- progress |= aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
134
- }
135
- progress |= timerlistgroup_run_timers(&ctx->tlg);
136
- return progress;
137
+ aio_bh_poll(ctx);
138
+ aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
139
+ timerlistgroup_run_timers(&ctx->tlg);
140
}
141
142
bool aio_poll(AioContext *ctx, bool blocking)
143
diff --git a/util/async.c b/util/async.c
144
index XXXXXXX..XXXXXXX 100644
145
--- a/util/async.c
146
+++ b/util/async.c
147
@@ -XXX,XX +XXX,XX @@ aio_ctx_dispatch(GSource *source,
148
AioContext *ctx = (AioContext *) source;
149
150
assert(callback == NULL);
151
- aio_dispatch(ctx, true);
152
+ aio_dispatch(ctx);
153
return true;
154
}
155
156
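The allocation scheme described above -- let vu_queue_pop() hand back one
buffer whose head is the caller's request struct with the element
embedded -- removes the separately allocated element pointer that was
being leaked.  A simplified, self-contained model with stand-in types
(the real VuVirtqElement carries trailing scatter-gather arrays):

    #include <stdlib.h>

    typedef struct { unsigned in_num, out_num; } Element;

    typedef struct {
        Element elem;        /* embedded, first field as in VuBlockReq */
        long sector_num;
        size_t size;
    } BlockReq;

    /* One allocation covers request + element, so there is exactly one
     * pointer to track and a single free() -- no separate elem to leak. */
    void *queue_pop(size_t sz)
    {
        return calloc(1, sz);
    }

    int main(void)
    {
        BlockReq *req = queue_pop(sizeof(*req));
        if (req) {
            req->elem.in_num = 1;   /* element data lives inside req */
            free(req);
        }
        return 0;
    }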
From: Paolo Bonzini <pbonzini@redhat.com>

qcow2_create2 calls this.  Do not run a nested event loop, as that
breaks when aio_co_wake tries to queue the coroutine on the co_queue_wakeup
list of the currently running one.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-4-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/block-backend.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

The device panic notifier callback is not used.  Drop it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.h            |  3 ---
 block/export/vhost-user-blk-server.c |  3 +--
 util/vhost-user-server.c            |  6 ------
 3 files changed, 1 insertion(+), 11 deletions(-)

15
11
16
diff --git a/block/block-backend.c b/block/block-backend.c
12
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
17
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
18
--- a/block/block-backend.c
14
--- a/util/vhost-user-server.h
19
+++ b/block/block-backend.c
15
+++ b/util/vhost-user-server.h
20
@@ -XXX,XX +XXX,XX @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
16
@@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch {
17
} VuFdWatch;
18
19
typedef struct VuServer VuServer;
20
-typedef void DevicePanicNotifierFn(VuServer *server);
21
22
struct VuServer {
23
QIONetListener *listener;
24
AioContext *ctx;
25
- DevicePanicNotifierFn *device_panic_notifier;
26
int max_queues;
27
const VuDevIface *vu_iface;
28
VuDev vu_dev;
29
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
30
SocketAddress *unix_socket,
31
AioContext *ctx,
32
uint16_t max_queues,
33
- DevicePanicNotifierFn *device_panic_notifier,
34
const VuDevIface *vu_iface,
35
Error **errp);
36
37
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/export/vhost-user-blk-server.c
40
+++ b/block/export/vhost-user-blk-server.c
41
@@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
42
ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
43
44
if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
45
- VHOST_USER_BLK_MAX_QUEUES,
46
- NULL, &vu_block_iface,
47
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
48
errp)) {
49
goto error;
50
}
51
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/util/vhost-user-server.c
54
+++ b/util/vhost-user-server.c
55
@@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf)
56
close_client(server);
57
}
58
59
- if (server->device_panic_notifier) {
60
- server->device_panic_notifier(server);
61
- }
62
-
63
/*
64
* Set the callback function for network listener so another
65
* vhost-user client can connect to this server
66
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
67
SocketAddress *socket_addr,
68
AioContext *ctx,
69
uint16_t max_queues,
70
- DevicePanicNotifierFn *device_panic_notifier,
71
const VuDevIface *vu_iface,
72
Error **errp)
21
{
73
{
22
QEMUIOVector qiov;
74
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
23
struct iovec iov;
75
.vu_iface = vu_iface,
24
- Coroutine *co;
76
.max_queues = max_queues,
25
BlkRwCo rwco;
77
.ctx = ctx,
26
78
- .device_panic_notifier = device_panic_notifier,
27
iov = (struct iovec) {
28
@@ -XXX,XX +XXX,XX @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
29
.ret = NOT_DONE,
30
};
79
};
31
80
32
- co = qemu_coroutine_create(co_entry, &rwco);
81
qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
33
- qemu_coroutine_enter(co);
34
- BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
35
+ if (qemu_in_coroutine()) {
36
+ /* Fast-path if already in coroutine context */
37
+ co_entry(&rwco);
38
+ } else {
39
+ Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
40
+ qemu_coroutine_enter(co);
41
+ BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
42
+ }
43
44
return rwco.ret;
45
}
46
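The blk_prw() change above is the classic "fast path if already in
coroutine context" pattern: call the coroutine entry point directly when
inside a coroutine, and only spawn a coroutine plus poll loop otherwise.
A compilable toy model -- qemu_in_coroutine() and BDRV_POLL_WHILE() are
replaced by a thread-local flag and a direct call:

    #include <stdbool.h>
    #include <stdio.h>

    static _Thread_local bool in_coroutine;   /* qemu_in_coroutine() stand-in */

    static void co_entry(void *opaque)
    {
        *(int *)opaque = 0;                   /* the actual I/O, stubbed */
    }

    static void prw(void *rwco)
    {
        if (in_coroutine) {
            co_entry(rwco);   /* fast path: no new coroutine, no nested loop */
        } else {
            /* Slow path: QEMU creates a coroutine and polls with
             * BDRV_POLL_WHILE() until rwco.ret changes; modelled here
             * as entering "coroutine context" and calling in directly. */
            in_coroutine = true;
            co_entry(rwco);
            in_coroutine = false;
        }
    }

    int main(void)
    {
        int ret = -1;
        prw(&ret);
        printf("ret=%d\n", ret);
        return 0;
    }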
fds[] is leaked when qio_channel_readv_full() fails.

Use vmsg->fds[] instead of keeping a local fds[] array.  Then we can
reuse goto fail to clean up fds.  vmsg->fd_num must be zeroed before the
loop to make this safe.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-8-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.c | 50 ++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index XXXXXXX..XXXXXXX 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
     };
     int rc, read_bytes = 0;
     Error *local_err = NULL;
-    /*
-     * Store fds/nfds returned from qio_channel_readv_full into
-     * temporary variables.
-     *
-     * VhostUserMsg is a packed structure, gcc will complain about passing
-     * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
-     * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
-     * thus two temporary variables nfds and fds are used here.
-     */
-    size_t nfds = 0, nfds_t = 0;
     const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
-    int *fds_t = NULL;
     VuServer *server = container_of(vu_dev, VuServer, vu_dev);
     QIOChannel *ioc = server->ioc;

+    vmsg->fd_num = 0;
     if (!ioc) {
         error_report_err(local_err);
         goto fail;
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)

     assert(qemu_in_coroutine());
     do {
+        size_t nfds = 0;
+        int *fds = NULL;
+
         /*
          * qio_channel_readv_full may have short reads, keeping calling it
          * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
          */
-        rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
+        rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
         if (rc < 0) {
             if (rc == QIO_CHANNEL_ERR_BLOCK) {
+                assert(local_err == NULL);
                 qio_channel_yield(ioc, G_IO_IN);
                 continue;
             } else {
                 error_report_err(local_err);
-                return false;
+                goto fail;
             }
         }
-        read_bytes += rc;
-        if (nfds_t > 0) {
-            if (nfds + nfds_t > max_fds) {
+
+        if (nfds > 0) {
+            if (vmsg->fd_num + nfds > max_fds) {
                 error_report("A maximum of %zu fds are allowed, "
                              "however got %zu fds now",
-                             max_fds, nfds + nfds_t);
+                             max_fds, vmsg->fd_num + nfds);
+                g_free(fds);
                 goto fail;
             }
-            memcpy(vmsg->fds + nfds, fds_t,
-                   nfds_t *sizeof(vmsg->fds[0]));
-            nfds += nfds_t;
-            g_free(fds_t);
+            memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
+            vmsg->fd_num += nfds;
+            g_free(fds);
         }
-        if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
-            break;
+
+        if (rc == 0) { /* socket closed */
+            goto fail;
         }
-        iov.iov_base = (char *)vmsg + read_bytes;
-        iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
-    } while (true);

-    vmsg->fd_num = nfds;
+        iov.iov_base += rc;
+        iov.iov_len -= rc;
+        read_bytes += rc;
+    } while (read_bytes != VHOST_USER_HDR_SIZE);
+
     /* qio_channel_readv_full will make socket fds blocking, unblock them */
     vmsg_unblock_fds(vmsg);
     if (vmsg->size > sizeof(vmsg->payload)) {
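The rewritten read loop above advances the iovec by whatever each call
returned and treats EOF mid-header as an error rather than a normal exit.
A self-contained sketch of the same accumulation logic, with a
memory-backed stub standing in for qio_channel_readv_full():

    #include <stdio.h>
    #include <string.h>

    #define HDR_SIZE 12

    /* Stub transport: returns short reads from a static buffer, 0 at EOF. */
    static size_t src_off;
    static const char src[] = "0123456789AB";

    static long read_some(char *buf, size_t len)
    {
        size_t avail = HDR_SIZE - src_off;
        size_t n = len < 3 ? len : 3;            /* force short reads */
        if (n > avail) {
            n = avail;
        }
        memcpy(buf, src + src_off, n);
        src_off += n;
        return (long)n;
    }

    static int read_header(char *hdr)
    {
        char *p = hdr;                /* mirrors iov.iov_base */
        size_t want = HDR_SIZE;       /* mirrors iov.iov_len */

        while (want > 0) {
            long rc = read_some(p, want);
            if (rc <= 0) {
                return -1;            /* error or unexpected EOF: goto fail */
            }
            p += rc;
            want -= rc;
        }
        return 0;
    }

    int main(void)
    {
        char hdr[HDR_SIZE];
        printf("header read: %s\n", read_header(hdr) == 0 ? "ok" : "failed");
        return 0;
    }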
From: Paolo Bonzini <pbonzini@redhat.com>

This adds a CoMutex around the existing CoQueue.  Because the write-side
can just take the CoMutex, the old "writer" field is no longer necessary.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/coroutine.h   |  3 ++-
 util/qemu-coroutine-lock.c | 35 ++++++++++++++++++++++++-----------
 2 files changed, 26 insertions(+), 12 deletions(-)

Unexpected EOF is an error that must be reported.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-9-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
10
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
19
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
20
--- a/include/qemu/coroutine.h
12
--- a/util/vhost-user-server.c
21
+++ b/include/qemu/coroutine.h
13
+++ b/util/vhost-user-server.c
22
@@ -XXX,XX +XXX,XX @@ bool qemu_co_queue_empty(CoQueue *queue);
14
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
23
15
};
24
16
if (vmsg->size) {
25
typedef struct CoRwlock {
17
rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
26
- bool writer;
18
- if (rc == -1) {
27
+ int pending_writer;
19
- error_report_err(local_err);
28
int reader;
20
+ if (rc != 1) {
29
+ CoMutex mutex;
21
+ if (local_err) {
30
CoQueue queue;
22
+ error_report_err(local_err);
31
} CoRwlock;
23
+ }
32
24
goto fail;
33
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
34
index XXXXXXX..XXXXXXX 100644
35
--- a/util/qemu-coroutine-lock.c
36
+++ b/util/qemu-coroutine-lock.c
37
@@ -XXX,XX +XXX,XX @@ void qemu_co_rwlock_init(CoRwlock *lock)
38
{
39
memset(lock, 0, sizeof(*lock));
40
qemu_co_queue_init(&lock->queue);
41
+ qemu_co_mutex_init(&lock->mutex);
42
}
43
44
void qemu_co_rwlock_rdlock(CoRwlock *lock)
45
{
46
Coroutine *self = qemu_coroutine_self();
47
48
- while (lock->writer) {
49
- qemu_co_queue_wait(&lock->queue, NULL);
50
+ qemu_co_mutex_lock(&lock->mutex);
51
+ /* For fairness, wait if a writer is in line. */
52
+ while (lock->pending_writer) {
53
+ qemu_co_queue_wait(&lock->queue, &lock->mutex);
54
}
55
lock->reader++;
56
+ qemu_co_mutex_unlock(&lock->mutex);
57
+
58
+ /* The rest of the read-side critical section is run without the mutex. */
59
self->locks_held++;
60
}
61
62
@@ -XXX,XX +XXX,XX @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
63
Coroutine *self = qemu_coroutine_self();
64
65
assert(qemu_in_coroutine());
66
- if (lock->writer) {
67
- lock->writer = false;
68
+ if (!lock->reader) {
69
+ /* The critical section started in qemu_co_rwlock_wrlock. */
70
qemu_co_queue_restart_all(&lock->queue);
71
} else {
72
+ self->locks_held--;
73
+
74
+ qemu_co_mutex_lock(&lock->mutex);
75
lock->reader--;
76
assert(lock->reader >= 0);
77
/* Wakeup only one waiting writer */
78
@@ -XXX,XX +XXX,XX @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
79
qemu_co_queue_next(&lock->queue);
80
}
25
}
81
}
26
}
82
- self->locks_held--;
83
+ qemu_co_mutex_unlock(&lock->mutex);
84
}
85
86
void qemu_co_rwlock_wrlock(CoRwlock *lock)
87
{
88
- Coroutine *self = qemu_coroutine_self();
89
-
90
- while (lock->writer || lock->reader) {
91
- qemu_co_queue_wait(&lock->queue, NULL);
92
+ qemu_co_mutex_lock(&lock->mutex);
93
+ lock->pending_writer++;
94
+ while (lock->reader) {
95
+ qemu_co_queue_wait(&lock->queue, &lock->mutex);
96
}
97
- lock->writer = true;
98
- self->locks_held++;
99
+ lock->pending_writer--;
100
+
101
+ /* The rest of the write-side critical section is run with
102
+ * the mutex taken, so that lock->reader remains zero.
103
+ * There is no need to update self->locks_held.
104
+ */
105
}
106
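The fairness scheme in the CoRwlock rework above translates directly to
pthreads: a mutex protects reader/pending_writer, readers yield to queued
writers, and the write side keeps the mutex for its whole critical
section so the reader count stays zero.  A minimal standalone model, with
a condition variable in place of the CoQueue:

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t mutex;
        pthread_cond_t queue;
        int reader;
        int pending_writer;
    } RwLock;

    void rd_lock(RwLock *l)
    {
        pthread_mutex_lock(&l->mutex);
        while (l->pending_writer) {          /* let queued writers go first */
            pthread_cond_wait(&l->queue, &l->mutex);
        }
        l->reader++;
        pthread_mutex_unlock(&l->mutex);     /* read section runs unlocked */
    }

    void rd_unlock(RwLock *l)
    {
        pthread_mutex_lock(&l->mutex);
        if (--l->reader == 0) {
            pthread_cond_broadcast(&l->queue);   /* wake a waiting writer */
        }
        pthread_mutex_unlock(&l->mutex);
    }

    void wr_lock(RwLock *l)
    {
        pthread_mutex_lock(&l->mutex);
        l->pending_writer++;
        while (l->reader) {
            pthread_cond_wait(&l->queue, &l->mutex);
        }
        l->pending_writer--;
        /* write section runs WITH the mutex held, as in the CoMutex version */
    }

    void wr_unlock(RwLock *l)
    {
        pthread_cond_broadcast(&l->queue);
        pthread_mutex_unlock(&l->mutex);
    }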
From: Paolo Bonzini <pbonzini@redhat.com>

This covers both file descriptor callbacks and polling callbacks,
since they execute related code.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-14-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/curl.c          | 16 +++++++++++++---
 block/iscsi.c         |  4 ++++
 block/linux-aio.c     |  4 ++++
 block/nfs.c           |  6 ++++++
 block/sheepdog.c      | 29 +++++++++++++++--------------
 block/ssh.c           | 29 +++++++++--------------------
 block/win32-aio.c     | 10 ++++++----
 hw/block/virtio-blk.c |  5 ++++-
 hw/scsi/virtio-scsi.c |  7 +++++++
 util/aio-posix.c      |  7 -------
 util/aio-win32.c      |  6 ------
 11 files changed, 68 insertions(+), 55 deletions(-)

The vu_client_trip() coroutine is leaked during AioContext switching.  It
is also unsafe to destroy the vu_dev in panic_cb() since its callers
still access it in some cases.

Rework the lifecycle to solve these safety issues.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-10-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.h            |  29 ++--
 block/export/vhost-user-blk-server.c |   9 +-
 util/vhost-user-server.c            | 245 +++++++++++++++------------
 3 files changed, 155 insertions(+), 128 deletions(-)

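A sketch of the attach/detach shape that this reworked lifecycle
converges on: detaching unregisters every kick fd from the old
AioContext, attaching registers them with the new one and reschedules the
connection coroutine.  Types and set_fd_handler() are simplified
stand-ins for the QEMU APIs, stubbed so the unit compiles:

    #include <stddef.h>

    typedef struct Watch { int fd; struct Watch *next; } Watch;
    typedef struct { void *ctx; Watch *watches; } Server;

    /* Stub for aio_set_fd_handler(): cb == NULL means "stop monitoring". */
    static void set_fd_handler(void *ctx, int fd, void (*cb)(int))
    {
        (void)ctx; (void)fd; (void)cb;
    }

    static void kick_cb(int fd) { (void)fd; /* process the virtqueue */ }

    void attach_ctx(Server *s, void *new_ctx)
    {
        Watch *w;

        s->ctx = new_ctx;
        for (w = s->watches; w; w = w->next) {
            set_fd_handler(new_ctx, w->fd, kick_cb);  /* resume kick handling */
        }
        /* here QEMU also does aio_co_schedule(new_ctx, s->co_trip) */
    }

    void detach_ctx(Server *s)
    {
        Watch *w;

        for (w = s->watches; w; w = w->next) {
            set_fd_handler(s->ctx, w->fd, NULL);      /* stop kick handling */
        }
        s->ctx = NULL;
    }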
26
diff --git a/block/curl.c b/block/curl.c
16
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
27
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
28
--- a/block/curl.c
18
--- a/util/vhost-user-server.h
29
+++ b/block/curl.c
19
+++ b/util/vhost-user-server.h
30
@@ -XXX,XX +XXX,XX @@ static void curl_multi_check_completion(BDRVCURLState *s)
20
@@ -XXX,XX +XXX,XX @@
31
}
21
#include "qapi/error.h"
32
}
22
#include "standard-headers/linux/virtio_blk.h"
33
23
34
-static void curl_multi_do(void *arg)
24
+/* A kick fd that we monitor on behalf of libvhost-user */
35
+static void curl_multi_do_locked(CURLState *s)
25
typedef struct VuFdWatch {
36
{
26
VuDev *vu_dev;
37
- CURLState *s = (CURLState *)arg;
27
int fd; /*kick fd*/
38
CURLSocket *socket, *next_socket;
28
void *pvt;
39
int running;
29
vu_watch_cb cb;
40
int r;
30
- bool processing;
41
@@ -XXX,XX +XXX,XX @@ static void curl_multi_do(void *arg)
31
QTAILQ_ENTRY(VuFdWatch) next;
42
}
32
} VuFdWatch;
43
}
33
44
34
-typedef struct VuServer VuServer;
45
+static void curl_multi_do(void *arg)
35
-
36
-struct VuServer {
37
+/**
38
+ * VuServer:
39
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
40
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
41
+ * callbacks and virtqueue kicks run in the given AioContext.
42
+ */
43
+typedef struct {
44
QIONetListener *listener;
45
+ QEMUBH *restart_listener_bh;
46
AioContext *ctx;
47
int max_queues;
48
const VuDevIface *vu_iface;
49
+
50
+ /* Protected by ctx lock */
51
VuDev vu_dev;
52
QIOChannel *ioc; /* The I/O channel with the client */
53
QIOChannelSocket *sioc; /* The underlying data channel with the client */
54
- /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
55
- QIOChannel *ioc_slave;
56
- QIOChannelSocket *sioc_slave;
57
- Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
58
QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
59
- /* restart coroutine co_trip if AIOContext is changed */
60
- bool aio_context_changed;
61
- bool processing_msg;
62
-};
63
+
64
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
65
+} VuServer;
66
67
bool vhost_user_server_start(VuServer *server,
68
SocketAddress *unix_socket,
69
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
70
71
void vhost_user_server_stop(VuServer *server);
72
73
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
74
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
75
+void vhost_user_server_detach_aio_context(VuServer *server);
76
77
#endif /* VHOST_USER_SERVER_H */
78
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
79
index XXXXXXX..XXXXXXX 100644
80
--- a/block/export/vhost-user-blk-server.c
81
+++ b/block/export/vhost-user-blk-server.c
82
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = {
83
static void blk_aio_attached(AioContext *ctx, void *opaque)
84
{
85
VuBlockDev *vub_dev = opaque;
86
- aio_context_acquire(ctx);
87
- vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
88
- aio_context_release(ctx);
89
+ vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
90
}
91
92
static void blk_aio_detach(void *opaque)
93
{
94
VuBlockDev *vub_dev = opaque;
95
- AioContext *ctx = vub_dev->vu_server.ctx;
96
- aio_context_acquire(ctx);
97
- vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
98
- aio_context_release(ctx);
99
+ vhost_user_server_detach_aio_context(&vub_dev->vu_server);
100
}
101
102
static void
103
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/util/vhost-user-server.c
106
+++ b/util/vhost-user-server.c
107
@@ -XXX,XX +XXX,XX @@
108
*/
109
#include "qemu/osdep.h"
110
#include "qemu/main-loop.h"
111
+#include "block/aio-wait.h"
112
#include "vhost-user-server.h"
113
114
+/*
115
+ * Theory of operation:
116
+ *
117
+ * VuServer is started and stopped by vhost_user_server_start() and
118
+ * vhost_user_server_stop() from the main loop thread. Starting the server
119
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
120
+ * Only one connection is allowed at a time.
121
+ *
122
+ * The connection is handled by the vu_client_trip() coroutine in the
123
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
124
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
125
+ * protocol messages over the UNIX domain socket.
126
+ *
127
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
128
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
129
+ *
130
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
131
+ * the socket connection. Shutting down the socket connection causes
132
+ * vu_message_read() to fail since no more data can be received from the socket.
133
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
134
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
135
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
136
+ * processing.
137
+ *
138
+ * When vu_client_trip() has finished cleaning up it schedules a BH in the main
139
+ * loop thread to accept the next client connection.
140
+ *
141
+ * When libvhost-user detects an error it calls panic_cb() and sets the
142
+ * dev->broken flag. Both vu_client_trip() and kick fd processing stop when
143
+ * the dev->broken flag is set.
144
+ *
145
+ * It is possible to switch AioContexts using
146
+ * vhost_user_server_detach_aio_context() and
147
+ * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old
148
+ * AioContext and resume monitoring in the new AioContext. The vu_client_trip()
149
+ * coroutine remains in a yielded state during the switch. This is made
150
+ * possible by QIOChannel's support for spurious coroutine re-entry in
151
+ * qio_channel_yield(). The coroutine will restart I/O when re-entered from the
152
+ * new AioContext.
153
+ */
154
+
155
static void vmsg_close_fds(VhostUserMsg *vmsg)
156
{
157
int i;
158
@@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg)
159
}
160
}
161
162
-static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
163
- gpointer opaque);
164
-
165
-static void close_client(VuServer *server)
166
-{
167
- /*
168
- * Before closing the client
169
- *
170
- * 1. Let vu_client_trip stop processing new vhost-user msg
171
- *
172
- * 2. remove kick_handler
173
- *
174
- * 3. wait for the kick handler to be finished
175
- *
176
- * 4. wait for the current vhost-user msg to be finished processing
177
- */
178
-
179
- QIOChannelSocket *sioc = server->sioc;
180
- /* When this is set vu_client_trip will stop new processing vhost-user message */
181
- server->sioc = NULL;
182
-
183
- while (server->processing_msg) {
184
- if (server->ioc->read_coroutine) {
185
- server->ioc->read_coroutine = NULL;
186
- qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
187
- NULL, server->ioc);
188
- server->processing_msg = false;
189
- }
190
- }
191
-
192
- vu_deinit(&server->vu_dev);
193
-
194
- /* vu_deinit() should have called remove_watch() */
195
- assert(QTAILQ_EMPTY(&server->vu_fd_watches));
196
-
197
- object_unref(OBJECT(sioc));
198
- object_unref(OBJECT(server->ioc));
199
-}
200
-
201
static void panic_cb(VuDev *vu_dev, const char *buf)
202
{
203
- VuServer *server = container_of(vu_dev, VuServer, vu_dev);
204
-
205
- /* avoid while loop in close_client */
206
- server->processing_msg = false;
207
-
208
- if (buf) {
209
- error_report("vu_panic: %s", buf);
210
- }
211
-
212
- if (server->sioc) {
213
- close_client(server);
214
- }
215
-
216
- /*
217
- * Set the callback function for network listener so another
218
- * vhost-user client can connect to this server
219
- */
220
- qio_net_listener_set_client_func(server->listener,
221
- vu_accept,
222
- server,
223
- NULL);
224
+ error_report("vu_panic: %s", buf);
225
}
226
227
static bool coroutine_fn
228
@@ -XXX,XX +XXX,XX @@ fail:
229
return false;
230
}
231
232
-
233
-static void vu_client_start(VuServer *server);
234
static coroutine_fn void vu_client_trip(void *opaque)
235
{
236
VuServer *server = opaque;
237
+ VuDev *vu_dev = &server->vu_dev;
238
239
- while (!server->aio_context_changed && server->sioc) {
240
- server->processing_msg = true;
241
- vu_dispatch(&server->vu_dev);
242
- server->processing_msg = false;
243
+ while (!vu_dev->broken && vu_dispatch(vu_dev)) {
244
+ /* Keep running */
245
}
246
247
- if (server->aio_context_changed && server->sioc) {
248
- server->aio_context_changed = false;
249
- vu_client_start(server);
250
- }
251
-}
252
+ vu_deinit(vu_dev);
253
+
254
+ /* vu_deinit() should have called remove_watch() */
255
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
256
+
257
+ object_unref(OBJECT(server->sioc));
258
+ server->sioc = NULL;
259
260
-static void vu_client_start(VuServer *server)
261
-{
262
- server->co_trip = qemu_coroutine_create(vu_client_trip, server);
263
- aio_co_enter(server->ctx, server->co_trip);
264
+ object_unref(OBJECT(server->ioc));
265
+ server->ioc = NULL;
266
+
267
+ server->co_trip = NULL;
268
+ if (server->restart_listener_bh) {
269
+ qemu_bh_schedule(server->restart_listener_bh);
270
+ }
271
+ aio_wait_kick();
272
}
273
274
/*
275
@@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server)
276
static void kick_handler(void *opaque)
277
{
278
VuFdWatch *vu_fd_watch = opaque;
279
- vu_fd_watch->processing = true;
280
- vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
281
- vu_fd_watch->processing = false;
282
+ VuDev *vu_dev = vu_fd_watch->vu_dev;
283
+
284
+ vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt);
285
+
286
+ /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */
287
+ if (vu_dev->broken) {
288
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
289
+
290
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
291
+ }
292
}
293
294
-
295
static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
296
{
297
298
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
299
qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
300
server->ioc = QIO_CHANNEL(sioc);
301
object_ref(OBJECT(server->ioc));
302
- qio_channel_attach_aio_context(server->ioc, server->ctx);
303
+
304
+ /* TODO vu_message_write() spins if non-blocking! */
305
qio_channel_set_blocking(server->ioc, false, NULL);
306
- vu_client_start(server);
307
+
308
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
309
+
310
+ aio_context_acquire(server->ctx);
311
+ vhost_user_server_attach_aio_context(server, server->ctx);
312
+ aio_context_release(server->ctx);
313
}
314
315
-
316
void vhost_user_server_stop(VuServer *server)
317
{
318
+ aio_context_acquire(server->ctx);
319
+
320
+ qemu_bh_delete(server->restart_listener_bh);
321
+ server->restart_listener_bh = NULL;
322
+
323
if (server->sioc) {
324
- close_client(server);
325
+ VuFdWatch *vu_fd_watch;
326
+
327
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
328
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
329
+ NULL, NULL, NULL, vu_fd_watch);
330
+ }
331
+
332
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
333
+
334
+ AIO_WAIT_WHILE(server->ctx, server->co_trip);
335
}
336
337
+ aio_context_release(server->ctx);
338
+
339
if (server->listener) {
340
qio_net_listener_disconnect(server->listener);
341
object_unref(OBJECT(server->listener));
342
}
343
+}
344
+
345
+/*
346
+ * Allow the next client to connect to the server. Called from a BH in the main
347
+ * loop.
348
+ */
349
+{
+    CURLState *s = (CURLState *)arg;
+
+    aio_context_acquire(s->s->aio_context);
+    curl_multi_do_locked(s);
+    aio_context_release(s->s->aio_context);
+}
+
 static void curl_multi_read(void *arg)
 {
     CURLState *s = (CURLState *)arg;
 
-    curl_multi_do(arg);
+    aio_context_acquire(s->s->aio_context);
+    curl_multi_do_locked(s);
     curl_multi_check_completion(s->s);
+    aio_context_release(s->s->aio_context);
 }
 
 static void curl_multi_timeout_do(void *arg)
diff --git a/block/iscsi.c b/block/iscsi.c
index XXXXXXX..XXXXXXX 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -XXX,XX +XXX,XX @@ iscsi_process_read(void *arg)
     IscsiLun *iscsilun = arg;
     struct iscsi_context *iscsi = iscsilun->iscsi;
 
+    aio_context_acquire(iscsilun->aio_context);
     iscsi_service(iscsi, POLLIN);
     iscsi_set_events(iscsilun);
+    aio_context_release(iscsilun->aio_context);
 }
 
 static void
@@ -XXX,XX +XXX,XX @@ iscsi_process_write(void *arg)
     IscsiLun *iscsilun = arg;
     struct iscsi_context *iscsi = iscsilun->iscsi;
 
+    aio_context_acquire(iscsilun->aio_context);
     iscsi_service(iscsi, POLLOUT);
     iscsi_set_events(iscsilun);
+    aio_context_release(iscsilun->aio_context);
 }
 
 static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_completion_cb(EventNotifier *e)
     LinuxAioState *s = container_of(e, LinuxAioState, e);
 
     if (event_notifier_test_and_clear(&s->e)) {
+        aio_context_acquire(s->aio_context);
         qemu_laio_process_completions_and_submit(s);
+        aio_context_release(s->aio_context);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static bool qemu_laio_poll_cb(void *opaque)
         return false;
     }
 
+    aio_context_acquire(s->aio_context);
     qemu_laio_process_completions_and_submit(s);
+    aio_context_release(s->aio_context);
     return true;
 }
 
diff --git a/block/nfs.c b/block/nfs.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -XXX,XX +XXX,XX @@ static void nfs_set_events(NFSClient *client)
 static void nfs_process_read(void *arg)
 {
     NFSClient *client = arg;
+
+    aio_context_acquire(client->aio_context);
     nfs_service(client->context, POLLIN);
     nfs_set_events(client);
+    aio_context_release(client->aio_context);
 }
 
 static void nfs_process_write(void *arg)
 {
     NFSClient *client = arg;
+
+    aio_context_acquire(client->aio_context);
     nfs_service(client->context, POLLOUT);
     nfs_set_events(client);
+    aio_context_release(client->aio_context);
 }
 
 static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
diff --git a/block/sheepdog.c b/block/sheepdog.c
index XXXXXXX..XXXXXXX 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
     return ret;
 }
 
-static void restart_co_req(void *opaque)
-{
-    Coroutine *co = opaque;
-
-    qemu_coroutine_enter(co);
-}
-
 typedef struct SheepdogReqCo {
     int sockfd;
     BlockDriverState *bs;
@@ -XXX,XX +XXX,XX @@ typedef struct SheepdogReqCo {
     unsigned int *rlen;
     int ret;
     bool finished;
+    Coroutine *co;
 } SheepdogReqCo;
 
+static void restart_co_req(void *opaque)
+{
+    SheepdogReqCo *srco = opaque;
+
+    aio_co_wake(srco->co);
+}
+
 static coroutine_fn void do_co_req(void *opaque)
 {
     int ret;
-    Coroutine *co;
     SheepdogReqCo *srco = opaque;
     int sockfd = srco->sockfd;
     SheepdogReq *hdr = srco->hdr;
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void do_co_req(void *opaque)
     unsigned int *wlen = srco->wlen;
     unsigned int *rlen = srco->rlen;
 
-    co = qemu_coroutine_self();
+    srco->co = qemu_coroutine_self();
     aio_set_fd_handler(srco->aio_context, sockfd, false,
-                       NULL, restart_co_req, NULL, co);
+                       NULL, restart_co_req, NULL, srco);
 
     ret = send_co_req(sockfd, hdr, data, wlen);
     if (ret < 0) {
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void do_co_req(void *opaque)
     }
 
     aio_set_fd_handler(srco->aio_context, sockfd, false,
-                       restart_co_req, NULL, NULL, co);
+                       restart_co_req, NULL, NULL, srco);
 
     ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
     if (ret != sizeof(*hdr)) {
@@ -XXX,XX +XXX,XX @@ out:
     aio_set_fd_handler(srco->aio_context, sockfd, false,
                        NULL, NULL, NULL, NULL);
 
+    srco->co = NULL;
     srco->ret = ret;
     srco->finished = true;
     if (srco->bs) {
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn aio_read_response(void *opaque)
          * We've finished all requests which belong to the AIOCB, so
          * we can switch back to sd_co_readv/writev now.
          */
-        qemu_coroutine_enter(acb->coroutine);
+        aio_co_wake(acb->coroutine);
     }
 
     return;
@@ -XXX,XX +XXX,XX @@ static void co_read_response(void *opaque)
         s->co_recv = qemu_coroutine_create(aio_read_response, opaque);
     }
 
-    qemu_coroutine_enter(s->co_recv);
+    aio_co_wake(s->co_recv);
 }
 
 static void co_write_request(void *opaque)
 {
     BDRVSheepdogState *s = opaque;
 
-    qemu_coroutine_enter(s->co_send);
+    aio_co_wake(s->co_send);
 }
 
 /*
diff --git a/block/ssh.c b/block/ssh.c
index XXXXXXX..XXXXXXX 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -XXX,XX +XXX,XX @@ static void restart_coroutine(void *opaque)
 
     DPRINTF("co=%p", co);
 
-    qemu_coroutine_enter(co);
+    aio_co_wake(co);
 }
 
-static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
+/* A non-blocking call returned EAGAIN, so yield, ensuring the
+ * handlers are set up so that we'll be rescheduled when there is an
+ * interesting event on the socket.
+ */
+static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
 {
     int r;
     IOHandler *rd_handler = NULL, *wr_handler = NULL;
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
 
     aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
                        false, rd_handler, wr_handler, NULL, co);
-}
-
-static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
-                                          BlockDriverState *bs)
-{
-    DPRINTF("s->sock=%d", s->sock);
-    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
-                       false, NULL, NULL, NULL, NULL);
-}
-
-/* A non-blocking call returned EAGAIN, so yield, ensuring the
- * handlers are set up so that we'll be rescheduled when there is an
- * interesting event on the socket.
- */
-static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
-{
-    set_fd_handler(s, bs);
     qemu_coroutine_yield();
-    clear_fd_handler(s, bs);
+    DPRINTF("s->sock=%d - back", s->sock);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, false,
+                       NULL, NULL, NULL, NULL);
 }
 
 /* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
diff --git a/block/win32-aio.c b/block/win32-aio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/win32-aio.c
+++ b/block/win32-aio.c
@@ -XXX,XX +XXX,XX @@ struct QEMUWin32AIOState {
     HANDLE hIOCP;
     EventNotifier e;
     int count;
-    bool is_aio_context_attached;
+    AioContext *aio_ctx;
 };
 
 typedef struct QEMUWin32AIOCB {
@@ -XXX,XX +XXX,XX @@ static void win32_aio_process_completion(QEMUWin32AIOState *s,
     }
 
 
+    aio_context_acquire(s->aio_ctx);
     waiocb->common.cb(waiocb->common.opaque, ret);
+    aio_context_release(s->aio_ctx);
     qemu_aio_unref(waiocb);
 }
 
@@ -XXX,XX +XXX,XX @@ void win32_aio_detach_aio_context(QEMUWin32AIOState *aio,
                                   AioContext *old_context)
 {
     aio_set_event_notifier(old_context, &aio->e, false, NULL, NULL);
-    aio->is_aio_context_attached = false;
+    aio->aio_ctx = NULL;
 }
 
 void win32_aio_attach_aio_context(QEMUWin32AIOState *aio,
                                   AioContext *new_context)
 {
-    aio->is_aio_context_attached = true;
+    aio->aio_ctx = new_context;
     aio_set_event_notifier(new_context, &aio->e, false,
                            win32_aio_completion_cb, NULL);
 }
@@ -XXX,XX +XXX,XX @@ out_free_state:
 
 void win32_aio_cleanup(QEMUWin32AIOState *aio)
 {
-    assert(!aio->is_aio_context_attached);
+    assert(!aio->aio_ctx);
     CloseHandle(aio->hIOCP);
     event_notifier_cleanup(&aio->e);
     g_free(aio);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_ioctl_complete(void *opaque, int status)
 {
     VirtIOBlockIoctlReq *ioctl_req = opaque;
     VirtIOBlockReq *req = ioctl_req->req;
-    VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+    VirtIOBlock *s = req->dev;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
     struct virtio_scsi_inhdr *scsi;
     struct sg_io_hdr *hdr;
 
@@ -XXX,XX +XXX,XX @@ bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     MultiReqBuffer mrb = {};
     bool progress = false;
 
+    aio_context_acquire(blk_get_aio_context(s->blk));
     blk_io_plug(s->blk);
 
     do {
@@ -XXX,XX +XXX,XX @@ bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     }
 
     blk_io_unplug(s->blk);
+    aio_context_release(blk_get_aio_context(s->blk));
     return progress;
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -XXX,XX +XXX,XX @@ bool virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq)
     VirtIOSCSIReq *req;
     bool progress = false;
 
+    virtio_scsi_acquire(s);
     while ((req = virtio_scsi_pop_req(s, vq))) {
         progress = true;
         virtio_scsi_handle_ctrl_req(s, req);
     }
+    virtio_scsi_release(s);
     return progress;
 }
 
@@ -XXX,XX +XXX,XX @@ bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
 
     QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
+    virtio_scsi_acquire(s);
     do {
         virtio_queue_set_notification(vq, 0);
 
@@ -XXX,XX +XXX,XX @@ bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
     QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
         virtio_scsi_handle_cmd_req_submit(s, req);
     }
+    virtio_scsi_release(s);
     return progress;
 }
 
@@ -XXX,XX +XXX,XX @@ out:
 
 bool virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq)
 {
+    virtio_scsi_acquire(s);
     if (s->events_dropped) {
         virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0);
+        virtio_scsi_release(s);
         return true;
     }
+    virtio_scsi_release(s);
     return false;
 }
 
diff --git a/util/aio-posix.c b/util/aio-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
             (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
             aio_node_check(ctx, node->is_external) &&
             node->io_read) {
-            aio_context_acquire(ctx);
             node->io_read(node->opaque);
-            aio_context_release(ctx);
 
             /* aio_notify() does not count as progress */
             if (node->opaque != &ctx->notifier) {
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx)
             (revents & (G_IO_OUT | G_IO_ERR)) &&
             aio_node_check(ctx, node->is_external) &&
             node->io_write) {
-            aio_context_acquire(ctx);
             node->io_write(node->opaque);
-            aio_context_release(ctx);
             progress = true;
         }
 
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
         start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
 
-    aio_context_acquire(ctx);
     progress = try_poll_mode(ctx, blocking);
-    aio_context_release(ctx);
-
     if (!progress) {
         assert(npfd == 0);
 
diff --git a/util/aio-win32.c b/util/aio-win32.c
index XXXXXXX..XXXXXXX 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
             (revents || event_notifier_get_handle(node->e) == event) &&
             node->io_notify) {
             node->pfd.revents = 0;
-            aio_context_acquire(ctx);
             node->io_notify(node->e);
-            aio_context_release(ctx);
 
             /* aio_notify() does not count as progress */
             if (node->e != &ctx->notifier) {
@@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
             (node->io_read || node->io_write)) {
             node->pfd.revents = 0;
             if ((revents & G_IO_IN) && node->io_read) {
-                aio_context_acquire(ctx);
                 node->io_read(node->opaque);
-                aio_context_release(ctx);
                 progress = true;
             }
             if ((revents & G_IO_OUT) && node->io_write) {
-                aio_context_acquire(ctx);
                 node->io_write(node->opaque);
-                aio_context_release(ctx);
                 progress = true;
             }
 
-- 
2.9.3
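
The hunks above all apply one convention, so a condensed sketch may help. This is an illustration only, not code from the series; MyState, my_state_service() and my_process_read() are hypothetical stand-ins for a driver's state and processing functions:

    /* Sketch of the callback-side locking convention: once
     * aio_dispatch()/aio_poll() stop wrapping handlers in
     * aio_context_acquire()/release(), every file-descriptor callback
     * that touches shared driver state must take the lock itself.
     */
    #include "qemu/osdep.h"
    #include "block/aio.h"

    typedef struct MyState {
        AioContext *aio_context;    /* context this driver is bound to */
        /* ... driver fields ... */
    } MyState;

    static void my_state_service(MyState *s);  /* driver-specific work */

    static void my_process_read(void *opaque)
    {
        MyState *s = opaque;

        aio_context_acquire(s->aio_context);  /* formerly taken by aio_poll() */
        my_state_service(s);                  /* may complete requests */
        aio_context_release(s->aio_context);
    }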
+static void restart_listener_bh(void *opaque)
+{
+    VuServer *server = opaque;
+
+    qio_net_listener_set_client_func(server->listener, vu_accept, server,
+                                     NULL);
 }
 
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
+/* Called with ctx acquired */
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
 {
-    VuFdWatch *vu_fd_watch, *next;
-    void *opaque = NULL;
-    IOHandler *io_read = NULL;
-    bool attach;
+    VuFdWatch *vu_fd_watch;
 
-    server->ctx = ctx ? ctx : qemu_get_aio_context();
+    server->ctx = ctx;
 
     if (!server->sioc) {
-        /* not yet serving any client*/
         return;
     }
 
-    if (ctx) {
-        qio_channel_attach_aio_context(server->ioc, ctx);
-        server->aio_context_changed = true;
-        io_read = kick_handler;
-        attach = true;
-    } else {
+    qio_channel_attach_aio_context(server->ioc, ctx);
+
+    QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
+        aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
+                           NULL, vu_fd_watch);
+    }
+
+    aio_co_schedule(ctx, server->co_trip);
+}
+
+/* Called with server->ctx acquired */
+void vhost_user_server_detach_aio_context(VuServer *server)
+{
+    if (server->sioc) {
+        VuFdWatch *vu_fd_watch;
+
+        QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
+            aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+                               NULL, NULL, NULL, vu_fd_watch);
+        }
+
         qio_channel_detach_aio_context(server->ioc);
-        /* server->ioc->ctx keeps the old AioConext */
-        ctx = server->ioc->ctx;
-        attach = false;
     }
 
-    QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
-        if (vu_fd_watch->cb) {
-            opaque = attach ? vu_fd_watch : NULL;
-            aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
-                               io_read, NULL, NULL,
-                               opaque);
-        }
-    }
+    server->ctx = NULL;
 }
 
-
 bool vhost_user_server_start(VuServer *server,
                              SocketAddress *socket_addr,
                              AioContext *ctx,
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
                              const VuDevIface *vu_iface,
                              Error **errp)
 {
+    QEMUBH *bh;
     QIONetListener *listener = qio_net_listener_new();
     if (qio_net_listener_open_sync(listener, socket_addr, 1,
                                    errp) < 0) {
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
         return false;
     }
 
+    bh = qemu_bh_new(restart_listener_bh, server);
+
     /* zero out unspecified fields */
     *server = (VuServer) {
         .listener            = listener,
+        .restart_listener_bh = bh,
         .vu_iface            = vu_iface,
         .max_queues          = max_queues,
         .ctx                 = ctx,
-- 
2.26.2
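
These two entry points are designed to be driven from a BlockBackend AioContext-change notifier. A sketch of the wiring, under the assumption that the export owns both the BlockBackend and the VuServer (the callback names mirror the vhost-user-blk export later in this series; treat this as an outline, not series code):

    /* Fires when the BlockBackend moves to a new AioContext, e.g. when an
     * IOThread is assigned to the device.
     */
    static void blk_aio_attached(AioContext *ctx, void *opaque)
    {
        VuServer *server = opaque;

        vhost_user_server_attach_aio_context(server, ctx);
    }

    static void blk_aio_detach(void *opaque)
    {
        VuServer *server = opaque;

        vhost_user_server_detach_aio_context(server);
    }

    /* Registration, done once at export creation time: */
    static void export_setup(BlockBackend *blk, VuServer *server)
    {
        blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach,
                                     server);
    }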
From: Paolo Bonzini <pbonzini@redhat.com>

This is in preparation for making qio_channel_yield work on
AioContexts other than the main one.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/io/channel.h | 25 +++++++++++++++++++++++++
 io/channel-command.c | 13 +++++++++++++
 io/channel-file.c    | 11 +++++++++++
 io/channel-socket.c  | 16 +++++++++++-----
 io/channel-tls.c     | 12 ++++++++++++
 io/channel-watch.c   |  6 ++++++
 io/channel.c         | 11 +++++++++++
 7 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index XXXXXXX..XXXXXXX 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu-common.h"
 #include "qom/object.h"
+#include "block/aio.h"
 
 #define TYPE_QIO_CHANNEL "qio-channel"
 #define QIO_CHANNEL(obj) \
@@ -XXX,XX +XXX,XX @@ struct QIOChannelClass {
                        off_t offset,
                        int whence,
                        Error **errp);
+    void (*io_set_aio_fd_handler)(QIOChannel *ioc,
+                                  AioContext *ctx,
+                                  IOHandler *io_read,
+                                  IOHandler *io_write,
+                                  void *opaque);
 };
 
 /* General I/O handling functions */
@@ -XXX,XX +XXX,XX @@ void qio_channel_yield(QIOChannel *ioc,
 void qio_channel_wait(QIOChannel *ioc,
                       GIOCondition condition);
 
+/**
+ * qio_channel_set_aio_fd_handler:
+ * @ioc: the channel object
+ * @ctx: the AioContext to set the handlers on
+ * @io_read: the read handler
+ * @io_write: the write handler
+ * @opaque: the opaque value passed to the handler
+ *
+ * This is used internally by qio_channel_yield(). It can
+ * be used by channel implementations to forward the handlers
+ * to another channel (e.g. from #QIOChannelTLS to the
+ * underlying socket).
+ */
+void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
+                                    AioContext *ctx,
+                                    IOHandler *io_read,
+                                    IOHandler *io_write,
+                                    void *opaque);
+
 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel-command.c b/io/channel-command.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -XXX,XX +XXX,XX @@ static int qio_channel_command_close(QIOChannel *ioc,
 }
 
 
+static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
+                                                   AioContext *ctx,
+                                                   IOHandler *io_read,
+                                                   IOHandler *io_write,
+                                                   void *opaque)
+{
+    QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
+    aio_set_fd_handler(ctx, cioc->readfd, false, io_read, NULL, NULL, opaque);
+    aio_set_fd_handler(ctx, cioc->writefd, false, NULL, io_write, NULL, opaque);
+}
+
+
 static GSource *qio_channel_command_create_watch(QIOChannel *ioc,
                                                  GIOCondition condition)
 {
@@ -XXX,XX +XXX,XX @@ static void qio_channel_command_class_init(ObjectClass *klass,
     ioc_klass->io_set_blocking = qio_channel_command_set_blocking;
     ioc_klass->io_close = qio_channel_command_close;
     ioc_klass->io_create_watch = qio_channel_command_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_command_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_command_info = {
diff --git a/io/channel-file.c b/io/channel-file.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -XXX,XX +XXX,XX @@ static int qio_channel_file_close(QIOChannel *ioc,
 }
 
 
+static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc,
+                                                AioContext *ctx,
+                                                IOHandler *io_read,
+                                                IOHandler *io_write,
+                                                void *opaque)
+{
+    QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+    aio_set_fd_handler(ctx, fioc->fd, false, io_read, io_write, NULL, opaque);
+}
+
 static GSource *qio_channel_file_create_watch(QIOChannel *ioc,
                                               GIOCondition condition)
 {
@@ -XXX,XX +XXX,XX @@ static void qio_channel_file_class_init(ObjectClass *klass,
     ioc_klass->io_seek = qio_channel_file_seek;
     ioc_klass->io_close = qio_channel_file_close;
     ioc_klass->io_create_watch = qio_channel_file_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_file_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_file_info = {
diff --git a/io/channel-socket.c b/io/channel-socket.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -XXX,XX +XXX,XX @@ qio_channel_socket_set_blocking(QIOChannel *ioc,
         qemu_set_block(sioc->fd);
     } else {
         qemu_set_nonblock(sioc->fd);
-#ifdef WIN32
-        WSAEventSelect(sioc->fd, ioc->event,
-                       FD_READ | FD_ACCEPT | FD_CLOSE |
-                       FD_CONNECT | FD_WRITE | FD_OOB);
-#endif
     }
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ qio_channel_socket_shutdown(QIOChannel *ioc,
     return 0;
 }
 
+static void qio_channel_socket_set_aio_fd_handler(QIOChannel *ioc,
+                                                  AioContext *ctx,
+                                                  IOHandler *io_read,
+                                                  IOHandler *io_write,
+                                                  void *opaque)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    aio_set_fd_handler(ctx, sioc->fd, false, io_read, io_write, NULL, opaque);
+}
+
 static GSource *qio_channel_socket_create_watch(QIOChannel *ioc,
                                                 GIOCondition condition)
 {
@@ -XXX,XX +XXX,XX @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_cork = qio_channel_socket_set_cork;
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_socket_info = {
diff --git a/io/channel-tls.c b/io/channel-tls.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -XXX,XX +XXX,XX @@ static int qio_channel_tls_close(QIOChannel *ioc,
     return qio_channel_close(tioc->master, errp);
 }
 
+static void qio_channel_tls_set_aio_fd_handler(QIOChannel *ioc,
+                                               AioContext *ctx,
+                                               IOHandler *io_read,
+                                               IOHandler *io_write,
+                                               void *opaque)
+{
+    QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
+
+    qio_channel_set_aio_fd_handler(tioc->master, ctx, io_read, io_write, opaque);
+}
+
 static GSource *qio_channel_tls_create_watch(QIOChannel *ioc,
                                              GIOCondition condition)
 {
@@ -XXX,XX +XXX,XX @@ static void qio_channel_tls_class_init(ObjectClass *klass,
     ioc_klass->io_close = qio_channel_tls_close;
     ioc_klass->io_shutdown = qio_channel_tls_shutdown;
     ioc_klass->io_create_watch = qio_channel_tls_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_tls_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_tls_info = {
diff --git a/io/channel-watch.c b/io/channel-watch.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel-watch.c
+++ b/io/channel-watch.c
@@ -XXX,XX +XXX,XX @@ GSource *qio_channel_create_socket_watch(QIOChannel *ioc,
     GSource *source;
     QIOChannelSocketSource *ssource;
 
+#ifdef WIN32
+    WSAEventSelect(socket, ioc->event,
+                   FD_READ | FD_ACCEPT | FD_CLOSE |
+                   FD_CONNECT | FD_WRITE | FD_OOB);
+#endif
+
     source = g_source_new(&qio_channel_socket_source_funcs,
                           sizeof(QIOChannelSocketSource));
     ssource = (QIOChannelSocketSource *)source;
diff --git a/io/channel.c b/io/channel.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -XXX,XX +XXX,XX @@ GSource *qio_channel_create_watch(QIOChannel *ioc,
 }
 
 
+void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
+                                    AioContext *ctx,
+                                    IOHandler *io_read,
+                                    IOHandler *io_write,
+                                    void *opaque)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    klass->io_set_aio_fd_handler(ioc, ctx, io_read, io_write, opaque);
+}
+
 guint qio_channel_add_watch(QIOChannel *ioc,
                             GIOCondition condition,
                             QIOChannelFunc func,
-- 
2.9.3
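
A channel implementation only has to translate the two handlers onto its file descriptor(s), as the command/file/socket implementations above do. For illustration, here is what the hook could look like for a hypothetical single-fd channel type; QIOChannelPipe, its cast macro and its fd field are invented for this sketch and are not part of the series:

    static void qio_channel_pipe_set_aio_fd_handler(QIOChannel *ioc,
                                                    AioContext *ctx,
                                                    IOHandler *io_read,
                                                    IOHandler *io_write,
                                                    void *opaque)
    {
        QIOChannelPipe *pioc = QIO_CHANNEL_PIPE(ioc);

        /* One fd carries both directions, so both handlers go on it.
         * The "false" argument marks the fd as internal, matching the
         * implementations in this patch.
         */
        aio_set_fd_handler(ctx, pioc->fd, false, io_read, io_write, NULL,
                           opaque);
    }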
Propagate the flush return value since errors are possible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-11-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/vhost-user-blk-server.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
     return -EINVAL;
 }
 
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
 {
     VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
     BlockBackend *backend = vdev_blk->backend;
-    blk_co_flush(backend);
+    return blk_co_flush(backend);
 }
 
 static void coroutine_fn vu_block_virtio_process_req(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
         break;
     }
     case VIRTIO_BLK_T_FLUSH:
-        vu_block_flush(req);
-        req->in->status = VIRTIO_BLK_S_OK;
+        if (vu_block_flush(req) == 0) {
+            req->in->status = VIRTIO_BLK_S_OK;
+        } else {
+            req->in->status = VIRTIO_BLK_S_IOERR;
+        }
         break;
     case VIRTIO_BLK_T_GET_ID: {
         size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
-- 
2.26.2
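
The general shape of the fix, shown as a standalone sketch rather than series code (handle_flush is a hypothetical helper): a coroutine helper returns 0 or a negative errno, and the request handler maps that onto the virtio status byte instead of assuming success.

    static void coroutine_fn handle_flush(VuBlockReq *req, BlockBackend *blk)
    {
        int ret = blk_co_flush(blk);    /* 0 on success, -errno on failure */

        req->in->status = (ret == 0) ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
    }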
From: Paolo Bonzini <pbonzini@redhat.com>

This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou. The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.

[Added missing coroutine_fn in tests/test-aio-multithread.c.
--Stefan]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/coroutine.h     |  17 ++++-
 tests/test-aio-multithread.c |  86 ++++++++++++++++++++++++
 util/qemu-coroutine-lock.c   | 155 ++++++++++++++++++++++++++++++++++++++++---
 util/trace-events            |   1 +
 4 files changed, 246 insertions(+), 13 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -XXX,XX +XXX,XX @@ bool qemu_co_queue_empty(CoQueue *queue);
 /**
  * Provides a mutex that can be used to synchronise coroutines
  */
+struct CoWaitRecord;
 typedef struct CoMutex {
-    bool locked;
+    /* Count of pending lockers; 0 for a free mutex, 1 for an
+     * uncontended mutex.
+     */
+    unsigned locked;
+
+    /* A queue of waiters.  Elements are added atomically in front of
+     * from_push.  to_pop is only populated, and popped from, by whoever
+     * is in charge of the next wakeup.  This can be an unlocker or,
+     * through the handoff protocol, a locker that is about to go to sleep.
+     */
+    QSLIST_HEAD(, CoWaitRecord) from_push, to_pop;
+
+    unsigned handoff, sequence;
+
     Coroutine *holder;
-    CoQueue queue;
 } CoMutex;
 
 /**
diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/test-aio-multithread.c
+++ b/tests/test-aio-multithread.c
@@ -XXX,XX +XXX,XX @@ static void test_multi_co_schedule_10(void)
     test_multi_co_schedule(10);
 }
 
+/* CoMutex thread-safety.  */
+
+static uint32_t atomic_counter;
+static uint32_t running;
+static uint32_t counter;
+static CoMutex comutex;
+
+static void coroutine_fn test_multi_co_mutex_entry(void *opaque)
+{
+    while (!atomic_mb_read(&now_stopping)) {
+        qemu_co_mutex_lock(&comutex);
+        counter++;
+        qemu_co_mutex_unlock(&comutex);
+
+        /* Increase atomic_counter *after* releasing the mutex.  Otherwise
+         * there is a chance (it happens about 1 in 3 runs) that the iothread
+         * exits before the coroutine is woken up, causing a spurious
+         * assertion failure.
+         */
+        atomic_inc(&atomic_counter);
+    }
+    atomic_dec(&running);
+}
+
+static void test_multi_co_mutex(int threads, int seconds)
+{
+    int i;
+
+    qemu_co_mutex_init(&comutex);
+    counter = 0;
+    atomic_counter = 0;
+    now_stopping = false;
+
+    create_aio_contexts();
+    assert(threads <= NUM_CONTEXTS);
+    running = threads;
+    for (i = 0; i < threads; i++) {
+        Coroutine *co1 = qemu_coroutine_create(test_multi_co_mutex_entry, NULL);
+        aio_co_schedule(ctx[i], co1);
+    }
+
+    g_usleep(seconds * 1000000);
+
+    atomic_mb_set(&now_stopping, true);
+    while (running > 0) {
+        g_usleep(100000);
+    }
+
+    join_aio_contexts();
+    g_test_message("%d iterations/second\n", counter / seconds);
+    g_assert_cmpint(counter, ==, atomic_counter);
+}
+
+/* Testing with NUM_CONTEXTS threads focuses on the queue.  The mutex however
+ * is too contended (and the threads spend too much time in aio_poll)
+ * to actually stress the handoff protocol.
+ */
+static void test_multi_co_mutex_1(void)
+{
+    test_multi_co_mutex(NUM_CONTEXTS, 1);
+}
+
+static void test_multi_co_mutex_10(void)
+{
+    test_multi_co_mutex(NUM_CONTEXTS, 10);
+}
+
+/* Testing with fewer threads stresses the handoff protocol too.  Still, the
+ * case where the locker _can_ pick up a handoff is very rare, happening
+ * about 10 times in 1 million, so increase the runtime a bit compared to
+ * other "quick" testcases that only run for 1 second.
+ */
+static void test_multi_co_mutex_2_3(void)
+{
+    test_multi_co_mutex(2, 3);
+}
+
+static void test_multi_co_mutex_2_30(void)
+{
+    test_multi_co_mutex(2, 30);
+}
+
 /* End of tests. */
 
 int main(int argc, char **argv)
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     g_test_add_func("/aio/multi/lifecycle", test_lifecycle);
     if (g_test_quick()) {
         g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_1);
+        g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_1);
+        g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_3);
     } else {
         g_test_add_func("/aio/multi/schedule", test_multi_co_schedule_10);
+        g_test_add_func("/aio/multi/mutex/contended", test_multi_co_mutex_10);
+        g_test_add_func("/aio/multi/mutex/handoff", test_multi_co_mutex_2_30);
     }
     return g_test_run();
 }
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -XXX,XX +XXX,XX @@
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
+ *
+ * The lock-free mutex implementation is based on OSv
+ * (core/lfmutex.cc, include/lockfree/mutex.hh).
+ * Copyright (C) 2013 Cloudius Systems, Ltd.
  */
 
 #include "qemu/osdep.h"
@@ -XXX,XX +XXX,XX @@ bool qemu_co_queue_empty(CoQueue *queue)
     return QSIMPLEQ_FIRST(&queue->entries) == NULL;
 }
 
+/* The wait records are handled with a multiple-producer, single-consumer
+ * lock-free queue.  There cannot be two concurrent pop_waiter() calls
+ * because pop_waiter() can only be called while mutex->handoff is zero.
+ * This can happen in three cases:
+ * - in qemu_co_mutex_unlock, before the hand-off protocol has started.
+ *   In this case, qemu_co_mutex_lock will see mutex->handoff == 0 and
+ *   not take part in the handoff.
+ * - in qemu_co_mutex_lock, if it steals the hand-off responsibility from
+ *   qemu_co_mutex_unlock.  In this case, qemu_co_mutex_unlock will fail
+ *   the cmpxchg (it will see either 0 or the next sequence value) and
+ *   exit.  The next hand-off cannot begin until qemu_co_mutex_lock has
+ *   woken up someone.
+ * - in qemu_co_mutex_unlock, if it takes the hand-off token itself.
+ *   In this case another iteration starts with mutex->handoff == 0;
+ *   a concurrent qemu_co_mutex_lock will fail the cmpxchg, and
+ *   qemu_co_mutex_unlock will go back to case (1).
+ *
+ * The following functions manage this queue.
+ */
+typedef struct CoWaitRecord {
+    Coroutine *co;
+    QSLIST_ENTRY(CoWaitRecord) next;
+} CoWaitRecord;
+
+static void push_waiter(CoMutex *mutex, CoWaitRecord *w)
+{
+    w->co = qemu_coroutine_self();
+    QSLIST_INSERT_HEAD_ATOMIC(&mutex->from_push, w, next);
+}
+
+static void move_waiters(CoMutex *mutex)
+{
+    QSLIST_HEAD(, CoWaitRecord) reversed;
+    QSLIST_MOVE_ATOMIC(&reversed, &mutex->from_push);
+    while (!QSLIST_EMPTY(&reversed)) {
+        CoWaitRecord *w = QSLIST_FIRST(&reversed);
+        QSLIST_REMOVE_HEAD(&reversed, next);
+        QSLIST_INSERT_HEAD(&mutex->to_pop, w, next);
+    }
+}
+
+static CoWaitRecord *pop_waiter(CoMutex *mutex)
+{
+    CoWaitRecord *w;
+
+    if (QSLIST_EMPTY(&mutex->to_pop)) {
+        move_waiters(mutex);
+        if (QSLIST_EMPTY(&mutex->to_pop)) {
+            return NULL;
+        }
+    }
+    w = QSLIST_FIRST(&mutex->to_pop);
+    QSLIST_REMOVE_HEAD(&mutex->to_pop, next);
+    return w;
+}
+
+static bool has_waiters(CoMutex *mutex)
+{
+    return QSLIST_EMPTY(&mutex->to_pop) || QSLIST_EMPTY(&mutex->from_push);
+}
+
 void qemu_co_mutex_init(CoMutex *mutex)
 {
     memset(mutex, 0, sizeof(*mutex));
-    qemu_co_queue_init(&mutex->queue);
 }
 
-void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
+static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
+    CoWaitRecord w;
+    unsigned old_handoff;
 
     trace_qemu_co_mutex_lock_entry(mutex, self);
+    w.co = self;
+    push_waiter(mutex, &w);
 
-    while (mutex->locked) {
-        qemu_co_queue_wait(&mutex->queue);
+    /* This is the "Responsibility Hand-Off" protocol; a lock() picks from
+     * a concurrent unlock() the responsibility of waking somebody up.
+     */
+    old_handoff = atomic_mb_read(&mutex->handoff);
+    if (old_handoff &&
+        has_waiters(mutex) &&
+        atomic_cmpxchg(&mutex->handoff, old_handoff, 0) == old_handoff) {
+        /* There can be no concurrent pops, because there can be only
+         * one active handoff at a time.
+         */
+        CoWaitRecord *to_wake = pop_waiter(mutex);
+        Coroutine *co = to_wake->co;
+        if (co == self) {
+            /* We got the lock ourselves! */
+            assert(to_wake == &w);
+            return;
+        }
+
+        aio_co_wake(co);
     }
 
-    mutex->locked = true;
-    mutex->holder = self;
-    self->locks_held++;
-
+    qemu_coroutine_yield();
     trace_qemu_co_mutex_lock_return(mutex, self);
 }
 
+void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    if (atomic_fetch_inc(&mutex->locked) == 0) {
+        /* Uncontended. */
+        trace_qemu_co_mutex_lock_uncontended(mutex, self);
+    } else {
+        qemu_co_mutex_lock_slowpath(mutex);
+    }
+    mutex->holder = self;
+    self->locks_held++;
+}
+
 void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
 
     trace_qemu_co_mutex_unlock_entry(mutex, self);
 
-    assert(mutex->locked == true);
+    assert(mutex->locked);
     assert(mutex->holder == self);
     assert(qemu_in_coroutine());
 
-    mutex->locked = false;
     mutex->holder = NULL;
     self->locks_held--;
-    qemu_co_queue_next(&mutex->queue);
+    if (atomic_fetch_dec(&mutex->locked) == 1) {
+        /* No waiting qemu_co_mutex_lock().  Pfew, that was easy! */
+        return;
+    }
+
+    for (;;) {
+        CoWaitRecord *to_wake = pop_waiter(mutex);
+        unsigned our_handoff;
+
+        if (to_wake) {
+            Coroutine *co = to_wake->co;
+            aio_co_wake(co);
+            break;
+        }
+
+        /* Some concurrent lock() is in progress (we know this because
+         * mutex->locked was >1) but it hasn't yet put itself on the wait
+         * queue.  Pick a sequence number for the handoff protocol (not 0).
+         */
+        if (++mutex->sequence == 0) {
+            mutex->sequence = 1;
+        }
+
+        our_handoff = mutex->sequence;
+        atomic_mb_set(&mutex->handoff, our_handoff);
+        if (!has_waiters(mutex)) {
+            /* The concurrent lock has not added itself yet, so it
+             * will be able to pick our handoff.
+             */
+            break;
+        }
+
+        /* Try to do the handoff protocol ourselves; if somebody else has
+         * already taken it, however, we're done and they're responsible.
+         */
+        if (atomic_cmpxchg(&mutex->handoff, our_handoff, 0) != our_handoff) {
+            break;
+        }
+    }
 
     trace_qemu_co_mutex_unlock_return(mutex, self);
 }
diff --git a/util/trace-events b/util/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/util/trace-events

Use the new QAPI block exports API instead of defining our own QOM
objects.

This is a large change because the lifecycle of VuBlockDev needs to
follow BlockExportDriver. QOM properties are replaced by QAPI options
objects.

VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
Several fields can be dropped since BlockExport already has equivalents.

The file names and meson build integration will be adjusted in a future
patch. libvhost-user should probably be built as a static library that
is linked into QEMU instead of as a .c file that results in duplicate
compilation.

The new command-line syntax is:

  $ qemu-storage-daemon \
      --blockdev file,node-name=drive0,filename=test.img \
      --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock

Note that unix-socket is optional because we may wish to accept chardevs
too in the future.

Markus noted that supported address families are not explicit in the
QAPI schema. It is unlikely that support for more address families will
be added since file descriptor passing is required and few address
families support it. If a new address family needs to be added, then the
QAPI 'features' syntax can be used to advertize them.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200924151549.913737-12-stefanha@redhat.com
[Skip test on big-endian host architectures because this device doesn't
support them yet (as already mentioned in a code comment).
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-export.json               |  21 +-
 block/export/vhost-user-blk-server.h |  23 +-
 block/export/export.c                |   6 +
 block/export/vhost-user-blk-server.c | 452 +++++++--------------------
 util/vhost-user-server.c             |  10 +-
 block/export/meson.build             |   1 +
 block/meson.build                    |   1 -
 7 files changed, 156 insertions(+), 358 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -XXX,XX +XXX,XX @@
   'data': { '*name': 'str', '*description': 'str',
             '*bitmap': 'str' } }
 
+##
+# @BlockExportOptionsVhostUserBlk:
+#
+# A vhost-user-blk block export.
+#
+# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
+#        SocketAddress types are supported. Passed fds must be UNIX domain
+#        sockets.
+# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
+  'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
+
 ##
 # @NbdServerAddOptions:
 #
@@ -XXX,XX +XXX,XX @@
 # An enumeration of block export types
 #
 # @nbd: NBD export
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
 #
 # Since: 4.2
 ##
 { 'enum': 'BlockExportType',
-  'data': [ 'nbd' ] }
+  'data': [ 'nbd', 'vhost-user-blk' ] }
 
 ##
 # @BlockExportOptions:
@@ -XXX,XX +XXX,XX @@
             '*writethrough': 'bool' },
   'discriminator': 'type',
   'data': {
-      'nbd': 'BlockExportOptionsNbd'
+      'nbd': 'BlockExportOptionsNbd',
+      'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
    } }
 
 ##
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.h
+++ b/block/export/vhost-user-blk-server.h
@@ -XXX,XX +XXX,XX @@
 
 #ifndef VHOST_USER_BLK_SERVER_H
 #define VHOST_USER_BLK_SERVER_H
-#include "util/vhost-user-server.h"
 
-typedef struct VuBlockDev VuBlockDev;
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
-#define VHOST_USER_BLK_SERVER(obj) \
-    OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
+#include "block/export.h"
 
-/* vhost user block device */
-struct VuBlockDev {
-    Object parent_obj;
-    char *node_name;
-    SocketAddress *addr;
-    AioContext *ctx;
-    VuServer vu_server;
-    bool running;
-    uint32_t blk_size;
-    BlockBackend *backend;
-    QIOChannelSocket *sioc;
-    QTAILQ_ENTRY(VuBlockDev) next;
-    struct virtio_blk_config blkcfg;
-    bool writable;
-};
+/* For block/export/export.c */
+extern const BlockExportDriver blk_exp_vhost_user_blk;
 
 #endif /* VHOST_USER_BLK_SERVER_H */
diff --git a/block/export/export.c b/block/export/export.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/block-backend.h"
 #include "block/export.h"
 #include "block/nbd.h"
+#if CONFIG_LINUX
+#include "block/export/vhost-user-blk-server.h"
+#endif
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-export.h"
 #include "qapi/qapi-events-block-export.h"
@@ -XXX,XX +XXX,XX @@
 
 static const BlockExportDriver *blk_exp_drivers[] = {
     &blk_exp_nbd,
+#if CONFIG_LINUX
+    &blk_exp_vhost_user_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -XXX,XX +XXX,XX @@
  */
 #include "qemu/osdep.h"
 #include "block/block.h"
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "util/vhost-user-server.h"
 #include "vhost-user-blk-server.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
     unsigned char status;
 };
 
-typedef struct VuBlockReq {
+typedef struct VuBlkReq {
     VuVirtqElement elem;
     int64_t sector_num;
     size_t size;
@@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq {
     struct virtio_blk_outhdr out;
     VuServer *server;
     struct VuVirtq *vq;
-} VuBlockReq;
+} VuBlkReq;
 
-static void vu_block_req_complete(VuBlockReq *req)
+/* vhost user block device */
+typedef struct {
+    BlockExport export;
+    VuServer vu_server;
+    uint32_t blk_size;
+    QIOChannelSocket *sioc;
+    struct virtio_blk_config blkcfg;
+    bool writable;
+} VuBlkExport;
+
+static void vu_blk_req_complete(VuBlkReq *req)
 {
     VuDev *vu_dev = &req->server->vu_dev;
 
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
     free(req);
 }
 
-static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
-{
-    return container_of(server, VuBlockDev, vu_server);
-}
-
 static int coroutine_fn
-vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
-                              uint32_t iovcnt, uint32_t type)
+vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov,
+                            uint32_t iovcnt, uint32_t type)
 {
     struct virtio_blk_discard_write_zeroes desc;
     ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
         return -EINVAL;
     }
 
-    VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
     uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
                           le32_to_cpu(desc.num_sectors) << 9 };
     if (type == VIRTIO_BLK_T_DISCARD) {
-        if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
+        if (blk_co_pdiscard(blk, range[0], range[1]) == 0) {
             return 0;
         }
     } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
-        if (blk_co_pwrite_zeroes(vdev_blk->backend,
-                                 range[0], range[1], 0) == 0) {
+        if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) {
             return 0;
         }
     }
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
     return -EINVAL;
 }
 
-static int coroutine_fn vu_block_flush(VuBlockReq *req)
+static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 {
-    VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
-    BlockBackend *backend = vdev_blk->backend;
-    return blk_co_flush(backend);
-}
-
-static void coroutine_fn vu_block_virtio_process_req(void *opaque)
-{
-    VuBlockReq *req = opaque;
+    VuBlkReq *req = opaque;
     VuServer *server = req->server;
     VuVirtqElement *elem = &req->elem;
     uint32_t type;
 
-    VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
-    BlockBackend *backend = vdev_blk->backend;
+    VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
+    BlockBackend *blk = vexp->export.blk;
 
     struct iovec *in_iov = elem->in_sg;
     struct iovec *out_iov = elem->out_sg;
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
         bool is_write = type & VIRTIO_BLK_T_OUT;
         req->sector_num = le64_to_cpu(req->out.sector);
 
-        int64_t offset = req->sector_num * vdev_blk->blk_size;
+        if (is_write && !vexp->writable) {
+            req->in->status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+
+        int64_t offset = req->sector_num * vexp->blk_size;
         QEMUIOVector qiov;
         if (is_write) {
             qemu_iovec_init_external(&qiov, out_iov, out_num);
-            ret = blk_co_pwritev(backend, offset, qiov.size,
-                                 &qiov, 0);
+            ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
         } else {
             qemu_iovec_init_external(&qiov, in_iov, in_num);
-            ret = blk_co_preadv(backend, offset, qiov.size,
-                                &qiov, 0);
+            ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
         }
         if (ret >= 0) {
             req->in->status = VIRTIO_BLK_S_OK;
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
         break;
     }
     case VIRTIO_BLK_T_FLUSH:
-        if (vu_block_flush(req) == 0) {
+        if (blk_co_flush(blk) == 0) {
             req->in->status = VIRTIO_BLK_S_OK;
         } else {
             req->in->status = VIRTIO_BLK_S_IOERR;
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
     case VIRTIO_BLK_T_DISCARD:
     case VIRTIO_BLK_T_WRITE_ZEROES: {
         int rc;
-        rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
-                                           out_num, type);
+
+        if (!vexp->writable) {
+            req->in->status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+
+        rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type);
         if (rc == 0) {
             req->in->status = VIRTIO_BLK_S_OK;
         } else {
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
         break;
     }
 
-    vu_block_req_complete(req);
+    vu_blk_req_complete(req);
     return;
 
 err:
-    free(elem);
+    free(req);
 }
 
-static void vu_block_process_vq(VuDev *vu_dev, int idx)
+static void vu_blk_process_vq(VuDev *vu_dev, int idx)
 {
     VuServer *server = container_of(vu_dev, VuServer, vu_dev);
     VuVirtq *vq = vu_get_queue(vu_dev, idx);
 
     while (1) {
-        VuBlockReq *req;
+        VuBlkReq *req;
 
-        req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
+        req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
         if (!req) {
             break;
         }
@@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx)
         req->vq = vq;
 
         Coroutine *co =
-            qemu_coroutine_create(vu_block_virtio_process_req, req);
+            qemu_coroutine_create(vu_blk_virtio_process_req, req);
         qemu_coroutine_enter(co);
     }
 }
 
-static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
+static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started)
 {
     VuVirtq *vq;
 
     assert(vu_dev);
 
     vq = vu_get_queue(vu_dev, idx);
-    vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
+    vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL);
 }
 
-static uint64_t vu_block_get_features(VuDev *dev)
+static uint64_t vu_blk_get_features(VuDev *dev)
 {
     uint64_t features;
     VuServer *server = container_of(dev, VuServer, vu_dev);
-    VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
+    VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
     features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
                1ull << VIRTIO_BLK_F_SEG_MAX |
                1ull << VIRTIO_BLK_F_TOPOLOGY |
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev)
                1ull << VIRTIO_RING_F_EVENT_IDX |
                1ull << VHOST_USER_F_PROTOCOL_FEATURES;
 
-    if (!vdev_blk->writable) {
+    if (!vexp->writable) {
         features |= 1ull << VIRTIO_BLK_F_RO;
     }
 
     return features;
 }
 
-static uint64_t vu_block_get_protocol_features(VuDev *dev)
+static uint64_t vu_blk_get_protocol_features(VuDev *dev)
 {
     return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
            1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
 }
 
 static int
-vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
+vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
 {
+    /* TODO blkcfg must be little-endian for VIRTIO 1.0 */
     VuServer *server = container_of(vu_dev, VuServer, vu_dev);
-    VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
-    memcpy(config, &vdev_blk->blkcfg, len);
-
+    VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
+    memcpy(config, &vexp->blkcfg, len);
     return 0;
 }
 
 static int
-vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
+vu_blk_set_config(VuDev *vu_dev, const uint8_t *data,
                     uint32_t offset, uint32_t size, uint32_t flags)
 {
     VuServer *server = container_of(vu_dev, VuServer, vu_dev);
-    VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
+    VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
     uint8_t wce;
 
     /* don't support live migration */
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
     }
 
     wce = *data;
-    vdev_blk->blkcfg.wce = wce;
-    blk_set_enable_write_cache(vdev_blk->backend, wce);
+    vexp->blkcfg.wce = wce;
+    blk_set_enable_write_cache(vexp->export.blk, wce);
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
  * of vu_process_message.
  *
  */
-static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
+static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
 {
     if (vmsg->request == VHOST_USER_NONE) {
         dev->panic(dev, "disconnect");
@@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
     return false;
 }
 
-static const VuDevIface vu_block_iface = {
-    .get_features          = vu_block_get_features,
-    .queue_set_started     = vu_block_queue_set_started,
-    .get_protocol_features = vu_block_get_protocol_features,
-    .get_config            = vu_block_get_config,
-    .set_config            = vu_block_set_config,
-    .process_msg           = vu_block_process_msg,
+static const VuDevIface vu_blk_iface = {
+    .get_features          = vu_blk_get_features,
+    .queue_set_started     = vu_blk_queue_set_started,
+    .get_protocol_features = vu_blk_get_protocol_features,
+    .get_config            = vu_blk_get_config,
+    .set_config            = vu_blk_set_config,
+    .process_msg           = vu_blk_process_msg,
 };
 
 static void blk_aio_attached(AioContext *ctx, void *opaque)
 {
-    VuBlockDev *vub_dev = opaque;
-    vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
+    VuBlkExport *vexp = opaque;
+    vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
 }
 
 static void blk_aio_detach(void *opaque)
 {
-    VuBlockDev *vub_dev = opaque;
-    vhost_user_server_detach_aio_context(&vub_dev->vu_server);
+    VuBlkExport *vexp = opaque;
+    vhost_user_server_detach_aio_context(&vexp->vu_server);
 }
 
 static void
-vu_block_initialize_config(BlockDriverState *bs,
+vu_blk_initialize_config(BlockDriverState *bs,
                            struct virtio_blk_config *config, uint32_t blk_size)
 {
     config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
@@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs,
     config->max_write_zeroes_seg = 1;
 }
 
-static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
+static void vu_blk_exp_request_shutdown(BlockExport *exp)
 {
+    VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
 
-    BlockBackend *blk;
-    Error *local_error = NULL;
-    const char *node_name = vu_block_device->node_name;
-    bool writable = vu_block_device->writable;
-    uint64_t perm = BLK_PERM_CONSISTENT_READ;
-    int ret;
-
-    AioContext *ctx;
-
-    BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
-
-    if (!bs) {
-        error_propagate(errp, local_error);
-        return NULL;
-    }
-
-    if (bdrv_is_read_only(bs)) {
-        writable = false;
-    }
-
-    if (writable) {
-        perm |= BLK_PERM_WRITE;
-    }
-
-    ctx = bdrv_get_aio_context(bs);
-    aio_context_acquire(ctx);
-    bdrv_invalidate_cache(bs, NULL);
-    aio_context_release(ctx);
-
-    /*
-     * Don't allow resize while the vhost user server is running,
-     * otherwise we don't care what happens with the node.
-     */
-    blk = blk_new(bdrv_get_aio_context(bs), perm,
-                  BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
-                  BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
-    ret = blk_insert_bs(blk, bs, errp);
-
-    if (ret < 0) {
-        goto fail;
-    }
-
-    blk_set_enable_write_cache(blk, false);
-
-    blk_set_allow_aio_context_change(blk, true);
-
-    vu_block_device->blkcfg.wce = 0;
-    vu_block_device->backend = blk;
-    if (!vu_block_device->blk_size) {
-        vu_block_device->blk_size = BDRV_SECTOR_SIZE;
-    }
-    vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
-    blk_set_guest_block_size(blk, vu_block_device->blk_size);
-    vu_block_initialize_config(bs, &vu_block_device->blkcfg,
-                               vu_block_device->blk_size);
-    return vu_block_device;
-
-fail:
-    blk_unref(blk);
-    return NULL;
-}
-
-static void vu_block_deinit(VuBlockDev *vu_block_device)
-{
-    if (vu_block_device->backend) {
-        blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
-                                        blk_aio_detach, vu_block_device);
-    }
-
-    blk_unref(vu_block_device->backend);
-}
-
-static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
-{
-    vhost_user_server_stop(&vu_block_device->vu_server);
-    vu_block_deinit(vu_block_device);
-}
-
-static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
-                                        Error **errp)
-{
-    AioContext *ctx;
-    SocketAddress *addr = vu_block_device->addr;
-
-    if (!vu_block_init(vu_block_device, errp)) {
-        return;
-    }
-
-    ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
-
-    if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
-                                 VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
-                                 errp)) {
-        goto error;
-    }
-
-    blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
-                                 blk_aio_detach, vu_block_device);
-    vu_block_device->running = true;
-    return;
-
- error:
-    vu_block_deinit(vu_block_device);
-}
-
-static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
-{
-    if (vus->running) {
-        error_setg(errp, "The property can't be modified "
-                   "while the server is running");
-        return false;
-    }
-    return true;
-}
-
-static void vu_set_node_name(Object *obj, const char *value, Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-
-    if (!vu_prop_modifiable(vus, errp)) {
-        return;
-    }
-
-    if (vus->node_name) {
-        g_free(vus->node_name);
-    }
-
-    vus->node_name = g_strdup(value);
-}
-
-static char *vu_get_node_name(Object *obj, Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-    return g_strdup(vus->node_name);
-}
-
-static void free_socket_addr(SocketAddress *addr)
-{
-    g_free(addr->u.q_unix.path);
-    g_free(addr);
-}
-
-static void vu_set_unix_socket(Object *obj, const char *value,
-                               Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-
-    if (!vu_prop_modifiable(vus, errp)) {
-        return;
-    }
-
-    if (vus->addr) {
-        free_socket_addr(vus->addr);
-    }
-
-    SocketAddress *addr = g_new0(SocketAddress, 1);
-    addr->type = SOCKET_ADDRESS_TYPE_UNIX;
-    addr->u.q_unix.path = g_strdup(value);
-    vus->addr = addr;
+    vhost_user_server_stop(&vexp->vu_server);
 }
 
-static char *vu_get_unix_socket(Object *obj, Error **errp)
+static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
+                             Error **errp)
 {
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-    return g_strdup(vus->addr->u.q_unix.path);
-}
-
-static bool vu_get_block_writable(Object *obj, Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-    return vus->writable;
-}
-
-static void vu_set_block_writable(Object *obj, bool value, Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-
-    if (!vu_prop_modifiable(vus, errp)) {
-        return;
-    }
-
-    vus->writable = value;
-}
-
-static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
-                            void *opaque, Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-    uint32_t value = vus->blk_size;
-
-    visit_type_uint32(v, name, &value, errp);
-}
-
-static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
-                            void *opaque, Error **errp)
-{
-    VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
-
+    VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
+    BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
     Error *local_err = NULL;
-    uint32_t value;
+    uint64_t logical_block_size;
 
-    if (!vu_prop_modifiable(vus, errp)) {
-        return;
-    }
+    vexp->writable = opts->writable;
+    vexp->blkcfg.wce = 0;
 
-    visit_type_uint32(v, name, &value, &local_err);
-    if (local_err) {
-        goto out;
+    if (vu_opts->has_logical_block_size) {
+        logical_block_size = vu_opts->logical_block_size;
+    } else {
+        logical_block_size = BDRV_SECTOR_SIZE;
     }
-
-    check_block_size(object_get_typename(obj), name, value, &local_err);
+    check_block_size(exp->id, "logical-block-size", logical_block_size,
+                     &local_err);
     if (local_err) {
-        goto out;
+        error_propagate(errp, local_err);
+        return -EINVAL;
     }
 
+    vexp->blk_size = logical_block_size;
+    blk_set_guest_block_size(exp->blk, logical_block_size);
+    vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
+                             logical_block_size);
+
+    blk_set_allow_aio_context_change(exp->blk, true);
+    blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
+                                 vexp);
+
+    if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
+                                 VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
+                                 errp)) {
+        blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
+                                        blk_aio_detach, vexp);
+        return -EADDRNOTAVAIL;
     }
 
-    vus->blk_size = value;
-
-out:
-    error_propagate(errp, local_err);
-}
-
-static void vhost_user_blk_server_instance_finalize(Object *obj)
-{
-    VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
-
-    vhost_user_blk_server_stop(vub);
-
-    /*
-     * Unlike object_property_add_str, object_class_property_add_str
-     * doesn't have a release method. Thus manual memory freeing is
-     * needed.
-     */
-    free_socket_addr(vub->addr);
-    g_free(vub->node_name);
-}
-
-static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
-{
-    VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
-
-    vhost_user_blk_server_start(vub, errp);
+    return 0;
 }
 
-static void vhost_user_blk_server_class_init(ObjectClass *klass,
-                                             void *class_data)
+static void vu_blk_exp_delete(BlockExport *exp)
 {
-    UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
-    ucc->complete = vhost_user_blk_server_complete;
-
-    object_class_property_add_bool(klass, "writable",
-                                   vu_get_block_writable,
-                                   vu_set_block_writable);
-
-    object_class_property_add_str(klass, "node-name",
-                                  vu_get_node_name,
-                                  vu_set_node_name);
-
-    object_class_property_add_str(klass, "unix-socket",
-                                  vu_get_unix_socket,
-                                  vu_set_unix_socket);
+    VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
 
-    object_class_property_add(klass, "logical-block-size", "uint32",
-                              vu_get_blk_size, vu_set_blk_size,
-                              NULL, NULL);
+    blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
+                                    vexp);
 }
 
-static const TypeInfo vhost_user_blk_server_info = {
-    .name = TYPE_VHOST_USER_BLK_SERVER,
-    .parent = TYPE_OBJECT,
-    .instance_size = sizeof(VuBlockDev),
-    .instance_finalize = vhost_user_blk_server_instance_finalize,
-    .class_init = vhost_user_blk_server_class_init,
-    .interfaces = (InterfaceInfo[]) {
-        {TYPE_USER_CREATABLE},
-        {}
-    },
+const BlockExportDriver blk_exp_vhost_user_blk = {
+    .type               = BLOCK_EXPORT_TYPE_VHOST_USER_BLK,
+    .instance_size      = sizeof(VuBlkExport),
+    .create             = vu_blk_exp_create,
+    .delete             = vu_blk_exp_delete,
+    .request_shutdown   = vu_blk_exp_request_shutdown,
 };
-
-static void vhost_user_blk_server_register_types(void)
-{
-    type_register_static(&vhost_user_blk_server_info);
-}
-
-type_init(vhost_user_blk_server_register_types)
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index XXXXXXX..XXXXXXX 100644
819
--- a/util/vhost-user-server.c
360
+++ b/util/trace-events
820
+++ b/util/vhost-user-server.c
361
@@ -XXX,XX +XXX,XX @@ qemu_coroutine_terminate(void *co) "self %p"
821
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
362
822
Error **errp)
363
# util/qemu-coroutine-lock.c
823
{
364
qemu_co_queue_run_restart(void *co) "co %p"
824
QEMUBH *bh;
365
+qemu_co_mutex_lock_uncontended(void *mutex, void *self) "mutex %p self %p"
825
- QIONetListener *listener = qio_net_listener_new();
366
qemu_co_mutex_lock_entry(void *mutex, void *self) "mutex %p self %p"
826
+ QIONetListener *listener;
367
qemu_co_mutex_lock_return(void *mutex, void *self) "mutex %p self %p"
827
+
368
qemu_co_mutex_unlock_entry(void *mutex, void *self) "mutex %p self %p"
828
+ if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX &&
829
+ socket_addr->type != SOCKET_ADDRESS_TYPE_FD) {
830
+ error_setg(errp, "Only socket address types 'unix' and 'fd' are supported");
831
+ return false;
832
+ }
833
+
834
+ listener = qio_net_listener_new();
835
if (qio_net_listener_open_sync(listener, socket_addr, 1,
836
errp) < 0) {
837
object_unref(OBJECT(listener));
838
diff --git a/block/export/meson.build b/block/export/meson.build
839
index XXXXXXX..XXXXXXX 100644
840
--- a/block/export/meson.build
841
+++ b/block/export/meson.build
842
@@ -1 +1,2 @@
843
block_ss.add(files('export.c'))
844
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
845
diff --git a/block/meson.build b/block/meson.build
846
index XXXXXXX..XXXXXXX 100644
847
--- a/block/meson.build
848
+++ b/block/meson.build
849
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
850
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
851
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
852
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
853
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
854
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
855
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
856
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
369
--
857
--
370
2.9.3
858
2.26.2
371
859
372
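For reference, the BlockExportDriver declared above is selected by export
type when an export is created. A minimal sketch of that dispatch, following
blk_exp_find_driver() in block/export/export.c (a simplified restatement for
illustration, not a hunk from this series):

    /* Sketch: how blk_exp_vhost_user_blk is looked up by export type. */
    static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
    {
        int i;

        for (i = 0; i < ARRAY_SIZE(blk_exp_drivers); i++) {
            if (blk_exp_drivers[i]->type == type) {
                return blk_exp_drivers[i];
            }
        }
        return NULL;
    }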
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
Headers used by other subsystems are located in include/. Also add the
2
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.
2
3
3
This will avoid forward references in the next patch. It is also
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
more logical because CoQueue is no longer the basic primitive.
5
Message-id: 20200924151549.913737-13-stefanha@redhat.com
5
6
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7
Reviewed-by: Fam Zheng <famz@redhat.com>
8
Message-id: 20170213181244.16297-5-pbonzini@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
7
---
11
include/qemu/coroutine.h | 89 ++++++++++++++++++++++++------------------------
8
MAINTAINERS | 4 +++-
12
1 file changed, 44 insertions(+), 45 deletions(-)
9
{util => include/qemu}/vhost-user-server.h | 0
10
block/export/vhost-user-blk-server.c | 2 +-
11
util/vhost-user-server.c | 2 +-
12
4 files changed, 5 insertions(+), 3 deletions(-)
13
rename {util => include/qemu}/vhost-user-server.h (100%)
13
14
14
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
15
diff --git a/MAINTAINERS b/MAINTAINERS
15
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
16
--- a/include/qemu/coroutine.h
17
--- a/MAINTAINERS
17
+++ b/include/qemu/coroutine.h
18
+++ b/MAINTAINERS
18
@@ -XXX,XX +XXX,XX @@ bool qemu_in_coroutine(void);
19
@@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server
20
M: Coiby Xu <Coiby.Xu@gmail.com>
21
S: Maintained
22
F: block/export/vhost-user-blk-server.c
23
-F: util/vhost-user-server.c
24
+F: block/export/vhost-user-blk-server.h
25
+F: include/qemu/vhost-user-server.h
26
F: tests/qtest/libqos/vhost-user-blk.c
27
+F: util/vhost-user-server.c
28
29
Replication
30
M: Wen Congyang <wencongyang2@huawei.com>
31
diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h
32
similarity index 100%
33
rename from util/vhost-user-server.h
34
rename to include/qemu/vhost-user-server.h
35
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
36
index XXXXXXX..XXXXXXX 100644
37
--- a/block/export/vhost-user-blk-server.c
38
+++ b/block/export/vhost-user-blk-server.c
39
@@ -XXX,XX +XXX,XX @@
40
#include "block/block.h"
41
#include "contrib/libvhost-user/libvhost-user.h"
42
#include "standard-headers/linux/virtio_blk.h"
43
-#include "util/vhost-user-server.h"
44
+#include "qemu/vhost-user-server.h"
45
#include "vhost-user-blk-server.h"
46
#include "qapi/error.h"
47
#include "qom/object_interfaces.h"
48
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
49
index XXXXXXX..XXXXXXX 100644
50
--- a/util/vhost-user-server.c
51
+++ b/util/vhost-user-server.c
52
@@ -XXX,XX +XXX,XX @@
19
*/
53
*/
20
bool qemu_coroutine_entered(Coroutine *co);
54
#include "qemu/osdep.h"
21
55
#include "qemu/main-loop.h"
22
-
56
+#include "qemu/vhost-user-server.h"
23
-/**
57
#include "block/aio-wait.h"
24
- * CoQueues are a mechanism to queue coroutines in order to continue executing
58
-#include "vhost-user-server.h"
25
- * them later. They provide the fundamental primitives on which coroutine locks
59
26
- * are built.
60
/*
27
- */
61
* Theory of operation:
28
-typedef struct CoQueue {
29
- QSIMPLEQ_HEAD(, Coroutine) entries;
30
-} CoQueue;
31
-
32
-/**
33
- * Initialise a CoQueue. This must be called before any other operation is used
34
- * on the CoQueue.
35
- */
36
-void qemu_co_queue_init(CoQueue *queue);
37
-
38
-/**
39
- * Adds the current coroutine to the CoQueue and transfers control to the
40
- * caller of the coroutine.
41
- */
42
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
43
-
44
-/**
45
- * Restarts the next coroutine in the CoQueue and removes it from the queue.
46
- *
47
- * Returns true if a coroutine was restarted, false if the queue is empty.
48
- */
49
-bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
50
-
51
-/**
52
- * Restarts all coroutines in the CoQueue and leaves the queue empty.
53
- */
54
-void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
55
-
56
-/**
57
- * Enter the next coroutine in the queue
58
- */
59
-bool qemu_co_enter_next(CoQueue *queue);
60
-
61
-/**
62
- * Checks if the CoQueue is empty.
63
- */
64
-bool qemu_co_queue_empty(CoQueue *queue);
65
-
66
-
67
/**
68
* Provides a mutex that can be used to synchronise coroutines
69
*/
70
@@ -XXX,XX +XXX,XX @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex);
71
*/
72
void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex);
73
74
+
75
+/**
76
+ * CoQueues are a mechanism to queue coroutines in order to continue executing
77
+ * them later.
78
+ */
79
+typedef struct CoQueue {
80
+ QSIMPLEQ_HEAD(, Coroutine) entries;
81
+} CoQueue;
82
+
83
+/**
84
+ * Initialise a CoQueue. This must be called before any other operation is used
85
+ * on the CoQueue.
86
+ */
87
+void qemu_co_queue_init(CoQueue *queue);
88
+
89
+/**
90
+ * Adds the current coroutine to the CoQueue and transfers control to the
91
+ * caller of the coroutine.
92
+ */
93
+void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
94
+
95
+/**
96
+ * Restarts the next coroutine in the CoQueue and removes it from the queue.
97
+ *
98
+ * Returns true if a coroutine was restarted, false if the queue is empty.
99
+ */
100
+bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
101
+
102
+/**
103
+ * Restarts all coroutines in the CoQueue and leaves the queue empty.
104
+ */
105
+void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
106
+
107
+/**
108
+ * Enter the next coroutine in the queue
109
+ */
110
+bool qemu_co_enter_next(CoQueue *queue);
111
+
112
+/**
113
+ * Checks if the CoQueue is empty.
114
+ */
115
+bool qemu_co_queue_empty(CoQueue *queue);
116
+
117
+
118
typedef struct CoRwlock {
119
bool writer;
120
int reader;
121
--
62
--
122
2.9.3
63
2.26.2
123
64
124
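To make the reordered documentation concrete, here is a hypothetical
waiter/waker pair built on the CoQueue API shown above (an illustrative
sketch, not code from this series; note that at this point in the series
qemu_co_queue_wait() still takes only the queue):

    typedef struct {
        CoQueue waiters;
        int free_slots;
    } SlotPool;

    static void slot_pool_init(SlotPool *pool, int n)
    {
        qemu_co_queue_init(&pool->waiters);
        pool->free_slots = n;
    }

    static void coroutine_fn slot_pool_acquire(SlotPool *pool)
    {
        while (pool->free_slots == 0) {
            /* Suspend until slot_pool_release() restarts us. */
            qemu_co_queue_wait(&pool->waiters);
        }
        pool->free_slots--;
    }

    static void coroutine_fn slot_pool_release(SlotPool *pool)
    {
        pool->free_slots++;
        qemu_co_queue_next(&pool->waiters); /* wake one waiter, if any */
    }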
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
2
the static library once and then reuse it throughout QEMU.
2
3
3
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
4
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
5
vhost-user tools (vhost-user-gpu, etc.) do.
5
Reviewed-by: Fam Zheng <famz@redhat.com>
6
6
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Message-id: 20170213135235.12274-19-pbonzini@redhat.com
8
Message-id: 20200924151549.913737-14-stefanha@redhat.com
9
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
10
--Stefan]
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
12
---
10
include/block/block_int.h | 64 +++++++++++++++++++++++++-----------------
13
block/export/export.c | 8 ++++----
11
include/sysemu/block-backend.h | 14 ++++++---
14
block/export/meson.build | 2 +-
12
2 files changed, 49 insertions(+), 29 deletions(-)
15
contrib/libvhost-user/meson.build | 1 +
16
meson.build | 6 +++++-
17
util/meson.build | 4 +++-
18
5 files changed, 14 insertions(+), 7 deletions(-)
13
19
14
diff --git a/include/block/block_int.h b/include/block/block_int.h
20
diff --git a/block/export/export.c b/block/export/export.c
15
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
16
--- a/include/block/block_int.h
22
--- a/block/export/export.c
17
+++ b/include/block/block_int.h
23
+++ b/block/export/export.c
18
@@ -XXX,XX +XXX,XX @@ struct BdrvChild {
24
@@ -XXX,XX +XXX,XX @@
19
* copied as well.
25
#include "sysemu/block-backend.h"
20
*/
26
#include "block/export.h"
21
struct BlockDriverState {
27
#include "block/nbd.h"
22
- int64_t total_sectors; /* if we are reading a disk image, give its
28
-#if CONFIG_LINUX
23
- size in sectors */
29
-#include "block/export/vhost-user-blk-server.h"
24
+ /* Protected by big QEMU lock or read-only after opening. No special
30
-#endif
25
+ * locking needed during I/O...
31
#include "qapi/error.h"
26
+ */
32
#include "qapi/qapi-commands-block-export.h"
27
int open_flags; /* flags used to open the file, re-used for re-open */
33
#include "qapi/qapi-events-block-export.h"
28
bool read_only; /* if true, the media is read only */
34
#include "qemu/id.h"
29
bool encrypted; /* if true, the media is encrypted */
35
+#ifdef CONFIG_VHOST_USER
30
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
36
+#include "vhost-user-blk-server.h"
31
bool sg; /* if true, the device is a /dev/sg* */
37
+#endif
32
bool probed; /* if true, format was probed rather than specified */
38
33
39
static const BlockExportDriver *blk_exp_drivers[] = {
34
- int copy_on_read; /* if nonzero, copy read backing sectors into image.
40
&blk_exp_nbd,
35
- note this is a reference count */
41
-#if CONFIG_LINUX
36
-
42
+#ifdef CONFIG_VHOST_USER
37
- CoQueue flush_queue; /* Serializing flush queue */
43
&blk_exp_vhost_user_blk,
38
- bool active_flush_req; /* Flush request in flight? */
44
#endif
39
- unsigned int write_gen; /* Current data generation */
45
};
40
- unsigned int flushed_gen; /* Flushed write generation */
46
diff --git a/block/export/meson.build b/block/export/meson.build
41
-
47
index XXXXXXX..XXXXXXX 100644
42
BlockDriver *drv; /* NULL means no media */
48
--- a/block/export/meson.build
43
void *opaque;
49
+++ b/block/export/meson.build
44
50
@@ -XXX,XX +XXX,XX @@
45
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
51
block_ss.add(files('export.c'))
46
BdrvChild *backing;
52
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
47
BdrvChild *file;
53
+block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
48
54
diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build
49
- /* Callback before write request is processed */
55
index XXXXXXX..XXXXXXX 100644
50
- NotifierWithReturnList before_write_notifiers;
56
--- a/contrib/libvhost-user/meson.build
51
-
57
+++ b/contrib/libvhost-user/meson.build
52
- /* number of in-flight requests; overall and serialising */
58
@@ -XXX,XX +XXX,XX @@
53
- unsigned int in_flight;
59
libvhost_user = static_library('vhost-user',
54
- unsigned int serialising_in_flight;
60
files('libvhost-user.c', 'libvhost-user-glib.c'),
55
-
61
build_by_default: false)
56
- bool wakeup;
62
+vhost_user = declare_dependency(link_with: libvhost_user)
57
-
63
diff --git a/meson.build b/meson.build
58
- /* Offset after the highest byte written to */
64
index XXXXXXX..XXXXXXX 100644
59
- uint64_t wr_highest_offset;
65
--- a/meson.build
60
-
66
+++ b/meson.build
61
/* I/O Limits */
67
@@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [
62
BlockLimits bl;
68
'util',
63
69
]
64
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
70
65
QTAILQ_ENTRY(BlockDriverState) bs_list;
71
+vhost_user = not_found
66
/* element of the list of monitor-owned BDS */
72
+if 'CONFIG_VHOST_USER' in config_host
67
QTAILQ_ENTRY(BlockDriverState) monitor_list;
73
+ subdir('contrib/libvhost-user')
68
- QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
74
+endif
69
int refcnt;
70
71
- QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
72
-
73
/* operation blockers */
74
QLIST_HEAD(, BdrvOpBlocker) op_blockers[BLOCK_OP_TYPE_MAX];
75
76
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
77
/* The error object in use for blocking operations on backing_hd */
78
Error *backing_blocker;
79
80
+ /* Protected by AioContext lock */
81
+
75
+
82
+ /* If true, copy read backing sectors into image. Can be >1 if more
76
subdir('qapi')
83
+ * than one client has requested copy-on-read.
77
subdir('qobject')
84
+ */
78
subdir('stubs')
85
+ int copy_on_read;
79
@@ -XXX,XX +XXX,XX @@ if have_tools
86
+
80
install: true)
87
+ /* If we are reading a disk image, give its size in sectors.
81
88
+ * Generally read-only; it is written to by load_vmstate and save_vmstate,
82
if 'CONFIG_VHOST_USER' in config_host
89
+ * but the block layer is quiescent during those.
83
- subdir('contrib/libvhost-user')
90
+ */
84
subdir('contrib/vhost-user-blk')
91
+ int64_t total_sectors;
85
subdir('contrib/vhost-user-gpu')
92
+
86
subdir('contrib/vhost-user-input')
93
+ /* Callback before write request is processed */
87
diff --git a/util/meson.build b/util/meson.build
94
+ NotifierWithReturnList before_write_notifiers;
95
+
96
+ /* number of in-flight requests; overall and serialising */
97
+ unsigned int in_flight;
98
+ unsigned int serialising_in_flight;
99
+
100
+ bool wakeup;
101
+
102
+ /* Offset after the highest byte written to */
103
+ uint64_t wr_highest_offset;
104
+
105
/* threshold limit for writes, in bytes. "High water mark". */
106
uint64_t write_threshold_offset;
107
NotifierWithReturn write_threshold_notifier;
108
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
109
/* counter for nested bdrv_io_plug */
110
unsigned io_plugged;
111
112
+ QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
113
+ CoQueue flush_queue; /* Serializing flush queue */
114
+ bool active_flush_req; /* Flush request in flight? */
115
+ unsigned int write_gen; /* Current data generation */
116
+ unsigned int flushed_gen; /* Flushed write generation */
117
+
118
+ QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
119
+
120
+ /* do we need to tell the guest if we have a volatile write cache? */
121
+ int enable_write_cache;
122
+
123
int quiesce_counter;
124
};
125
126
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
127
index XXXXXXX..XXXXXXX 100644
88
index XXXXXXX..XXXXXXX 100644
128
--- a/include/sysemu/block-backend.h
89
--- a/util/meson.build
129
+++ b/include/sysemu/block-backend.h
90
+++ b/util/meson.build
130
@@ -XXX,XX +XXX,XX @@ typedef struct BlockDevOps {
91
@@ -XXX,XX +XXX,XX @@ if have_block
131
* fields that must be public. This is in particular for QLIST_ENTRY() and
92
util_ss.add(files('main-loop.c'))
132
* friends so that BlockBackends can be kept in lists outside block-backend.c */
93
util_ss.add(files('nvdimm-utils.c'))
133
typedef struct BlockBackendPublic {
94
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
134
- /* I/O throttling.
95
- util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
135
- * throttle_state tells us if this BlockBackend has I/O limits configured.
96
+ util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
136
- * io_limits_disabled tells us if they are currently being enforced */
97
+ files('vhost-user-server.c'), vhost_user
137
+ /* I/O throttling has its own locking, but also some fields are
98
+ ])
138
+ * protected by the AioContext lock.
99
util_ss.add(files('block-helpers.c'))
139
+ */
100
util_ss.add(files('qemu-coroutine-sleep.c'))
140
+
101
util_ss.add(files('qemu-co-shared-resource.c'))
141
+ /* Protected by AioContext lock. */
142
CoQueue throttled_reqs[2];
143
+
144
+ /* Nonzero if the I/O limits are currently being ignored; generally
145
+ * it is zero. */
146
unsigned int io_limits_disabled;
147
148
/* The following fields are protected by the ThrottleGroup lock.
149
- * See the ThrottleGroup documentation for details. */
150
+ * See the ThrottleGroup documentation for details.
151
+ * throttle_state tells us if I/O limits are configured. */
152
ThrottleState *throttle_state;
153
ThrottleTimers throttle_timers;
154
unsigned pending_reqs[2];
155
--
102
--
156
2.9.3
103
2.26.2
157
104
158
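As an illustration of the locking rule documented above, code running outside
the BlockDriverState's home AioContext would bracket accesses to the
protected fields like this (hypothetical helper, not part of the patch):

    static bool flush_in_flight(BlockDriverState *bs)
    {
        AioContext *ctx = bdrv_get_aio_context(bs);
        bool busy;

        aio_context_acquire(ctx);
        busy = bs->active_flush_req; /* protected by the AioContext lock */
        aio_context_release(ctx);

        return busy;
    }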
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
Introduce libblkdev.fa to avoid recompiling blockdev_ss twice.
2
2
3
qed_aio_start_io and qed_aio_next_io will not have to acquire/release
3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
4
the AioContext, while qed_aio_next_io_cb will. Split the functionality
4
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
5
and gain a little type-safety in the process.
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
6
Message-id: 20200929125516.186715-3-stefanha@redhat.com
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9
Reviewed-by: Fam Zheng <famz@redhat.com>
10
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
11
Message-id: 20170213135235.12274-11-pbonzini@redhat.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
8
---
14
block/qed.c | 39 +++++++++++++++++++++++++--------------
9
meson.build | 12 ++++++++++--
15
1 file changed, 25 insertions(+), 14 deletions(-)
10
storage-daemon/meson.build | 3 +--
11
2 files changed, 11 insertions(+), 4 deletions(-)
16
12
17
diff --git a/block/qed.c b/block/qed.c
13
diff --git a/meson.build b/meson.build
18
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
19
--- a/block/qed.c
15
--- a/meson.build
20
+++ b/block/qed.c
16
+++ b/meson.build
21
@@ -XXX,XX +XXX,XX @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
17
@@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files(
22
return l2_table;
18
# os-win32.c does not
23
}
19
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
24
20
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
25
-static void qed_aio_next_io(void *opaque, int ret);
21
-softmmu_ss.add_all(blockdev_ss)
26
+static void qed_aio_next_io(QEDAIOCB *acb, int ret);
22
23
common_ss.add(files('cpus-common.c'))
24
25
@@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock],
26
link_args: '@block.syms',
27
dependencies: [crypto, io])
28
29
+blockdev_ss = blockdev_ss.apply(config_host, strict: false)
30
+libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
31
+ dependencies: blockdev_ss.dependencies(),
32
+ name_suffix: 'fa',
33
+ build_by_default: false)
27
+
34
+
28
+static void qed_aio_start_io(QEDAIOCB *acb)
35
+blockdev = declare_dependency(link_whole: [libblockdev],
29
+{
36
+ dependencies: [block])
30
+ qed_aio_next_io(acb, 0);
31
+}
32
+
37
+
33
+static void qed_aio_next_io_cb(void *opaque, int ret)
38
qmp_ss = qmp_ss.apply(config_host, strict: false)
34
+{
39
libqmp = static_library('qmp', qmp_ss.sources() + genh,
35
+ QEDAIOCB *acb = opaque;
40
dependencies: qmp_ss.dependencies(),
36
+
41
@@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods
37
+ qed_aio_next_io(acb, ret);
42
install_dir: config_host['qemu_moddir'])
38
+}
43
endforeach
39
44
40
static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
45
-softmmu_ss.add(authz, block, chardev, crypto, io, qmp)
41
{
46
+softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp)
42
@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
47
common_ss.add(qom, qemuutil)
43
48
44
acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
49
common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss])
45
if (acb) {
50
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
46
- qed_aio_next_io(acb, 0);
51
index XXXXXXX..XXXXXXX 100644
47
+ qed_aio_start_io(acb);
52
--- a/storage-daemon/meson.build
48
}
53
+++ b/storage-daemon/meson.build
49
}
54
@@ -XXX,XX +XXX,XX @@
50
55
qsd_ss = ss.source_set()
51
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
56
qsd_ss.add(files('qemu-storage-daemon.c'))
52
QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next);
57
-qsd_ss.add(block, chardev, qmp, qom, qemuutil)
53
acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
58
-qsd_ss.add_all(blockdev_ss)
54
if (acb) {
59
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)
55
- qed_aio_next_io(acb, 0);
60
56
+ qed_aio_start_io(acb);
61
subdir('qapi')
57
} else if (s->header.features & QED_F_NEED_CHECK) {
58
qed_start_need_check_timer(s);
59
}
60
@@ -XXX,XX +XXX,XX @@ static void qed_commit_l2_update(void *opaque, int ret)
61
acb->request.l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset);
62
assert(acb->request.l2_table != NULL);
63
64
- qed_aio_next_io(opaque, ret);
65
+ qed_aio_next_io(acb, ret);
66
}
67
68
/**
69
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
70
if (need_alloc) {
71
/* Write out the whole new L2 table */
72
qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true,
73
- qed_aio_write_l1_update, acb);
74
+ qed_aio_write_l1_update, acb);
75
} else {
76
/* Write out only the updated part of the L2 table */
77
qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters, false,
78
- qed_aio_next_io, acb);
79
+ qed_aio_next_io_cb, acb);
80
}
81
return;
82
83
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
84
}
85
86
if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
87
- next_fn = qed_aio_next_io;
88
+ next_fn = qed_aio_next_io_cb;
89
} else {
90
if (s->bs->backing) {
91
next_fn = qed_aio_write_flush_before_l2_update;
92
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
93
if (acb->flags & QED_AIOCB_ZERO) {
94
/* Skip ahead if the clusters are already zero */
95
if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
96
- qed_aio_next_io(acb, 0);
97
+ qed_aio_start_io(acb);
98
return;
99
}
100
101
@@ -XXX,XX +XXX,XX @@ static void qed_aio_read_data(void *opaque, int ret,
102
/* Handle zero cluster and backing file reads */
103
if (ret == QED_CLUSTER_ZERO) {
104
qemu_iovec_memset(&acb->cur_qiov, 0, 0, acb->cur_qiov.size);
105
- qed_aio_next_io(acb, 0);
106
+ qed_aio_start_io(acb);
107
return;
108
} else if (ret != QED_CLUSTER_FOUND) {
109
qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
110
- &acb->backing_qiov, qed_aio_next_io, acb);
111
+ &acb->backing_qiov, qed_aio_next_io_cb, acb);
112
return;
113
}
114
115
BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
116
bdrv_aio_readv(bs->file, offset / BDRV_SECTOR_SIZE,
117
&acb->cur_qiov, acb->cur_qiov.size / BDRV_SECTOR_SIZE,
118
- qed_aio_next_io, acb);
119
+ qed_aio_next_io_cb, acb);
120
return;
121
122
err:
123
@@ -XXX,XX +XXX,XX @@ err:
124
/**
125
* Begin next I/O or complete the request
126
*/
127
-static void qed_aio_next_io(void *opaque, int ret)
128
+static void qed_aio_next_io(QEDAIOCB *acb, int ret)
129
{
130
- QEDAIOCB *acb = opaque;
131
BDRVQEDState *s = acb_to_s(acb);
132
QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ?
133
qed_aio_write_data : qed_aio_read_data;
134
@@ -XXX,XX +XXX,XX @@ static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
135
qemu_iovec_init(&acb->cur_qiov, qiov->niov);
136
137
/* Start request */
138
- qed_aio_next_io(acb, 0);
139
+ qed_aio_start_io(acb);
140
return &acb->common;
141
}
142
62
143
--
63
--
144
2.9.3
64
2.26.2
145
65
146
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
2
They are not used by other programs and are not otherwise needed in
3
libblock.
2
4
3
Once the thread pool starts using aio_co_wake, it will also need
5
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
4
qemu_get_current_aio_context(). Make test-thread-pool create
6
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
5
an AioContext with qemu_init_main_loop, so that stubs/iothread.c
7
(libblockdev), a stub function is required.
6
and tests/iothread.c can provide the rest.
7
8
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Make qemu-nbd.c use signal handling utility functions instead of
9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10
duplicating the code. This helps because os-posix.c is in libblockdev
10
Reviewed-by: Fam Zheng <famz@redhat.com>
11
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
11
Message-id: 20170213135235.12274-5-pbonzini@redhat.com
12
Once we use the signal handling utility functions we also end up
13
providing the necessary symbol.
14
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
17
Reviewed-by: Eric Blake <eblake@redhat.com>
18
Message-id: 20200929125516.186715-4-stefanha@redhat.com
19
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
20
--Stefan]
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
22
---
14
tests/test-thread-pool.c | 12 +++---------
23
qemu-nbd.c | 21 ++++++++-------------
15
1 file changed, 3 insertions(+), 9 deletions(-)
24
stubs/blk-exp-close-all.c | 7 +++++++
25
block/export/meson.build | 4 ++--
26
meson.build | 4 ++--
27
nbd/meson.build | 2 ++
28
stubs/meson.build | 1 +
29
6 files changed, 22 insertions(+), 17 deletions(-)
30
create mode 100644 stubs/blk-exp-close-all.c
16
31
17
diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
32
diff --git a/qemu-nbd.c b/qemu-nbd.c
18
index XXXXXXX..XXXXXXX 100644
33
index XXXXXXX..XXXXXXX 100644
19
--- a/tests/test-thread-pool.c
34
--- a/qemu-nbd.c
20
+++ b/tests/test-thread-pool.c
35
+++ b/qemu-nbd.c
21
@@ -XXX,XX +XXX,XX @@
36
@@ -XXX,XX +XXX,XX @@
22
#include "qapi/error.h"
37
#include "qapi/error.h"
23
#include "qemu/timer.h"
38
#include "qemu/cutils.h"
24
#include "qemu/error-report.h"
39
#include "sysemu/block-backend.h"
25
+#include "qemu/main-loop.h"
40
+#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */
26
41
#include "block/block_int.h"
27
static AioContext *ctx;
42
#include "block/nbd.h"
28
static ThreadPool *pool;
43
#include "qemu/main-loop.h"
29
@@ -XXX,XX +XXX,XX @@ static void test_cancel_async(void)
44
@@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n"
30
int main(int argc, char **argv)
45
}
46
47
#ifdef CONFIG_POSIX
48
-static void termsig_handler(int signum)
49
+/*
50
+ * The client thread uses SIGTERM to interrupt the server. A signal
51
+ * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
52
+ */
53
+void qemu_system_killed(int signum, pid_t pid)
31
{
54
{
32
int ret;
55
qatomic_cmpxchg(&state, RUNNING, TERMINATE);
33
- Error *local_error = NULL;
56
qemu_notify_event();
34
57
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
35
- init_clocks();
58
BlockExportOptions *export_opts;
59
60
#ifdef CONFIG_POSIX
61
- /*
62
- * Exit gracefully on various signals, which includes SIGTERM used
63
- * by 'qemu-nbd -v -c'.
64
- */
65
- struct sigaction sa_sigterm;
66
- memset(&sa_sigterm, 0, sizeof(sa_sigterm));
67
- sa_sigterm.sa_handler = termsig_handler;
68
- sigaction(SIGTERM, &sa_sigterm, NULL);
69
- sigaction(SIGINT, &sa_sigterm, NULL);
70
- sigaction(SIGHUP, &sa_sigterm, NULL);
36
-
71
-
37
- ctx = aio_context_new(&local_error);
72
- signal(SIGPIPE, SIG_IGN);
38
- if (!ctx) {
73
+ os_setup_early_signal_handling();
39
- error_reportf_err(local_error, "Failed to create AIO Context: ");
74
+ os_setup_signal_handling();
40
- exit(1);
75
#endif
41
- }
76
42
+ qemu_init_main_loop(&error_abort);
77
socket_init();
43
+ ctx = qemu_get_current_aio_context();
78
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
44
pool = aio_get_thread_pool(ctx);
79
new file mode 100644
45
80
index XXXXXXX..XXXXXXX
46
g_test_init(&argc, &argv, NULL);
81
--- /dev/null
47
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
82
+++ b/stubs/blk-exp-close-all.c
48
83
@@ -XXX,XX +XXX,XX @@
49
ret = g_test_run();
84
+#include "qemu/osdep.h"
50
85
+#include "block/export.h"
51
- aio_context_unref(ctx);
86
+
52
return ret;
87
+/* Only used in programs that support block exports (libblockdev.fa) */
53
}
88
+void blk_exp_close_all(void)
89
+{
90
+}
91
diff --git a/block/export/meson.build b/block/export/meson.build
92
index XXXXXXX..XXXXXXX 100644
93
--- a/block/export/meson.build
94
+++ b/block/export/meson.build
95
@@ -XXX,XX +XXX,XX @@
96
-block_ss.add(files('export.c'))
97
-block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
98
+blockdev_ss.add(files('export.c'))
99
+blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
100
diff --git a/meson.build b/meson.build
101
index XXXXXXX..XXXXXXX 100644
102
--- a/meson.build
103
+++ b/meson.build
104
@@ -XXX,XX +XXX,XX @@ subdir('dump')
105
106
block_ss.add(files(
107
'block.c',
108
- 'blockdev-nbd.c',
109
'blockjob.c',
110
'job.c',
111
'qemu-io-cmds.c',
112
@@ -XXX,XX +XXX,XX @@ subdir('block')
113
114
blockdev_ss.add(files(
115
'blockdev.c',
116
+ 'blockdev-nbd.c',
117
'iothread.c',
118
'job-qmp.c',
119
))
120
@@ -XXX,XX +XXX,XX @@ if have_tools
121
qemu_io = executable('qemu-io', files('qemu-io.c'),
122
dependencies: [block, qemuutil], install: true)
123
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
124
- dependencies: [block, qemuutil], install: true)
125
+ dependencies: [blockdev, qemuutil], install: true)
126
127
subdir('storage-daemon')
128
subdir('contrib/rdmacm-mux')
129
diff --git a/nbd/meson.build b/nbd/meson.build
130
index XXXXXXX..XXXXXXX 100644
131
--- a/nbd/meson.build
132
+++ b/nbd/meson.build
133
@@ -XXX,XX +XXX,XX @@
134
block_ss.add(files(
135
'client.c',
136
'common.c',
137
+))
138
+blockdev_ss.add(files(
139
'server.c',
140
))
141
diff --git a/stubs/meson.build b/stubs/meson.build
142
index XXXXXXX..XXXXXXX 100644
143
--- a/stubs/meson.build
144
+++ b/stubs/meson.build
145
@@ -XXX,XX +XXX,XX @@
146
stub_ss.add(files('arch_type.c'))
147
stub_ss.add(files('bdrv-next-monitor-owned.c'))
148
stub_ss.add(files('blk-commit-all.c'))
149
+stub_ss.add(files('blk-exp-close-all.c'))
150
stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
151
stub_ss.add(files('change-state-handler.c'))
152
stub_ss.add(files('cmos.c'))
54
--
153
--
55
2.9.3
154
2.26.2
56
155
57
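For context, the signal handling utility functions mentioned above work
roughly as follows (paraphrased from os-posix.c as of this series; the exact
code may differ): os_setup_signal_handling() installs a SIGINT/SIGHUP/SIGTERM
handler that forwards to the program's qemu_system_killed() implementation.

    /* Rough paraphrase of the os-posix.c handler this commit relies on. */
    static void termsig_handler(int signal, siginfo_t *info, void *c)
    {
        qemu_system_killed(info->si_signo, info->si_pid);
    }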
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
Make it possible to specify the iothread where the export will run. By
2
default the block node can be moved to other AioContexts later and the
3
export will follow. The fixed-iothread option forces strict behavior
4
that prevents changing AioContext while the export is active. See the
5
QAPI docs for details.
2
6
3
In the client, read the reply headers from a coroutine, switching the
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
read side between the "read header" coroutine and the I/O coroutine that
8
Message-id: 20200929125516.186715-5-stefanha@redhat.com
5
reads the body of the reply.
9
[Fix stray '#' character in block-export.json and add missing "(since:
6
10
5.2)" as suggested by Eric Blake.
7
In the server, if the server can read more requests it will create a new
11
--Stefan]
8
"read request" coroutine as soon as a request has been read. Otherwise,
9
the new coroutine is created in nbd_request_put.
10
11
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
13
Reviewed-by: Fam Zheng <famz@redhat.com>
14
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
15
Message-id: 20170213135235.12274-8-pbonzini@redhat.com
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
13
---
18
block/nbd-client.h | 2 +-
14
qapi/block-export.json | 11 ++++++++++
19
block/nbd-client.c | 117 ++++++++++++++++++++++++-----------------------------
15
block/export/export.c | 31 +++++++++++++++++++++++++++-
20
nbd/client.c | 2 +-
16
block/export/vhost-user-blk-server.c | 5 ++++-
21
nbd/common.c | 9 +----
17
nbd/server.c | 2 --
22
nbd/server.c | 94 +++++++++++++-----------------------------
18
4 files changed, 45 insertions(+), 4 deletions(-)
23
5 files changed, 83 insertions(+), 141 deletions(-)
24
19
25
diff --git a/block/nbd-client.h b/block/nbd-client.h
20
diff --git a/qapi/block-export.json b/qapi/block-export.json
26
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
27
--- a/block/nbd-client.h
22
--- a/qapi/block-export.json
28
+++ b/block/nbd-client.h
23
+++ b/qapi/block-export.json
29
@@ -XXX,XX +XXX,XX @@ typedef struct NBDClientSession {
24
@@ -XXX,XX +XXX,XX @@
30
25
# export before completion is signalled. (since: 5.2;
31
CoMutex send_mutex;
26
# default: false)
32
CoQueue free_sema;
27
#
33
- Coroutine *send_coroutine;
28
+# @iothread: The name of the iothread object where the export will run. The
34
+ Coroutine *read_reply_co;
29
+# default is to use the thread currently associated with the
35
int in_flight;
30
+# block node. (since: 5.2)
36
31
+#
37
Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
32
+# @fixed-iothread: True prevents the block node from being moved to another
38
diff --git a/block/nbd-client.c b/block/nbd-client.c
33
+# thread while the export is active. If true and @iothread is
34
+# given, export creation fails if the block node cannot be
35
+# moved to the iothread. The default is false. (since: 5.2)
36
+#
37
# Since: 4.2
38
##
39
{ 'union': 'BlockExportOptions',
40
'base': { 'type': 'BlockExportType',
41
'id': 'str',
42
+     '*fixed-iothread': 'bool',
43
+     '*iothread': 'str',
44
'node-name': 'str',
45
'*writable': 'bool',
46
'*writethrough': 'bool' },
47
diff --git a/block/export/export.c b/block/export/export.c
39
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
40
--- a/block/nbd-client.c
49
--- a/block/export/export.c
41
+++ b/block/nbd-client.c
50
+++ b/block/export/export.c
42
@@ -XXX,XX +XXX,XX @@
51
@@ -XXX,XX +XXX,XX @@
43
#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
52
44
#define INDEX_TO_HANDLE(bs, index) ((index) ^ ((uint64_t)(intptr_t)bs))
53
#include "block/block.h"
45
54
#include "sysemu/block-backend.h"
46
-static void nbd_recv_coroutines_enter_all(NBDClientSession *s)
55
+#include "sysemu/iothread.h"
47
+static void nbd_recv_coroutines_enter_all(BlockDriverState *bs)
56
#include "block/export.h"
57
#include "block/nbd.h"
58
#include "qapi/error.h"
59
@@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
60
61
BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
48
{
62
{
49
+ NBDClientSession *s = nbd_get_client_session(bs);
63
+ bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
50
int i;
64
const BlockExportDriver *drv;
51
65
BlockExport *exp = NULL;
52
for (i = 0; i < MAX_NBD_REQUESTS; i++) {
66
BlockDriverState *bs;
53
@@ -XXX,XX +XXX,XX @@ static void nbd_recv_coroutines_enter_all(NBDClientSession *s)
67
- BlockBackend *blk;
54
qemu_coroutine_enter(s->recv_coroutine[i]);
68
+ BlockBackend *blk = NULL;
55
}
69
AioContext *ctx;
70
uint64_t perm;
71
int ret;
72
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
73
ctx = bdrv_get_aio_context(bs);
74
aio_context_acquire(ctx);
75
76
+ if (export->has_iothread) {
77
+ IOThread *iothread;
78
+ AioContext *new_ctx;
79
+
80
+ iothread = iothread_by_id(export->iothread);
81
+ if (!iothread) {
82
+ error_setg(errp, "iothread \"%s\" not found", export->iothread);
83
+ goto fail;
84
+ }
85
+
86
+ new_ctx = iothread_get_aio_context(iothread);
87
+
88
+ ret = bdrv_try_set_aio_context(bs, new_ctx, errp);
89
+ if (ret == 0) {
90
+ aio_context_release(ctx);
91
+ aio_context_acquire(new_ctx);
92
+ ctx = new_ctx;
93
+ } else if (fixed_iothread) {
94
+ goto fail;
95
+ }
96
+ }
97
+
98
/*
99
* Block exports are used for non-shared storage migration. Make sure
100
* that BDRV_O_INACTIVE is cleared and the image is ready for write
101
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
56
}
102
}
57
+ BDRV_POLL_WHILE(bs, s->read_reply_co);
103
104
blk = blk_new(ctx, perm, BLK_PERM_ALL);
105
+
106
+ if (!fixed_iothread) {
107
+ blk_set_allow_aio_context_change(blk, true);
108
+ }
109
+
110
ret = blk_insert_bs(blk, bs, errp);
111
if (ret < 0) {
112
goto fail;
113
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/block/export/vhost-user-blk-server.c
116
+++ b/block/export/vhost-user-blk-server.c
117
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = {
118
static void blk_aio_attached(AioContext *ctx, void *opaque)
119
{
120
VuBlkExport *vexp = opaque;
121
+
122
+ vexp->export.ctx = ctx;
123
vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
58
}
124
}
59
125
60
static void nbd_teardown_connection(BlockDriverState *bs)
126
static void blk_aio_detach(void *opaque)
61
@@ -XXX,XX +XXX,XX @@ static void nbd_teardown_connection(BlockDriverState *bs)
127
{
62
qio_channel_shutdown(client->ioc,
128
VuBlkExport *vexp = opaque;
63
QIO_CHANNEL_SHUTDOWN_BOTH,
129
+
64
NULL);
130
vhost_user_server_detach_aio_context(&vexp->vu_server);
65
- nbd_recv_coroutines_enter_all(client);
131
+ vexp->export.ctx = NULL;
66
+ nbd_recv_coroutines_enter_all(bs);
67
68
nbd_client_detach_aio_context(bs);
69
object_unref(OBJECT(client->sioc));
70
@@ -XXX,XX +XXX,XX @@ static void nbd_teardown_connection(BlockDriverState *bs)
71
client->ioc = NULL;
72
}
132
}
73
133
74
-static void nbd_reply_ready(void *opaque)
134
static void
75
+static coroutine_fn void nbd_read_reply_entry(void *opaque)
135
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
76
{
136
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
77
- BlockDriverState *bs = opaque;
137
logical_block_size);
78
- NBDClientSession *s = nbd_get_client_session(bs);
138
79
+ NBDClientSession *s = opaque;
139
- blk_set_allow_aio_context_change(exp->blk, true);
80
uint64_t i;
140
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
81
int ret;
141
vexp);
82
142
83
- if (!s->ioc) { /* Already closed */
84
- return;
85
- }
86
-
87
- if (s->reply.handle == 0) {
88
- /* No reply already in flight. Fetch a header. It is possible
89
- * that another thread has done the same thing in parallel, so
90
- * the socket is not readable anymore.
91
- */
92
+ for (;;) {
93
+ assert(s->reply.handle == 0);
94
ret = nbd_receive_reply(s->ioc, &s->reply);
95
- if (ret == -EAGAIN) {
96
- return;
97
- }
98
if (ret < 0) {
99
- s->reply.handle = 0;
100
- goto fail;
101
+ break;
102
}
103
- }
104
105
- /* There's no need for a mutex on the receive side, because the
106
- * handler acts as a synchronization point and ensures that only
107
- * one coroutine is called until the reply finishes. */
108
- i = HANDLE_TO_INDEX(s, s->reply.handle);
109
- if (i >= MAX_NBD_REQUESTS) {
110
- goto fail;
111
- }
112
+ /* There's no need for a mutex on the receive side, because the
113
+ * handler acts as a synchronization point and ensures that only
114
+ * one coroutine is called until the reply finishes.
115
+ */
116
+ i = HANDLE_TO_INDEX(s, s->reply.handle);
117
+ if (i >= MAX_NBD_REQUESTS || !s->recv_coroutine[i]) {
118
+ break;
119
+ }
120
121
- if (s->recv_coroutine[i]) {
122
- qemu_coroutine_enter(s->recv_coroutine[i]);
123
- return;
124
+ /* We're woken up by the recv_coroutine itself. Note that there
125
+ * is no race between yielding and reentering read_reply_co. This
126
+ * is because:
127
+ *
128
+ * - if recv_coroutine[i] runs on the same AioContext, it is only
129
+ * entered after we yield
130
+ *
131
+ * - if recv_coroutine[i] runs on a different AioContext, reentering
132
+ * read_reply_co happens through a bottom half, which can only
133
+ * run after we yield.
134
+ */
135
+ aio_co_wake(s->recv_coroutine[i]);
136
+ qemu_coroutine_yield();
137
}
138
-
139
-fail:
140
- nbd_teardown_connection(bs);
141
-}
142
-
143
-static void nbd_restart_write(void *opaque)
144
-{
145
- BlockDriverState *bs = opaque;
146
-
147
- qemu_coroutine_enter(nbd_get_client_session(bs)->send_coroutine);
148
+ s->read_reply_co = NULL;
149
}
150
151
static int nbd_co_send_request(BlockDriverState *bs,
152
@@ -XXX,XX +XXX,XX @@ static int nbd_co_send_request(BlockDriverState *bs,
153
QEMUIOVector *qiov)
154
{
155
NBDClientSession *s = nbd_get_client_session(bs);
156
- AioContext *aio_context;
157
int rc, ret, i;
158
159
qemu_co_mutex_lock(&s->send_mutex);
160
@@ -XXX,XX +XXX,XX @@ static int nbd_co_send_request(BlockDriverState *bs,
161
return -EPIPE;
162
}
163
164
- s->send_coroutine = qemu_coroutine_self();
165
- aio_context = bdrv_get_aio_context(bs);
166
-
167
- aio_set_fd_handler(aio_context, s->sioc->fd, false,
168
- nbd_reply_ready, nbd_restart_write, NULL, bs);
169
if (qiov) {
170
qio_channel_set_cork(s->ioc, true);
171
rc = nbd_send_request(s->ioc, request);
172
@@ -XXX,XX +XXX,XX @@ static int nbd_co_send_request(BlockDriverState *bs,
173
} else {
174
rc = nbd_send_request(s->ioc, request);
175
}
176
- aio_set_fd_handler(aio_context, s->sioc->fd, false,
177
- nbd_reply_ready, NULL, NULL, bs);
178
- s->send_coroutine = NULL;
179
qemu_co_mutex_unlock(&s->send_mutex);
180
return rc;
181
}
182
@@ -XXX,XX +XXX,XX @@ static void nbd_co_receive_reply(NBDClientSession *s,
183
{
184
int ret;
185
186
- /* Wait until we're woken up by the read handler. TODO: perhaps
187
- * peek at the next reply and avoid yielding if it's ours? */
188
+ /* Wait until we're woken up by nbd_read_reply_entry. */
189
qemu_coroutine_yield();
190
*reply = s->reply;
191
if (reply->handle != request->handle ||
192
@@ -XXX,XX +XXX,XX @@ static void nbd_coroutine_start(NBDClientSession *s,
193
/* s->recv_coroutine[i] is set as soon as we get the send_lock. */
194
}
195
196
-static void nbd_coroutine_end(NBDClientSession *s,
197
+static void nbd_coroutine_end(BlockDriverState *bs,
198
NBDRequest *request)
199
{
200
+ NBDClientSession *s = nbd_get_client_session(bs);
201
int i = HANDLE_TO_INDEX(s, request->handle);
202
+
203
s->recv_coroutine[i] = NULL;
204
- if (s->in_flight-- == MAX_NBD_REQUESTS) {
205
- qemu_co_queue_next(&s->free_sema);
206
+ s->in_flight--;
207
+ qemu_co_queue_next(&s->free_sema);
208
+
209
+ /* Kick the read_reply_co to get the next reply. */
210
+ if (s->read_reply_co) {
211
+ aio_co_wake(s->read_reply_co);
212
}
213
}
214
215
@@ -XXX,XX +XXX,XX @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
216
} else {
217
nbd_co_receive_reply(client, &request, &reply, qiov);
218
}
219
- nbd_coroutine_end(client, &request);
220
+ nbd_coroutine_end(bs, &request);
221
return -reply.error;
222
}
223
224
@@ -XXX,XX +XXX,XX @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
225
} else {
226
nbd_co_receive_reply(client, &request, &reply, NULL);
227
}
228
- nbd_coroutine_end(client, &request);
229
+ nbd_coroutine_end(bs, &request);
230
return -reply.error;
231
}
232
233
@@ -XXX,XX +XXX,XX @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
234
} else {
235
nbd_co_receive_reply(client, &request, &reply, NULL);
236
}
237
- nbd_coroutine_end(client, &request);
238
+ nbd_coroutine_end(bs, &request);
239
return -reply.error;
240
}
241
242
@@ -XXX,XX +XXX,XX @@ int nbd_client_co_flush(BlockDriverState *bs)
243
} else {
244
nbd_co_receive_reply(client, &request, &reply, NULL);
245
}
246
- nbd_coroutine_end(client, &request);
247
+ nbd_coroutine_end(bs, &request);
248
return -reply.error;
249
}
250
251
@@ -XXX,XX +XXX,XX @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
252
} else {
253
nbd_co_receive_reply(client, &request, &reply, NULL);
254
}
255
- nbd_coroutine_end(client, &request);
256
+ nbd_coroutine_end(bs, &request);
257
return -reply.error;
258
259
}
260
261
void nbd_client_detach_aio_context(BlockDriverState *bs)
262
{
263
- aio_set_fd_handler(bdrv_get_aio_context(bs),
264
- nbd_get_client_session(bs)->sioc->fd,
265
- false, NULL, NULL, NULL, NULL);
266
+ NBDClientSession *client = nbd_get_client_session(bs);
267
+ qio_channel_detach_aio_context(QIO_CHANNEL(client->sioc));
268
}
269
270
void nbd_client_attach_aio_context(BlockDriverState *bs,
271
AioContext *new_context)
272
{
273
- aio_set_fd_handler(new_context, nbd_get_client_session(bs)->sioc->fd,
274
- false, nbd_reply_ready, NULL, NULL, bs);
275
+ NBDClientSession *client = nbd_get_client_session(bs);
276
+ qio_channel_attach_aio_context(QIO_CHANNEL(client->sioc), new_context);
277
+ aio_co_schedule(new_context, client->read_reply_co);
278
}
279
280
void nbd_client_close(BlockDriverState *bs)
281
@@ -XXX,XX +XXX,XX @@ int nbd_client_init(BlockDriverState *bs,
282
/* Now that we're connected, set the socket to be non-blocking and
283
* kick the reply mechanism. */
284
qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
285
-
286
+ client->read_reply_co = qemu_coroutine_create(nbd_read_reply_entry, client);
287
nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
288
289
logout("Established connection with NBD server\n");
290
diff --git a/nbd/client.c b/nbd/client.c
291
index XXXXXXX..XXXXXXX 100644
292
--- a/nbd/client.c
293
+++ b/nbd/client.c
294
@@ -XXX,XX +XXX,XX @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
295
ssize_t ret;
296
297
ret = read_sync(ioc, buf, sizeof(buf));
298
- if (ret < 0) {
299
+ if (ret <= 0) {
300
return ret;
301
}
302
303
diff --git a/nbd/common.c b/nbd/common.c
304
index XXXXXXX..XXXXXXX 100644
305
--- a/nbd/common.c
306
+++ b/nbd/common.c
307
@@ -XXX,XX +XXX,XX @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
308
}
309
if (len == QIO_CHANNEL_ERR_BLOCK) {
310
if (qemu_in_coroutine()) {
311
- /* XXX figure out if we can create a variant on
312
- * qio_channel_yield() that works with AIO contexts
313
- * and consider using that in this branch */
314
- qemu_coroutine_yield();
315
- } else if (done) {
316
- /* XXX this is needed by nbd_reply_ready. */
317
- qio_channel_wait(ioc,
318
- do_read ? G_IO_IN : G_IO_OUT);
319
+ qio_channel_yield(ioc, do_read ? G_IO_IN : G_IO_OUT);
320
} else {
321
return -EAGAIN;
322
}
323
diff --git a/nbd/server.c b/nbd/server.c
143
diff --git a/nbd/server.c b/nbd/server.c
324
index XXXXXXX..XXXXXXX 100644
144
index XXXXXXX..XXXXXXX 100644
325
--- a/nbd/server.c
145
--- a/nbd/server.c
326
+++ b/nbd/server.c
146
+++ b/nbd/server.c
327
@@ -XXX,XX +XXX,XX @@ struct NBDClient {
147
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
328
CoMutex send_lock;
148
return ret;
329
Coroutine *send_coroutine;
149
}
330
150
331
- bool can_read;
151
- blk_set_allow_aio_context_change(blk, true);
332
-
152
-
333
QTAILQ_ENTRY(NBDClient) next;
153
QTAILQ_INIT(&exp->clients);
334
int nb_requests;
154
exp->name = g_strdup(arg->name);
335
bool closing;
155
exp->description = g_strdup(arg->description);
336
@@ -XXX,XX +XXX,XX @@ struct NBDClient {
337
338
/* That's all folks */
339
340
-static void nbd_set_handlers(NBDClient *client);
341
-static void nbd_unset_handlers(NBDClient *client);
342
-static void nbd_update_can_read(NBDClient *client);
343
+static void nbd_client_receive_next_request(NBDClient *client);
344
345
static gboolean nbd_negotiate_continue(QIOChannel *ioc,
346
GIOCondition condition,
347
@@ -XXX,XX +XXX,XX @@ void nbd_client_put(NBDClient *client)
348
*/
349
assert(client->closing);
350
351
- nbd_unset_handlers(client);
352
+ qio_channel_detach_aio_context(client->ioc);
353
object_unref(OBJECT(client->sioc));
354
object_unref(OBJECT(client->ioc));
355
if (client->tlscreds) {
356
@@ -XXX,XX +XXX,XX @@ static NBDRequestData *nbd_request_get(NBDClient *client)
357
358
assert(client->nb_requests <= MAX_NBD_REQUESTS - 1);
359
client->nb_requests++;
360
- nbd_update_can_read(client);
361
362
req = g_new0(NBDRequestData, 1);
363
nbd_client_get(client);
364
@@ -XXX,XX +XXX,XX @@ static void nbd_request_put(NBDRequestData *req)
365
g_free(req);
366
367
client->nb_requests--;
368
-    nbd_update_can_read(client);
+    nbd_client_receive_next_request(client);
+
     nbd_client_put(client);
 }

@@ -XXX,XX +XXX,XX @@ static void blk_aio_attached(AioContext *ctx, void *opaque)
     exp->ctx = ctx;

     QTAILQ_FOREACH(client, &exp->clients, next) {
-        nbd_set_handlers(client);
+        qio_channel_attach_aio_context(client->ioc, ctx);
+        if (client->recv_coroutine) {
+            aio_co_schedule(ctx, client->recv_coroutine);
+        }
+        if (client->send_coroutine) {
+            aio_co_schedule(ctx, client->send_coroutine);
+        }
     }
 }

@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)
     TRACE("Export %s: Detaching clients from AIO context %p\n", exp->name, exp->ctx);

     QTAILQ_FOREACH(client, &exp->clients, next) {
-        nbd_unset_handlers(client);
+        qio_channel_detach_aio_context(client->ioc);
     }

     exp->ctx = NULL;

@@ -XXX,XX +XXX,XX @@ static ssize_t nbd_co_send_reply(NBDRequestData *req, NBDReply *reply,
     g_assert(qemu_in_coroutine());
     qemu_co_mutex_lock(&client->send_lock);
     client->send_coroutine = qemu_coroutine_self();
-    nbd_set_handlers(client);

     if (!len) {
         rc = nbd_send_reply(client->ioc, reply);
@@ -XXX,XX +XXX,XX @@ static ssize_t nbd_co_send_reply(NBDRequestData *req, NBDReply *reply,
     }

     client->send_coroutine = NULL;
-    nbd_set_handlers(client);
     qemu_co_mutex_unlock(&client->send_lock);
     return rc;
 }

@@ -XXX,XX +XXX,XX @@ static ssize_t nbd_co_receive_request(NBDRequestData *req,
     ssize_t rc;

     g_assert(qemu_in_coroutine());
-    client->recv_coroutine = qemu_coroutine_self();
-    nbd_update_can_read(client);
-
+    assert(client->recv_coroutine == qemu_coroutine_self());
     rc = nbd_receive_request(client->ioc, request);
     if (rc < 0) {
         if (rc != -EAGAIN) {
@@ -XXX,XX +XXX,XX @@ static ssize_t nbd_co_receive_request(NBDRequestData *req,

 out:
     client->recv_coroutine = NULL;
-    nbd_update_can_read(client);
+    nbd_client_receive_next_request(client);

     return rc;
 }

-static void nbd_trip(void *opaque)
+/* Owns a reference to the NBDClient passed as opaque. */
+static coroutine_fn void nbd_trip(void *opaque)
 {
     NBDClient *client = opaque;
     NBDExport *exp = client->exp;
     NBDRequestData *req;
-    NBDRequest request;
+    NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
     NBDReply reply;
     ssize_t ret;
     int flags;

     TRACE("Reading request.");
     if (client->closing) {
+        nbd_client_put(client);
         return;
     }

@@ -XXX,XX +XXX,XX @@ static void nbd_trip(void *opaque)

 done:
     nbd_request_put(req);
+    nbd_client_put(client);
     return;

 out:
     nbd_request_put(req);
     client_close(client);
+    nbd_client_put(client);
 }

-static void nbd_read(void *opaque)
+static void nbd_client_receive_next_request(NBDClient *client)
 {
-    NBDClient *client = opaque;
-
-    if (client->recv_coroutine) {
-        qemu_coroutine_enter(client->recv_coroutine);
-    } else {
-        qemu_coroutine_enter(qemu_coroutine_create(nbd_trip, client));
-    }
-}
-
-static void nbd_restart_write(void *opaque)
-{
-    NBDClient *client = opaque;
-
-    qemu_coroutine_enter(client->send_coroutine);
-}
-
-static void nbd_set_handlers(NBDClient *client)
-{
-    if (client->exp && client->exp->ctx) {
-        aio_set_fd_handler(client->exp->ctx, client->sioc->fd, true,
-                           client->can_read ? nbd_read : NULL,
-                           client->send_coroutine ? nbd_restart_write : NULL,
-                           NULL, client);
-    }
-}
-
-static void nbd_unset_handlers(NBDClient *client)
-{
-    if (client->exp && client->exp->ctx) {
-        aio_set_fd_handler(client->exp->ctx, client->sioc->fd, true, NULL,
-                           NULL, NULL, NULL);
-    }
-}
-
-static void nbd_update_can_read(NBDClient *client)
-{
-    bool can_read = client->recv_coroutine ||
-                    client->nb_requests < MAX_NBD_REQUESTS;
-
-    if (can_read != client->can_read) {
-        client->can_read = can_read;
-        nbd_set_handlers(client);
-
-        /* There is no need to invoke aio_notify(), since aio_set_fd_handler()
-         * in nbd_set_handlers() will have taken care of that */
+    if (!client->recv_coroutine && client->nb_requests < MAX_NBD_REQUESTS) {
+        nbd_client_get(client);
+        client->recv_coroutine = qemu_coroutine_create(nbd_trip, client);
+        aio_co_schedule(client->exp->ctx, client->recv_coroutine);
     }
 }

@@ -XXX,XX +XXX,XX @@ static coroutine_fn void nbd_co_client_start(void *opaque)
         goto out;
     }
     qemu_co_mutex_init(&client->send_lock);
-    nbd_set_handlers(client);

     if (exp) {
         QTAILQ_INSERT_TAIL(&exp->clients, client, next);
     }
+
+    nbd_client_receive_next_request(client);
+
 out:
     g_free(data);
 }
@@ -XXX,XX +XXX,XX @@ void nbd_client_new(NBDExport *exp,
     object_ref(OBJECT(client->sioc));
     client->ioc = QIO_CHANNEL(sioc);
     object_ref(OBJECT(client->ioc));
-    client->can_read = true;
     client->close = close_fn;

     data->client = client;
--
2.9.3
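The new request loop above boils down to one idiom: create the coroutine, then hand it to the right event loop with aio_co_schedule() instead of juggling fd handlers. A minimal sketch of that idiom in isolation — my_request_handler() and my_server_kick() are illustrative names, not part of the patch:

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "block/aio.h"

    /* Coroutine body: runs inside @ctx once the scheduled bottom half fires. */
    static coroutine_fn void my_request_handler(void *opaque)
    {
        /* read one request, process it, possibly reschedule itself */
    }

    static void my_server_kick(AioContext *ctx, void *opaque)
    {
        Coroutine *co = qemu_coroutine_create(my_request_handler, opaque);

        /*
         * Safe from any thread: the coroutine is entered via a bottom half
         * in @ctx, so no can_read bookkeeping or aio_set_fd_handler()
         * calls are needed, just like nbd_client_receive_next_request().
         */
        aio_co_schedule(ctx, co);
    }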
From: Paolo Bonzini <pbonzini@redhat.com>

Support separate coroutines for reading and writing, and place the
read/write handlers on the AioContext that the QIOChannel is registered
with.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/io/channel.h | 47 ++++++++++++++++++++++++++--
 io/channel.c         | 86 +++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 109 insertions(+), 24 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index XXXXXXX..XXXXXXX 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -XXX,XX +XXX,XX @@

 #include "qemu-common.h"
 #include "qom/object.h"
+#include "qemu/coroutine.h"
 #include "block/aio.h"

 #define TYPE_QIO_CHANNEL "qio-channel"
@@ -XXX,XX +XXX,XX @@ struct QIOChannel {
     Object parent;
     unsigned int features; /* bitmask of QIOChannelFeatures */
     char *name;
+    AioContext *ctx;
+    Coroutine *read_coroutine;
+    Coroutine *write_coroutine;
 #ifdef _WIN32
     HANDLE event; /* For use with GSource on Win32 */
 #endif
@@ -XXX,XX +XXX,XX @@ guint qio_channel_add_watch(QIOChannel *ioc,


 /**
+ * qio_channel_attach_aio_context:
+ * @ioc: the channel object
+ * @ctx: the #AioContext to set the handlers on
+ *
+ * Request that qio_channel_yield() sets I/O handlers on
+ * the given #AioContext. If @ctx is %NULL, qio_channel_yield()
+ * uses QEMU's main thread event loop.
+ *
+ * You can move a #QIOChannel from one #AioContext to another even if
+ * I/O handlers are set for a coroutine. However, #QIOChannel provides
+ * no synchronization between the calls to qio_channel_yield() and
+ * qio_channel_attach_aio_context().
+ *
+ * Therefore you should first call qio_channel_detach_aio_context()
+ * to ensure that the coroutine is not entered concurrently. Then,
+ * while the coroutine has yielded, call qio_channel_attach_aio_context(),
+ * and then aio_co_schedule() to place the coroutine on the new
+ * #AioContext. The calls to qio_channel_detach_aio_context()
+ * and qio_channel_attach_aio_context() should be protected with
+ * aio_context_acquire() and aio_context_release().
+ */
+void qio_channel_attach_aio_context(QIOChannel *ioc,
+                                    AioContext *ctx);
+
+/**
+ * qio_channel_detach_aio_context:
+ * @ioc: the channel object
+ *
+ * Disable any I/O handlers set by qio_channel_yield(). With the
+ * help of aio_co_schedule(), this allows moving a coroutine that was
+ * paused by qio_channel_yield() to another context.
+ */
+void qio_channel_detach_aio_context(QIOChannel *ioc);
+
+/**
 * qio_channel_yield:
 * @ioc: the channel object
 * @condition: the I/O condition to wait for
 *
- * Yields execution from the current coroutine until
- * the condition indicated by @condition becomes
- * available.
+ * Yields execution from the current coroutine until the condition
+ * indicated by @condition becomes available. @condition must
+ * be either %G_IO_IN or %G_IO_OUT; it cannot contain both. In
+ * addition, no two coroutine can be waiting on the same condition
+ * and channel at the same time.
 *
 * This must only be called from coroutine context
 */
diff --git a/io/channel.c b/io/channel.c
index XXXXXXX..XXXXXXX 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/osdep.h"
 #include "io/channel.h"
 #include "qapi/error.h"
-#include "qemu/coroutine.h"
+#include "qemu/main-loop.h"

 bool qio_channel_has_feature(QIOChannel *ioc,
                              QIOChannelFeature feature)
@@ -XXX,XX +XXX,XX @@ off_t qio_channel_io_seek(QIOChannel *ioc,
 }


-typedef struct QIOChannelYieldData QIOChannelYieldData;
-struct QIOChannelYieldData {
-    QIOChannel *ioc;
-    Coroutine *co;
-};
+static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);

+static void qio_channel_restart_read(void *opaque)
+{
+    QIOChannel *ioc = opaque;
+    Coroutine *co = ioc->read_coroutine;
+
+    ioc->read_coroutine = NULL;
+    qio_channel_set_aio_fd_handlers(ioc);
+    aio_co_wake(co);
+}

-static gboolean qio_channel_yield_enter(QIOChannel *ioc,
-                                        GIOCondition condition,
-                                        gpointer opaque)
+static void qio_channel_restart_write(void *opaque)
 {
-    QIOChannelYieldData *data = opaque;
-    qemu_coroutine_enter(data->co);
-    return FALSE;
+    QIOChannel *ioc = opaque;
+    Coroutine *co = ioc->write_coroutine;
+
+    ioc->write_coroutine = NULL;
+    qio_channel_set_aio_fd_handlers(ioc);
+    aio_co_wake(co);
 }

+static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
+{
+    IOHandler *rd_handler = NULL, *wr_handler = NULL;
+    AioContext *ctx;
+
+    if (ioc->read_coroutine) {
+        rd_handler = qio_channel_restart_read;
+    }
+    if (ioc->write_coroutine) {
+        wr_handler = qio_channel_restart_write;
+    }
+
+    ctx = ioc->ctx ? ioc->ctx : iohandler_get_aio_context();
+    qio_channel_set_aio_fd_handler(ioc, ctx, rd_handler, wr_handler, ioc);
+}
+
+void qio_channel_attach_aio_context(QIOChannel *ioc,
+                                    AioContext *ctx)
+{
+    AioContext *old_ctx;
+    if (ioc->ctx == ctx) {
+        return;
+    }
+
+    old_ctx = ioc->ctx ? ioc->ctx : iohandler_get_aio_context();
+    qio_channel_set_aio_fd_handler(ioc, old_ctx, NULL, NULL, NULL);
+    ioc->ctx = ctx;
+    qio_channel_set_aio_fd_handlers(ioc);
+}
+
+void qio_channel_detach_aio_context(QIOChannel *ioc)
+{
+    ioc->read_coroutine = NULL;
+    ioc->write_coroutine = NULL;
+    qio_channel_set_aio_fd_handlers(ioc);
+    ioc->ctx = NULL;
+}

 void coroutine_fn qio_channel_yield(QIOChannel *ioc,
                                     GIOCondition condition)
 {
-    QIOChannelYieldData data;
-
     assert(qemu_in_coroutine());
-    data.ioc = ioc;
-    data.co = qemu_coroutine_self();
-    qio_channel_add_watch(ioc,
-                          condition,
-                          qio_channel_yield_enter,
-                          &data,
-                          NULL);
+    if (condition == G_IO_IN) {
+        assert(!ioc->read_coroutine);
+        ioc->read_coroutine = qemu_coroutine_self();
+    } else if (condition == G_IO_OUT) {
+        assert(!ioc->write_coroutine);
+        ioc->write_coroutine = qemu_coroutine_self();
+    } else {
+        abort();
+    }
+    qio_channel_set_aio_fd_handlers(ioc);
     qemu_coroutine_yield();
 }

--
2.9.3

Allow the number of queues to be configured using --export
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
its own value if the vhost-user-blk backend offers fewer queues than
QEMU.

The vhost-user-blk-server.c code is already capable of multi-queue. All
virtqueue processing runs in the same AioContext. No new locking is
needed.

Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
Note that the feature bit only announces the presence of the num_queues
configuration space field. It does not promise that there is more than 1
virtqueue, so we can set it unconditionally.

I tested multi-queue by running a random read fio test with numjobs=4 on
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/
directory shows that Linux blk-mq has 4 queues configured.

An automated test is included in the next commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001144604.559733-2-stefanha@redhat.com
[Fixed accidental tab characters as suggested by Markus Armbruster
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-export.json               | 10 +++++++---
 block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -XXX,XX +XXX,XX @@
 # SocketAddress types are supported. Passed fds must be UNIX domain
 # sockets.
 # @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
+# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults
+#              to 1.
 #
 # Since: 5.2
 ##
 { 'struct': 'BlockExportOptionsVhostUserBlk',
-  'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
+  'data': { 'addr': 'SocketAddress',
+            '*logical-block-size': 'size',
+            '*num-queues': 'uint16'} }

 ##
 # @NbdServerAddOptions:
@@ -XXX,XX +XXX,XX @@
 { 'union': 'BlockExportOptions',
   'base': { 'type': 'BlockExportType',
             'id': 'str',
-            '*fixed-iothread': 'bool',
-            '*iothread': 'str',
+            '*fixed-iothread': 'bool',
+            '*iothread': 'str',
             'node-name': 'str',
             '*writable': 'bool',
             '*writethrough': 'bool' },
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index XXXXXXX..XXXXXXX 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -XXX,XX +XXX,XX @@
 #include "util/block-helpers.h"

 enum {
-    VHOST_USER_BLK_MAX_QUEUES = 1,
+    VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
 };
 struct virtio_blk_inhdr {
     unsigned char status;
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev)
            1ull << VIRTIO_BLK_F_DISCARD |
            1ull << VIRTIO_BLK_F_WRITE_ZEROES |
            1ull << VIRTIO_BLK_F_CONFIG_WCE |
+           1ull << VIRTIO_BLK_F_MQ |
            1ull << VIRTIO_F_VERSION_1 |
            1ull << VIRTIO_RING_F_INDIRECT_DESC |
            1ull << VIRTIO_RING_F_EVENT_IDX |
@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)

 static void
 vu_blk_initialize_config(BlockDriverState *bs,
-                         struct virtio_blk_config *config, uint32_t blk_size)
+                         struct virtio_blk_config *config,
+                         uint32_t blk_size,
+                         uint16_t num_queues)
 {
     config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
     config->blk_size = blk_size;
@@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs,
     config->seg_max = 128 - 2;
     config->min_io_size = 1;
     config->opt_io_size = 1;
-    config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
+    config->num_queues = num_queues;
     config->max_discard_sectors = 32768;
     config->max_discard_seg = 1;
     config->discard_sector_alignment = config->blk_size >> 9;
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
     BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
     Error *local_err = NULL;
     uint64_t logical_block_size;
+    uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;

     vexp->writable = opts->writable;
     vexp->blkcfg.wce = 0;
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
     }
     vexp->blk_size = logical_block_size;
     blk_set_guest_block_size(exp->blk, logical_block_size);
+
+    if (vu_opts->has_num_queues) {
+        num_queues = vu_opts->num_queues;
+    }
+    if (num_queues == 0) {
+        error_setg(errp, "num-queues must be greater than 0");
+        return -EINVAL;
+    }
+
     vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
-                             logical_block_size);
+                             logical_block_size, num_queues);

     blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
                                  vexp);

     if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
-                                 VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
-                                 errp)) {
+                                 num_queues, &vu_blk_iface, errp)) {
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, vexp);
         return -EADDRNOTAVAIL;
--
2.26.2
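To see how the reworked qio_channel_yield() above is meant to be used, here is a minimal sketch of a coroutine read loop — my_channel_read_all() is an illustrative name, not part of either series:

    #include "qemu/osdep.h"
    #include "io/channel.h"

    /* Runs in coroutine context; blocks the coroutine, not the thread. */
    static coroutine_fn ssize_t my_channel_read_all(QIOChannel *ioc,
                                                    char *buf, size_t len)
    {
        size_t done = 0;

        while (done < len) {
            ssize_t n = qio_channel_read(ioc, buf + done, len - done, NULL);
            if (n == QIO_CHANNEL_ERR_BLOCK) {
                /*
                 * Registers the G_IO_IN handler on the channel's AioContext
                 * and yields; qio_channel_restart_read() wakes us up.
                 */
                qio_channel_yield(ioc, G_IO_IN);
                continue;
            }
            if (n <= 0) {
                return n;
            }
            done += n;
        }
        return done;
    }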
From: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-15-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/archipelago.c   |  3 +++
 block/blkreplay.c     |  2 +-
 block/block-backend.c |  6 ++++++
 block/curl.c          | 26 ++++++++++++++++++--------
 block/gluster.c       |  9 +--------
 block/io.c            |  6 +++++-
 block/iscsi.c         |  6 +++++-
 block/linux-aio.c     | 15 +++++++++------
 block/nfs.c           |  3 ++-
 block/null.c          |  4 ++++
 block/qed.c           |  3 +++
 block/rbd.c           |  4 ++++
 dma-helpers.c         |  2 ++
 hw/block/virtio-blk.c |  2 ++
 hw/scsi/scsi-bus.c    |  2 ++
 util/async.c          |  4 ++--
 util/thread-pool.c    |  2 ++
 17 files changed, 71 insertions(+), 28 deletions(-)

diff --git a/block/archipelago.c b/block/archipelago.c
index XXXXXXX..XXXXXXX 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -XXX,XX +XXX,XX @@ static void qemu_archipelago_complete_aio(void *opaque)
 {
     AIORequestData *reqdata = (AIORequestData *) opaque;
     ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;
+    AioContext *ctx = bdrv_get_aio_context(aio_cb->common.bs);

+    aio_context_acquire(ctx);
     aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
+    aio_context_release(ctx);
     aio_cb->status = 0;

     qemu_aio_unref(aio_cb);
diff --git a/block/blkreplay.c b/block/blkreplay.c
index XXXXXXX..XXXXXXX 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -XXX,XX +XXX,XX @@ static int64_t blkreplay_getlength(BlockDriverState *bs)
 static void blkreplay_bh_cb(void *opaque)
 {
     Request *req = opaque;
-    qemu_coroutine_enter(req->co);
+    aio_co_wake(req->co);
     qemu_bh_delete(req->bh);
     g_free(req);
 }
diff --git a/block/block-backend.c b/block/block-backend.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -XXX,XX +XXX,XX @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
+    AioContext *ctx = bdrv_get_aio_context(acb->common.bs);

     bdrv_dec_in_flight(acb->common.bs);
+    aio_context_acquire(ctx);
     acb->common.cb(acb->common.opaque, acb->ret);
+    aio_context_release(ctx);
     qemu_aio_unref(acb);
 }

@@ -XXX,XX +XXX,XX @@ static void blk_aio_complete(BlkAioEmAIOCB *acb)
 static void blk_aio_complete_bh(void *opaque)
 {
     BlkAioEmAIOCB *acb = opaque;
+    AioContext *ctx = bdrv_get_aio_context(acb->common.bs);

     assert(acb->has_returned);
+    aio_context_acquire(ctx);
     blk_aio_complete(acb);
+    aio_context_release(ctx);
 }

 static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
diff --git a/block/curl.c b/block/curl.c
index XXXXXXX..XXXXXXX 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -XXX,XX +XXX,XX @@ static void curl_readv_bh_cb(void *p)
 {
     CURLState *state;
     int running;
+    int ret = -EINPROGRESS;

     CURLAIOCB *acb = p;
-    BDRVCURLState *s = acb->common.bs->opaque;
+    BlockDriverState *bs = acb->common.bs;
+    BDRVCURLState *s = bs->opaque;
+    AioContext *ctx = bdrv_get_aio_context(bs);

     size_t start = acb->sector_num * BDRV_SECTOR_SIZE;
     size_t end;

+    aio_context_acquire(ctx);
+
     // In case we have the requested data already (e.g. read-ahead),
     // we can just call the callback and be done.
     switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
@@ -XXX,XX +XXX,XX @@ static void curl_readv_bh_cb(void *p)
         qemu_aio_unref(acb);
         // fall through
     case FIND_RET_WAIT:
-        return;
+        goto out;
     default:
         break;
     }
@@ -XXX,XX +XXX,XX @@ static void curl_readv_bh_cb(void *p)
     // No cache found, so let's start a new request
     state = curl_init_state(acb->common.bs, s);
     if (!state) {
-        acb->common.cb(acb->common.opaque, -EIO);
-        qemu_aio_unref(acb);
-        return;
+        ret = -EIO;
+        goto out;
     }

     acb->start = 0;
@@ -XXX,XX +XXX,XX @@ static void curl_readv_bh_cb(void *p)
     state->orig_buf = g_try_malloc(state->buf_len);
     if (state->buf_len && state->orig_buf == NULL) {
         curl_clean_state(state);
-        acb->common.cb(acb->common.opaque, -ENOMEM);
-        qemu_aio_unref(acb);
-        return;
+        ret = -ENOMEM;
+        goto out;
     }
     state->acb[0] = acb;

@@ -XXX,XX +XXX,XX @@ static void curl_readv_bh_cb(void *p)

     /* Tell curl it needs to kick things off */
     curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
+
+out:
+    if (ret != -EINPROGRESS) {
+        acb->common.cb(acb->common.opaque, ret);
+        qemu_aio_unref(acb);
+    }
+    aio_context_release(ctx);
 }

 static BlockAIOCB *curl_aio_readv(BlockDriverState *bs,
diff --git a/block/gluster.c b/block/gluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -XXX,XX +XXX,XX @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
     return qemu_gluster_glfs_init(gconf, errp);
 }

-static void qemu_gluster_complete_aio(void *opaque)
-{
-    GlusterAIOCB *acb = (GlusterAIOCB *)opaque;
-
-    qemu_coroutine_enter(acb->coroutine);
-}
-
 /*
 * AIO callback routine called from GlusterFS thread.
 */
@@ -XXX,XX +XXX,XX @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
         acb->ret = -EIO; /* Partial read/write - fail it */
     }

-    aio_bh_schedule_oneshot(acb->aio_context, qemu_gluster_complete_aio, acb);
+    aio_co_schedule(acb->aio_context, acb->coroutine);
 }

 static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ static void bdrv_co_drain_bh_cb(void *opaque)
     bdrv_dec_in_flight(bs);
     bdrv_drained_begin(bs);
     data->done = true;
-    qemu_coroutine_enter(co);
+    aio_co_wake(co);
 }

 static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs)
@@ -XXX,XX +XXX,XX @@ static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
 static void bdrv_co_em_bh(void *opaque)
 {
     BlockAIOCBCoroutine *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+    AioContext *ctx = bdrv_get_aio_context(bs);

     assert(!acb->need_bh);
+    aio_context_acquire(ctx);
     bdrv_co_complete(acb);
+    aio_context_release(ctx);
 }

 static void bdrv_co_maybe_schedule_bh(BlockAIOCBCoroutine *acb)
diff --git a/block/iscsi.c b/block/iscsi.c
index XXXXXXX..XXXXXXX 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -XXX,XX +XXX,XX @@ static void
 iscsi_bh_cb(void *p)
 {
     IscsiAIOCB *acb = p;
+    AioContext *ctx = bdrv_get_aio_context(acb->common.bs);

     qemu_bh_delete(acb->bh);

     g_free(acb->buf);
     acb->buf = NULL;

+    aio_context_acquire(ctx);
     acb->common.cb(acb->common.opaque, acb->status);
+    aio_context_release(ctx);

     if (acb->task != NULL) {
         scsi_free_scsi_task(acb->task);
@@ -XXX,XX +XXX,XX @@ iscsi_schedule_bh(IscsiAIOCB *acb)
 static void iscsi_co_generic_bh_cb(void *opaque)
 {
     struct IscsiTask *iTask = opaque;
+
     iTask->complete = 1;
-    qemu_coroutine_enter(iTask->co);
+    aio_co_wake(iTask->co);
 }

 static void iscsi_retry_timer_expired(void *opaque)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -XXX,XX +XXX,XX @@ struct LinuxAioState {
     io_context_t ctx;
     EventNotifier e;

-    /* io queue for submit at batch */
+    /* io queue for submit at batch. Protected by AioContext lock. */
     LaioQueue io_q;

-    /* I/O completion processing */
+    /* I/O completion processing. Only runs in I/O thread. */
     QEMUBH *completion_bh;
     int event_idx;
     int event_max;
@@ -XXX,XX +XXX,XX @@ static inline ssize_t io_event_ret(struct io_event *ev)
 */
 static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
 {
+    LinuxAioState *s = laiocb->ctx;
     int ret;

     ret = laiocb->ret;
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
     }

     laiocb->ret = ret;
+    aio_context_acquire(s->aio_context);
     if (laiocb->co) {
         /* If the coroutine is already entered it must be in ioq_submit() and
          * will notice laio->ret has been filled in when it eventually runs
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
         laiocb->common.cb(laiocb->common.opaque, ret);
         qemu_aio_unref(laiocb);
     }
+    aio_context_release(s->aio_context);
 }

 /**
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completions(LinuxAioState *s)
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
 {
     qemu_laio_process_completions(s);
+
+    aio_context_acquire(s->aio_context);
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
+    aio_context_release(s->aio_context);
 }

 static void qemu_laio_completion_bh(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_completion_cb(EventNotifier *e)
     LinuxAioState *s = container_of(e, LinuxAioState, e);

     if (event_notifier_test_and_clear(&s->e)) {
-        aio_context_acquire(s->aio_context);
         qemu_laio_process_completions_and_submit(s);
-        aio_context_release(s->aio_context);
     }
 }

@@ -XXX,XX +XXX,XX @@ static bool qemu_laio_poll_cb(void *opaque)
         return false;
     }

-    aio_context_acquire(s->aio_context);
     qemu_laio_process_completions_and_submit(s);
-    aio_context_release(s->aio_context);
     return true;
 }

@@ -XXX,XX +XXX,XX @@ void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 {
     aio_set_event_notifier(old_context, &s->e, false, NULL, NULL);
     qemu_bh_delete(s->completion_bh);
+    s->aio_context = NULL;
 }

 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
diff --git a/block/nfs.c b/block/nfs.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -XXX,XX +XXX,XX @@ static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
 static void nfs_co_generic_bh_cb(void *opaque)
 {
     NFSRPC *task = opaque;
+
     task->complete = 1;
-    qemu_coroutine_enter(task->co);
+    aio_co_wake(task->co);
 }

 static void
diff --git a/block/null.c b/block/null.c
index XXXXXXX..XXXXXXX 100644
--- a/block/null.c
+++ b/block/null.c
@@ -XXX,XX +XXX,XX @@ static const AIOCBInfo null_aiocb_info = {
 static void null_bh_cb(void *opaque)
 {
     NullAIOCB *acb = opaque;
+    AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
+
+    aio_context_acquire(ctx);
     acb->common.cb(acb->common.opaque, 0);
+    aio_context_release(ctx);
     qemu_aio_unref(acb);
 }

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
 static void qed_aio_complete_bh(void *opaque)
 {
     QEDAIOCB *acb = opaque;
+    BDRVQEDState *s = acb_to_s(acb);
     BlockCompletionFunc *cb = acb->common.cb;
     void *user_opaque = acb->common.opaque;
     int ret = acb->bh_ret;
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete_bh(void *opaque)
     qemu_aio_unref(acb);

     /* Invoke callback */
+    qed_acquire(s);
     cb(user_opaque, ret);
+    qed_release(s);
 }

 static void qed_aio_complete(QEDAIOCB *acb, int ret)
diff --git a/block/rbd.c b/block/rbd.c
index XXXXXXX..XXXXXXX 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -XXX,XX +XXX,XX @@ shutdown:
 static void qemu_rbd_complete_aio(RADOSCB *rcb)
 {
     RBDAIOCB *acb = rcb->acb;
+    AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
     int64_t r;

     r = rcb->ret;
@@ -XXX,XX +XXX,XX @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
         qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
     }
     qemu_vfree(acb->bounce);
+
+    aio_context_acquire(ctx);
     acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
+    aio_context_release(ctx);

     qemu_aio_unref(acb);
 }
diff --git a/dma-helpers.c b/dma-helpers.c
index XXXXXXX..XXXXXXX 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -XXX,XX +XXX,XX @@ static void dma_blk_cb(void *opaque, int ret)
                                 QEMU_ALIGN_DOWN(dbs->iov.size, dbs->align));
     }

+    aio_context_acquire(dbs->ctx);
     dbs->acb = dbs->io_func(dbs->offset, &dbs->iov,
                             dma_blk_cb, dbs, dbs->io_func_opaque);
+    aio_context_release(dbs->ctx);
     assert(dbs->acb);
 }

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_dma_restart_bh(void *opaque)

     s->rq = NULL;

+    aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
     while (req) {
         VirtIOBlockReq *next = req->next;
         if (virtio_blk_handle_request(req, &mrb)) {
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_dma_restart_bh(void *opaque)
     if (mrb.num_reqs) {
         virtio_blk_submit_multireq(s->blk, &mrb);
     }
+    aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }

 static void virtio_blk_dma_restart_cb(void *opaque, int running,
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -XXX,XX +XXX,XX @@ static void scsi_dma_restart_bh(void *opaque)
     qemu_bh_delete(s->bh);
     s->bh = NULL;

+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
     QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
         scsi_req_ref(req);
         if (req->retry) {
@@ -XXX,XX +XXX,XX @@ static void scsi_dma_restart_bh(void *opaque)
         }
         scsi_req_unref(req);
     }
+    aio_context_release(blk_get_aio_context(s->conf.blk));
 }

 void scsi_req_retry(SCSIRequest *req)
diff --git a/util/async.c b/util/async.c
index XXXXXXX..XXXXXXX 100644
--- a/util/async.c
+++ b/util/async.c
@@ -XXX,XX +XXX,XX @@ int aio_bh_poll(AioContext *ctx)
                 ret = 1;
             }
             bh->idle = 0;
-            aio_context_acquire(ctx);
             aio_bh_call(bh);
-            aio_context_release(ctx);
         }
         if (bh->deleted) {
             deleted = true;
@@ -XXX,XX +XXX,XX @@ static void co_schedule_bh_cb(void *opaque)
         Coroutine *co = QSLIST_FIRST(&straight);
         QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
         trace_aio_co_schedule_bh_cb(ctx, co);
+        aio_context_acquire(ctx);
         qemu_coroutine_enter(co);
+        aio_context_release(ctx);
     }
 }

diff --git a/util/thread-pool.c b/util/thread-pool.c
index XXXXXXX..XXXXXXX 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -XXX,XX +XXX,XX @@ static void thread_pool_completion_bh(void *opaque)
     ThreadPool *pool = opaque;
     ThreadPoolElement *elem, *next;

+    aio_context_acquire(pool->ctx);
restart:
     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
         if (elem->state != THREAD_DONE) {
@@ -XXX,XX +XXX,XX @@ restart:
             qemu_aio_unref(elem);
         }
     }
+    aio_context_release(pool->ctx);
 }

 static void thread_pool_cancel(BlockAIOCB *acb)
--
2.9.3

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

bdrv_co_block_status_above has several design problems with handling
short backing files:

1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
without the BDRV_BLOCK_ALLOCATED flag, when the short backing file
that produces these after-EOF zeros is actually inside the requested
backing sequence.

2. With want_zero=false, it may return pnum=0 prior to actual EOF,
because of EOF of a short backing file.

Fix these things, making the logic about short backing files clearer.

With fixed bdrv_block_status_above we also have to improve is_zero in
qcow2 code, otherwise iotest 154 will fail, because with this patch we
stop merging zeros of different types (produced by regions unallocated
in the whole backing chain vs produced by short backing files).

Note also that this patch leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
[Fix s/comes/come/ as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c    | 68 ++++++++++++++++++++++++++++++++++++++++-----------
 block/qcow2.c | 16 ++++++++++--
 2 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
                                   int64_t *map,
                                   BlockDriverState **file)
 {
+    int ret;
     BlockDriverState *p;
-    int ret = 0;
-    bool first = true;
+    int64_t eof = 0;

     assert(bs != base);
-    for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
+
+    ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
+    if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
+        return ret;
+    }
+
+    if (ret & BDRV_BLOCK_EOF) {
+        eof = offset + *pnum;
+    }
+
+    assert(*pnum <= bytes);
+    bytes = *pnum;
+
+    for (p = bdrv_filter_or_cow_bs(bs); p != base;
+         p = bdrv_filter_or_cow_bs(p))
+    {
         ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
                                    file);
         if (ret < 0) {
-            break;
+            return ret;
         }
-        if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+        if (*pnum == 0) {
             /*
-             * Reading beyond the end of the file continues to read
-             * zeroes, but we can only widen the result to the
-             * unallocated length we learned from an earlier
-             * iteration.
+             * The top layer deferred to this layer, and because this layer is
+             * short, any zeroes that we synthesize beyond EOF behave as if they
+             * were allocated at this layer.
+             *
+             * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
+             * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
+             * below.
              */
+            assert(ret & BDRV_BLOCK_EOF);
             *pnum = bytes;
+            if (file) {
+                *file = p;
+            }
+            ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
+            break;
         }
-        if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
+        if (ret & BDRV_BLOCK_ALLOCATED) {
+            /*
+             * We've found the node and the status, we must break.
+             *
+             * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
+             * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
+             * below.
+             */
+            ret &= ~BDRV_BLOCK_EOF;
             break;
         }
-        /* [offset, pnum] unallocated on this layer, which could be only
-         * the first part of [offset, bytes].  */
-        bytes = MIN(bytes, *pnum);
-        first = false;
+
+        /*
+         * OK, [offset, offset + *pnum) region is unallocated on this layer,
+         * let's continue the diving.
+         */
+        assert(*pnum <= bytes);
+        bytes = *pnum;
+    }
+
+    if (offset + *pnum == eof) {
+        ret |= BDRV_BLOCK_EOF;
     }
+
     return ret;
 }

diff --git a/block/qcow2.c b/block/qcow2.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
     if (!bytes) {
         return true;
     }
-    res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+
+    /*
+     * bdrv_block_status_above doesn't merge different types of zeros, for
+     * example, zeros which come from the region which is unallocated in
+     * the whole backing chain, and zeros which come because of a short
+     * backing file. So, we need a loop.
+     */
+    do {
+        res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
+        offset += nr;
+        bytes -= nr;
+    } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);
+
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0;
 }

 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
--
2.26.2
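A caller consumes these flags by walking the image in pnum-sized steps. A hedged sketch of such a walk using the public bdrv_block_status_above() signature from the diff — dump_block_status() is illustrative, not part of either series:

    #include "qemu/osdep.h"
    #include "block/block.h"

    /* Print the allocation map of @bs above @base over the first @len bytes. */
    static void dump_block_status(BlockDriverState *bs, BlockDriverState *base,
                                  int64_t len)
    {
        int64_t offset = 0;

        while (offset < len) {
            int64_t pnum;
            int ret = bdrv_block_status_above(bs, base, offset, len - offset,
                                              &pnum, NULL, NULL);
            if (ret < 0) {
                return;
            }
            /* pnum bytes starting at offset share the returned status. */
            printf("%" PRId64 "+%" PRId64 ":%s%s%s\n", offset, pnum,
                   ret & BDRV_BLOCK_ALLOCATED ? " allocated" : "",
                   ret & BDRV_BLOCK_ZERO ? " zero" : "",
                   ret & BDRV_BLOCK_EOF ? " eof" : "");
            offset += pnum;
        }
    }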
1
From: Paolo Bonzini <pbonzini@redhat.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
3
In order to reuse bdrv_common_block_status_above in
4
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4
bdrv_is_allocated_above, let's support include_base parameter.
5
Reviewed-by: Fam Zheng <famz@redhat.com>
5
6
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Message-id: 20170213135235.12274-16-pbonzini@redhat.com
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
Reviewed-by: Eric Blake <eblake@redhat.com>
9
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
11
---
10
block/archipelago.c | 3 ---
12
block/coroutines.h | 2 ++
11
block/block-backend.c | 7 -------
13
block/io.c | 21 ++++++++++++++-------
12
block/curl.c | 2 +-
14
2 files changed, 16 insertions(+), 7 deletions(-)
13
block/io.c | 6 +-----
14
block/iscsi.c | 3 ---
15
block/linux-aio.c | 5 +----
16
block/mirror.c | 12 +++++++++---
17
block/null.c | 8 --------
18
block/qed-cluster.c | 2 ++
19
block/qed-table.c | 12 ++++++++++--
20
block/qed.c | 4 ++--
21
block/rbd.c | 4 ----
22
block/win32-aio.c | 3 ---
23
hw/block/virtio-blk.c | 12 +++++++++++-
24
hw/scsi/scsi-disk.c | 15 +++++++++++++++
25
hw/scsi/scsi-generic.c | 20 +++++++++++++++++---
26
util/thread-pool.c | 4 +++-
27
17 files changed, 72 insertions(+), 50 deletions(-)
28
15
29
diff --git a/block/archipelago.c b/block/archipelago.c
16
diff --git a/block/coroutines.h b/block/coroutines.h
30
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
31
--- a/block/archipelago.c
18
--- a/block/coroutines.h
32
+++ b/block/archipelago.c
19
+++ b/block/coroutines.h
33
@@ -XXX,XX +XXX,XX @@ static void qemu_archipelago_complete_aio(void *opaque)
20
@@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
34
{
21
int coroutine_fn
35
AIORequestData *reqdata = (AIORequestData *) opaque;
22
bdrv_co_common_block_status_above(BlockDriverState *bs,
36
ArchipelagoAIOCB *aio_cb = (ArchipelagoAIOCB *) reqdata->aio_cb;
23
BlockDriverState *base,
37
- AioContext *ctx = bdrv_get_aio_context(aio_cb->common.bs);
24
+ bool include_base,
38
25
bool want_zero,
39
- aio_context_acquire(ctx);
26
int64_t offset,
40
aio_cb->common.cb(aio_cb->common.opaque, aio_cb->ret);
27
int64_t bytes,
41
- aio_context_release(ctx);
28
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
42
aio_cb->status = 0;
29
int generated_co_wrapper
43
30
bdrv_common_block_status_above(BlockDriverState *bs,
44
qemu_aio_unref(aio_cb);
31
BlockDriverState *base,
45
diff --git a/block/block-backend.c b/block/block-backend.c
32
+ bool include_base,
46
index XXXXXXX..XXXXXXX 100644
33
bool want_zero,
47
--- a/block/block-backend.c
34
int64_t offset,
48
+++ b/block/block-backend.c
35
int64_t bytes,
49
@@ -XXX,XX +XXX,XX @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
50
static void error_callback_bh(void *opaque)
51
{
52
struct BlockBackendAIOCB *acb = opaque;
53
- AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
54
55
bdrv_dec_in_flight(acb->common.bs);
56
- aio_context_acquire(ctx);
57
acb->common.cb(acb->common.opaque, acb->ret);
58
- aio_context_release(ctx);
59
qemu_aio_unref(acb);
60
}
61
62
@@ -XXX,XX +XXX,XX @@ static void blk_aio_complete(BlkAioEmAIOCB *acb)
63
static void blk_aio_complete_bh(void *opaque)
64
{
65
BlkAioEmAIOCB *acb = opaque;
66
- AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
67
-
68
assert(acb->has_returned);
69
- aio_context_acquire(ctx);
70
blk_aio_complete(acb);
71
- aio_context_release(ctx);
72
}
73
74
static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
75
diff --git a/block/curl.c b/block/curl.c
76
index XXXXXXX..XXXXXXX 100644
77
--- a/block/curl.c
78
+++ b/block/curl.c
79
@@ -XXX,XX +XXX,XX @@ static void curl_readv_bh_cb(void *p)
80
curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
81
82
out:
83
+ aio_context_release(ctx);
84
if (ret != -EINPROGRESS) {
85
acb->common.cb(acb->common.opaque, ret);
86
qemu_aio_unref(acb);
87
}
88
- aio_context_release(ctx);
89
}
90
91
static BlockAIOCB *curl_aio_readv(BlockDriverState *bs,
92
diff --git a/block/io.c b/block/io.c
36
diff --git a/block/io.c b/block/io.c
93
index XXXXXXX..XXXXXXX 100644
37
index XXXXXXX..XXXXXXX 100644
94
--- a/block/io.c
38
--- a/block/io.c
95
+++ b/block/io.c
39
+++ b/block/io.c
96
@@ -XXX,XX +XXX,XX @@ static void bdrv_co_io_em_complete(void *opaque, int ret)
40
@@ -XXX,XX +XXX,XX @@ early_out:
97
CoroutineIOCompletion *co = opaque;
41
int coroutine_fn
98
42
bdrv_co_common_block_status_above(BlockDriverState *bs,
99
co->ret = ret;
43
BlockDriverState *base,
100
- qemu_coroutine_enter(co->coroutine);
44
+ bool include_base,
101
+ aio_co_wake(co->coroutine);
45
bool want_zero,
46
int64_t offset,
47
int64_t bytes,
48
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
49
BlockDriverState *p;
50
int64_t eof = 0;
51
52
- assert(bs != base);
53
+ assert(include_base || bs != base);
54
+ assert(!include_base || base); /* Can't include NULL base */
55
56
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
57
- if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
58
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
59
return ret;
60
}
61
62
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
63
assert(*pnum <= bytes);
64
bytes = *pnum;
65
66
- for (p = bdrv_filter_or_cow_bs(bs); p != base;
67
+ for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
68
p = bdrv_filter_or_cow_bs(p))
69
{
70
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
71
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
72
break;
73
}
74
75
+ if (p == base) {
76
+ assert(include_base);
77
+ break;
78
+ }
79
+
80
/*
81
* OK, [offset, offset + *pnum) region is unallocated on this layer,
82
* let's continue the diving.
83
@@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
84
int64_t offset, int64_t bytes, int64_t *pnum,
85
int64_t *map, BlockDriverState **file)
86
{
87
- return bdrv_common_block_status_above(bs, base, true, offset, bytes,
88
+ return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
89
pnum, map, file);
102
}
90
}
103
91
104
static int coroutine_fn bdrv_driver_preadv(BlockDriverState *bs,
92
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
105
@@ -XXX,XX +XXX,XX @@ static void bdrv_co_complete(BlockAIOCBCoroutine *acb)
106
static void bdrv_co_em_bh(void *opaque)
107
{
108
BlockAIOCBCoroutine *acb = opaque;
109
- BlockDriverState *bs = acb->common.bs;
110
- AioContext *ctx = bdrv_get_aio_context(bs);
111
112
assert(!acb->need_bh);
113
- aio_context_acquire(ctx);
114
bdrv_co_complete(acb);
115
- aio_context_release(ctx);
116
}
117
118
static void bdrv_co_maybe_schedule_bh(BlockAIOCBCoroutine *acb)
119
diff --git a/block/iscsi.c b/block/iscsi.c
120
index XXXXXXX..XXXXXXX 100644
121
--- a/block/iscsi.c
122
+++ b/block/iscsi.c
123
@@ -XXX,XX +XXX,XX @@ static void
124
iscsi_bh_cb(void *p)
125
{
126
IscsiAIOCB *acb = p;
127
- AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
128
129
qemu_bh_delete(acb->bh);
130
131
g_free(acb->buf);
132
acb->buf = NULL;
133
134
- aio_context_acquire(ctx);
135
acb->common.cb(acb->common.opaque, acb->status);
136
- aio_context_release(ctx);
137
138
if (acb->task != NULL) {
139
scsi_free_scsi_task(acb->task);
140
diff --git a/block/linux-aio.c b/block/linux-aio.c
141
index XXXXXXX..XXXXXXX 100644
142
--- a/block/linux-aio.c
143
+++ b/block/linux-aio.c
144
@@ -XXX,XX +XXX,XX @@ static inline ssize_t io_event_ret(struct io_event *ev)
145
*/
146
static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
147
{
148
- LinuxAioState *s = laiocb->ctx;
149
int ret;
93
int ret;
150
94
int64_t dummy;
151
ret = laiocb->ret;
95
152
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
96
- ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
97
- offset, bytes, pnum ? pnum : &dummy,
98
- NULL, NULL);
99
+ ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
100
+ bytes, pnum ? pnum : &dummy, NULL,
101
+ NULL);
102
if (ret < 0) {
103
return ret;
153
}
104
}
154
155
laiocb->ret = ret;
156
- aio_context_acquire(s->aio_context);
157
if (laiocb->co) {
158
/* If the coroutine is already entered it must be in ioq_submit() and
159
* will notice laio->ret has been filled in when it eventually runs
160
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
161
* that!
162
*/
163
if (!qemu_coroutine_entered(laiocb->co)) {
164
- qemu_coroutine_enter(laiocb->co);
165
+ aio_co_wake(laiocb->co);
166
}
167
} else {
168
laiocb->common.cb(laiocb->common.opaque, ret);
169
qemu_aio_unref(laiocb);
170
}
171
- aio_context_release(s->aio_context);
172
}
173
174
/**
175
diff --git a/block/mirror.c b/block/mirror.c
176
index XXXXXXX..XXXXXXX 100644
177
--- a/block/mirror.c
178
+++ b/block/mirror.c
179
@@ -XXX,XX +XXX,XX @@ static void mirror_write_complete(void *opaque, int ret)
180
{
181
MirrorOp *op = opaque;
182
MirrorBlockJob *s = op->s;
183
+
184
+ aio_context_acquire(blk_get_aio_context(s->common.blk));
185
if (ret < 0) {
186
BlockErrorAction action;
187
188
@@ -XXX,XX +XXX,XX @@ static void mirror_write_complete(void *opaque, int ret)
189
}
190
}
191
mirror_iteration_done(op, ret);
192
+ aio_context_release(blk_get_aio_context(s->common.blk));
193
}
194
195
static void mirror_read_complete(void *opaque, int ret)
196
{
197
MirrorOp *op = opaque;
198
MirrorBlockJob *s = op->s;
199
+
200
+ aio_context_acquire(blk_get_aio_context(s->common.blk));
201
if (ret < 0) {
202
BlockErrorAction action;
203
204
@@ -XXX,XX +XXX,XX @@ static void mirror_read_complete(void *opaque, int ret)
205
}
206
207
mirror_iteration_done(op, ret);
208
- return;
209
+ } else {
210
+ blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
211
+ 0, mirror_write_complete, op);
212
}
213
- blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
214
- 0, mirror_write_complete, op);
215
+ aio_context_release(blk_get_aio_context(s->common.blk));
216
}
217
218
static inline void mirror_clip_sectors(MirrorBlockJob *s,
219
diff --git a/block/null.c b/block/null.c
220
index XXXXXXX..XXXXXXX 100644
221
--- a/block/null.c
222
+++ b/block/null.c
223
@@ -XXX,XX +XXX,XX @@ static const AIOCBInfo null_aiocb_info = {
224
static void null_bh_cb(void *opaque)
225
{
226
NullAIOCB *acb = opaque;
227
- AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
228
-
229
- aio_context_acquire(ctx);
230
acb->common.cb(acb->common.opaque, 0);
231
- aio_context_release(ctx);
232
qemu_aio_unref(acb);
233
}
234
235
static void null_timer_cb(void *opaque)
236
{
237
NullAIOCB *acb = opaque;
238
- AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
239
-
240
- aio_context_acquire(ctx);
241
acb->common.cb(acb->common.opaque, 0);
242
- aio_context_release(ctx);
243
timer_deinit(&acb->timer);
244
qemu_aio_unref(acb);
245
}
246
diff --git a/block/qed-cluster.c b/block/qed-cluster.c
247
index XXXXXXX..XXXXXXX 100644
248
--- a/block/qed-cluster.c
249
+++ b/block/qed-cluster.c
250
@@ -XXX,XX +XXX,XX @@ static void qed_find_cluster_cb(void *opaque, int ret)
251
unsigned int index;
252
unsigned int n;
253
254
+ qed_acquire(s);
255
if (ret) {
256
goto out;
257
}
258
@@ -XXX,XX +XXX,XX @@ static void qed_find_cluster_cb(void *opaque, int ret)
259
260
out:
261
find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len);
262
+ qed_release(s);
263
g_free(find_cluster_cb);
264
}
265
266
diff --git a/block/qed-table.c b/block/qed-table.c
267
index XXXXXXX..XXXXXXX 100644
268
--- a/block/qed-table.c
269
+++ b/block/qed-table.c
270
@@ -XXX,XX +XXX,XX @@ static void qed_read_table_cb(void *opaque, int ret)
271
{
272
QEDReadTableCB *read_table_cb = opaque;
273
QEDTable *table = read_table_cb->table;
274
+ BDRVQEDState *s = read_table_cb->s;
275
int noffsets = read_table_cb->qiov.size / sizeof(uint64_t);
276
int i;
277
278
@@ -XXX,XX +XXX,XX @@ static void qed_read_table_cb(void *opaque, int ret)
279
}
280
281
/* Byteswap offsets */
282
+ qed_acquire(s);
283
for (i = 0; i < noffsets; i++) {
284
table->offsets[i] = le64_to_cpu(table->offsets[i]);
285
}
286
+ qed_release(s);
287
288
out:
289
/* Completion */
290
- trace_qed_read_table_cb(read_table_cb->s, read_table_cb->table, ret);
291
+ trace_qed_read_table_cb(s, read_table_cb->table, ret);
292
gencb_complete(&read_table_cb->gencb, ret);
293
}
294
295
@@ -XXX,XX +XXX,XX @@ typedef struct {
296
static void qed_write_table_cb(void *opaque, int ret)
297
{
298
QEDWriteTableCB *write_table_cb = opaque;
299
+ BDRVQEDState *s = write_table_cb->s;
300
301
- trace_qed_write_table_cb(write_table_cb->s,
302
+ trace_qed_write_table_cb(s,
303
write_table_cb->orig_table,
304
write_table_cb->flush,
305
ret);
306
@@ -XXX,XX +XXX,XX @@ static void qed_write_table_cb(void *opaque, int ret)
307
if (write_table_cb->flush) {
308
/* We still need to flush first */
309
write_table_cb->flush = false;
310
+ qed_acquire(s);
311
bdrv_aio_flush(write_table_cb->s->bs, qed_write_table_cb,
312
write_table_cb);
313
+ qed_release(s);
314
return;
315
}
316
317
@@ -XXX,XX +XXX,XX @@ static void qed_read_l2_table_cb(void *opaque, int ret)
318
CachedL2Table *l2_table = request->l2_table;
319
uint64_t l2_offset = read_l2_table_cb->l2_offset;
320
321
+ qed_acquire(s);
322
if (ret) {
323
/* can't trust loaded L2 table anymore */
324
qed_unref_l2_cache_entry(l2_table);
325
@@ -XXX,XX +XXX,XX @@ static void qed_read_l2_table_cb(void *opaque, int ret)
326
request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset);
327
assert(request->l2_table != NULL);
328
}
329
+ qed_release(s);
330
331
gencb_complete(&read_l2_table_cb->gencb, ret);
332
}
333
diff --git a/block/qed.c b/block/qed.c
334
index XXXXXXX..XXXXXXX 100644
335
--- a/block/qed.c
336
+++ b/block/qed.c
337
@@ -XXX,XX +XXX,XX @@ static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t l
338
}
339
340
if (cb->co) {
341
- qemu_coroutine_enter(cb->co);
342
+ aio_co_wake(cb->co);
343
}
344
}
345
346
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn qed_co_pwrite_zeroes_cb(void *opaque, int ret)
347
cb->done = true;
348
cb->ret = ret;
349
if (cb->co) {
350
- qemu_coroutine_enter(cb->co);
351
+ aio_co_wake(cb->co);
352
}
353
}
354
355
diff --git a/block/rbd.c b/block/rbd.c
356
index XXXXXXX..XXXXXXX 100644
357
--- a/block/rbd.c
358
+++ b/block/rbd.c
359
@@ -XXX,XX +XXX,XX @@ shutdown:
360
static void qemu_rbd_complete_aio(RADOSCB *rcb)
361
{
362
RBDAIOCB *acb = rcb->acb;
363
- AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
364
int64_t r;
365
366
r = rcb->ret;
367
@@ -XXX,XX +XXX,XX @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
368
qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
369
}
370
qemu_vfree(acb->bounce);
371
-
372
- aio_context_acquire(ctx);
373
acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
374
- aio_context_release(ctx);
375
376
qemu_aio_unref(acb);
377
}
378
diff --git a/block/win32-aio.c b/block/win32-aio.c
379
index XXXXXXX..XXXXXXX 100644
380
--- a/block/win32-aio.c
381
+++ b/block/win32-aio.c
382
@@ -XXX,XX +XXX,XX @@ static void win32_aio_process_completion(QEMUWin32AIOState *s,
383
qemu_vfree(waiocb->buf);
384
}
385
386
-
387
- aio_context_acquire(s->aio_ctx);
388
waiocb->common.cb(waiocb->common.opaque, ret);
389
- aio_context_release(s->aio_ctx);
390
qemu_aio_unref(waiocb);
391
}
392
393
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
394
index XXXXXXX..XXXXXXX 100644
395
--- a/hw/block/virtio-blk.c
396
+++ b/hw/block/virtio-blk.c
397
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
398
static void virtio_blk_rw_complete(void *opaque, int ret)
399
{
400
VirtIOBlockReq *next = opaque;
401
+ VirtIOBlock *s = next->dev;
402
403
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
404
while (next) {
405
VirtIOBlockReq *req = next;
406
next = req->mr_next;
407
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_rw_complete(void *opaque, int ret)
408
block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
409
virtio_blk_free_request(req);
410
}
411
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
412
}
413
414
static void virtio_blk_flush_complete(void *opaque, int ret)
415
{
416
VirtIOBlockReq *req = opaque;
417
+ VirtIOBlock *s = req->dev;
418
419
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
420
if (ret) {
421
if (virtio_blk_handle_rw_error(req, -ret, 0)) {
422
- return;
423
+ goto out;
424
}
425
}
426
427
virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
428
block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
429
virtio_blk_free_request(req);
430
+
431
+out:
432
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
433
}
434
435
#ifdef __linux__
436
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_ioctl_complete(void *opaque, int status)
437
virtio_stl_p(vdev, &scsi->data_len, hdr->dxfer_len);
438
439
out:
440
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
441
virtio_blk_req_complete(req, status);
442
virtio_blk_free_request(req);
443
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
444
g_free(ioctl_req);
445
}
446
447
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
448
index XXXXXXX..XXXXXXX 100644
449
--- a/hw/scsi/scsi-disk.c
450
+++ b/hw/scsi/scsi-disk.c
451
@@ -XXX,XX +XXX,XX @@ static void scsi_aio_complete(void *opaque, int ret)
452
453
assert(r->req.aiocb != NULL);
454
r->req.aiocb = NULL;
455
+ aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
456
         if (scsi_disk_req_check_error(r, ret, true)) {
             goto done;
         }
@@ -XXX,XX +XXX,XX @@ static void scsi_aio_complete(void *opaque, int ret)
     scsi_req_complete(&r->req, GOOD);
 
 done:
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
     scsi_req_unref(&r->req);
 }
 
@@ -XXX,XX +XXX,XX @@ static void scsi_dma_complete(void *opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
         block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
     }
     scsi_dma_complete_noio(r, ret);
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_read_complete(void * opaque, int ret)
@@ -XXX,XX +XXX,XX @@ static void scsi_read_complete(void * opaque, int ret)
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
@@ -XXX,XX +XXX,XX @@ static void scsi_read_complete(void * opaque, int ret)
 
 done:
     scsi_req_unref(&r->req);
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 /* Actually issue a read to the block device.  */
@@ -XXX,XX +XXX,XX @@ static void scsi_do_read_cb(void *opaque, int ret)
     assert (r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
         block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
     }
     scsi_do_read(opaque, ret);
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 /* Read more data from scsi device into buffer.  */
@@ -XXX,XX +XXX,XX @@ static void scsi_write_complete(void * opaque, int ret)
     assert (r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (ret < 0) {
         block_acct_failed(blk_get_stats(s->qdev.conf.blk), &r->acct);
     } else {
         block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
     }
     scsi_write_complete_noio(r, ret);
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_write_data(SCSIRequest *req)
@@ -XXX,XX +XXX,XX @@ static void scsi_unmap_complete(void *opaque, int ret)
 {
     UnmapCBData *data = opaque;
     SCSIDiskReq *r = data->r;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     scsi_unmap_complete_noio(data, ret);
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
@@ -XXX,XX +XXX,XX @@ static void scsi_write_same_complete(void *opaque, int ret)
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
+    aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
     if (scsi_disk_req_check_error(r, ret, true)) {
         goto done;
     }
@@ -XXX,XX +XXX,XX @@ done:
     scsi_req_unref(&r->req);
     qemu_vfree(data->iov.iov_base);
     g_free(data);
+    aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
 static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -XXX,XX +XXX,XX @@ done:
 static void scsi_command_complete(void *opaque, int ret)
 {
     SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+    SCSIDevice *s = r->req.dev;
 
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
+
+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
     scsi_command_complete_noio(r, ret);
+    aio_context_release(blk_get_aio_context(s->conf.blk));
 }
 
 static int execute_command(BlockBackend *blk,
@@ -XXX,XX +XXX,XX @@ static void scsi_read_complete(void * opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
+
     if (ret || r->req.io_canceled) {
         scsi_command_complete_noio(r, ret);
-        return;
+        goto done;
     }
 
     len = r->io_header.dxfer_len - r->io_header.resid;
@@ -XXX,XX +XXX,XX @@ static void scsi_read_complete(void * opaque, int ret)
     r->len = -1;
     if (len == 0) {
         scsi_command_complete_noio(r, 0);
-        return;
+        goto done;
     }
 
     /* Snoop READ CAPACITY output to set the blocksize.  */
@@ -XXX,XX +XXX,XX @@ static void scsi_read_complete(void * opaque, int ret)
     }
     scsi_req_data(&r->req, len);
     scsi_req_unref(&r->req);
+
+done:
+    aio_context_release(blk_get_aio_context(s->conf.blk));
 }
 
 /* Read more data from scsi device into buffer.  */
@@ -XXX,XX +XXX,XX @@ static void scsi_write_complete(void * opaque, int ret)
     assert(r->req.aiocb != NULL);
     r->req.aiocb = NULL;
 
+    aio_context_acquire(blk_get_aio_context(s->conf.blk));
+
    if (ret || r->req.io_canceled) {
        scsi_command_complete_noio(r, ret);
-        return;
+        goto done;
    }
 
    if (r->req.cmd.buf[0] == MODE_SELECT && r->req.cmd.buf[4] == 12 &&
@@ -XXX,XX +XXX,XX @@ static void scsi_write_complete(void * opaque, int ret)
    }
 
    scsi_command_complete_noio(r, ret);
+
+done:
+    aio_context_release(blk_get_aio_context(s->conf.blk));
 }
 
 /* Write data to a scsi device.  Returns nonzero on failure.
diff --git a/util/thread-pool.c b/util/thread-pool.c
index XXXXXXX..XXXXXXX 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -XXX,XX +XXX,XX @@ restart:
          */
         qemu_bh_schedule(pool->completion_bh);
 
+        aio_context_release(pool->ctx);
         elem->common.cb(elem->common.opaque, elem->ret);
+        aio_context_acquire(pool->ctx);
         qemu_aio_unref(elem);
         goto restart;
     } else {
@@ -XXX,XX +XXX,XX @@ static void thread_pool_co_cb(void *opaque, int ret)
     ThreadPoolCo *co = opaque;
 
     co->ret = ret;
-    qemu_coroutine_enter(co->co);
+    aio_co_wake(co->co);
 }
 
 int coroutine_fn thread_pool_submit_co(ThreadPool *pool, ThreadPoolFunc *func,
-- 
2.9.3
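
Every completion callback converted above follows the same shape: take the
BlockBackend's AioContext before touching accounting or request state, and
drop it before returning, because after this series the callback may fire
outside the main loop thread. A minimal sketch of that shape, assuming the
QEMU AioContext and accounting APIs as used in the hunks above; the request
type and function names are hypothetical stand-ins:

    #include "qemu/osdep.h"
    #include "sysemu/block-backend.h"
    #include "block/accounting.h"

    /* Hypothetical request type, for illustration only. */
    typedef struct MyReq {
        BlockBackend *blk;
        BlockAcctCookie acct;
    } MyReq;

    static void my_complete_noio(MyReq *r, int ret);  /* hypothetical */

    static void my_complete(void *opaque, int ret)
    {
        MyReq *r = opaque;
        AioContext *ctx = blk_get_aio_context(r->blk);

        aio_context_acquire(ctx);
        if (ret < 0) {
            block_acct_failed(blk_get_stats(r->blk), &r->acct);
        } else {
            block_acct_done(blk_get_stats(r->blk), &r->acct);
        }
        my_complete_noio(r, ret);   /* may complete or resubmit the request */
        aio_context_release(ctx);   /* dropped before returning, as above */
    }

Note the matching inversion in util/thread-pool.c: the pool already holds its
context, so it releases it around elem->common.cb() precisely so that
callbacks written like this sketch can acquire it without deadlocking.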
From: Paolo Bonzini <pbonzini@redhat.com>

All that CoQueue needs in order to become thread-safe is help
from an external mutex.  Add this to the API.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/coroutine.h   |  8 +++++---
 block/backup.c             |  2 +-
 block/io.c                 |  4 ++--
 block/nbd-client.c         |  2 +-
 block/qcow2-cluster.c      |  4 +---
 block/sheepdog.c           |  2 +-
 block/throttle-groups.c    |  2 +-
 hw/9pfs/9p.c               |  2 +-
 util/qemu-coroutine-lock.c | 24 +++++++++++++++++++++---
 9 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -XXX,XX +XXX,XX @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex);
 
 /**
  * CoQueues are a mechanism to queue coroutines in order to continue executing
- * them later.
+ * them later. They are similar to condition variables, but they need help
+ * from an external mutex in order to maintain thread-safety.
  */
 typedef struct CoQueue {
     QSIMPLEQ_HEAD(, Coroutine) entries;
@@ -XXX,XX +XXX,XX @@ void qemu_co_queue_init(CoQueue *queue);
 
 /**
  * Adds the current coroutine to the CoQueue and transfers control to the
- * caller of the coroutine.
+ * caller of the coroutine.  The mutex is unlocked during the wait and
+ * locked again afterwards.
  */
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
+void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex);
 
 /**
  * Restarts the next coroutine in the CoQueue and removes it from the queue.
diff --git a/block/backup.c b/block/backup.c
index XXXXXXX..XXXXXXX 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
         retry = false;
         QLIST_FOREACH(req, &job->inflight_reqs, list) {
             if (end > req->start && start < req->end) {
-                qemu_co_queue_wait(&req->wait_queue);
+                qemu_co_queue_wait(&req->wait_queue, NULL);
                 retry = true;
                 break;
             }
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                      * (instead of producing a deadlock in the former case). */
                     if (!req->waiting_for) {
                         self->waiting_for = req;
-                        qemu_co_queue_wait(&req->wait_queue);
+                        qemu_co_queue_wait(&req->wait_queue, NULL);
                         self->waiting_for = NULL;
                         retry = true;
                         waited = true;
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 
     /* Wait until any previous flushes are completed */
     while (bs->active_flush_req) {
-        qemu_co_queue_wait(&bs->flush_queue);
+        qemu_co_queue_wait(&bs->flush_queue, NULL);
     }
 
     bs->active_flush_req = true;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -XXX,XX +XXX,XX @@ static void nbd_coroutine_start(NBDClientSession *s,
     /* Poor man semaphore.  The free_sema is locked when no other request
      * can be accepted, and unlocked after receiving one reply.  */
     if (s->in_flight == MAX_NBD_REQUESTS) {
-        qemu_co_queue_wait(&s->free_sema);
+        qemu_co_queue_wait(&s->free_sema, NULL);
         assert(s->in_flight < MAX_NBD_REQUESTS);
     }
     s->in_flight++;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -XXX,XX +XXX,XX @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
             if (bytes == 0) {
                 /* Wait for the dependency to complete. We need to recheck
                  * the free/allocated clusters when we continue. */
-                qemu_co_mutex_unlock(&s->lock);
-                qemu_co_queue_wait(&old_alloc->dependent_requests);
-                qemu_co_mutex_lock(&s->lock);
+                qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
                 return -EAGAIN;
             }
         }
diff --git a/block/sheepdog.c b/block/sheepdog.c
index XXXXXXX..XXXXXXX 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -XXX,XX +XXX,XX @@ static void wait_for_overlapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB *acb)
 retry:
     QLIST_FOREACH(cb, &s->inflight_aiocb_head, aiocb_siblings) {
         if (AIOCBOverlapping(acb, cb)) {
-            qemu_co_queue_wait(&s->overlapping_queue);
+            qemu_co_queue_wait(&s->overlapping_queue, NULL);
             goto retry;
         }
     }
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index XXXXXXX..XXXXXXX 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -XXX,XX +XXX,XX @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
     if (must_wait || blkp->pending_reqs[is_write]) {
         blkp->pending_reqs[is_write]++;
         qemu_mutex_unlock(&tg->lock);
-        qemu_co_queue_wait(&blkp->throttled_reqs[is_write]);
+        qemu_co_queue_wait(&blkp->throttled_reqs[is_write], NULL);
         qemu_mutex_lock(&tg->lock);
         blkp->pending_reqs[is_write]--;
     }
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn v9fs_flush(void *opaque)
     /*
      * Wait for pdu to complete.
      */
-    qemu_co_queue_wait(&cancel_pdu->complete);
+    qemu_co_queue_wait(&cancel_pdu->complete, NULL);
     cancel_pdu->cancelled = 0;
     pdu_free(cancel_pdu);
 }
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -XXX,XX +XXX,XX @@ void qemu_co_queue_init(CoQueue *queue)
     QSIMPLEQ_INIT(&queue->entries);
 }
 
-void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
+void coroutine_fn qemu_co_queue_wait(CoQueue *queue, CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
     QSIMPLEQ_INSERT_TAIL(&queue->entries, self, co_queue_next);
+
+    if (mutex) {
+        qemu_co_mutex_unlock(mutex);
+    }
+
+    /* There is no race condition here.  Other threads will call
+     * aio_co_schedule on our AioContext, which can reenter this
+     * coroutine but only after this yield and after the main loop
+     * has gone through the next iteration.
+     */
     qemu_coroutine_yield();
     assert(qemu_in_coroutine());
+
+    /* TODO: OSv implements wait morphing here, where the wakeup
+     * primitive automatically places the woken coroutine on the
+     * mutex's queue.  This avoids the thundering herd effect.
+     */
+    if (mutex) {
+        qemu_co_mutex_lock(mutex);
+    }
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ void qemu_co_rwlock_rdlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     while (lock->writer) {
-        qemu_co_queue_wait(&lock->queue);
+        qemu_co_queue_wait(&lock->queue, NULL);
     }
     lock->reader++;
     self->locks_held++;
@@ -XXX,XX +XXX,XX @@ void qemu_co_rwlock_wrlock(CoRwlock *lock)
     Coroutine *self = qemu_coroutine_self();
 
     while (lock->writer || lock->reader) {
-        qemu_co_queue_wait(&lock->queue);
+        qemu_co_queue_wait(&lock->queue, NULL);
     }
     lock->writer = true;
     self->locks_held++;
-- 
2.9.3

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for ex. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
     BlockDriverState *p;
     int64_t eof = 0;
 
-    assert(include_base || bs != base);
     assert(!include_base || base); /* Can't include NULL base */
 
+    if (!include_base && bs == base) {
+        *pnum = bytes;
+        return 0;
+    }
+
     ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
     if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
         return ret;
-- 
2.26.2
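
The two-argument qemu_co_queue_wait() introduced in the CoQueue patch above
behaves like a condition-variable wait: the mutex is dropped for the duration
of the sleep and re-taken before the function returns. A minimal sketch of
the intended usage, assuming a hypothetical MyState with a CoMutex/CoQueue
pair and an is_ready flag:

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"

    /* Hypothetical state protected by a CoMutex, for illustration only. */
    typedef struct MyState {
        CoMutex lock;
        CoQueue queue;
        bool is_ready;
    } MyState;

    /* Waiter: sleep until the predicate holds, with no window where the
     * state can change between the check and the sleep. */
    static void coroutine_fn wait_until_ready(MyState *s)
    {
        qemu_co_mutex_lock(&s->lock);
        while (!s->is_ready) {
            /* Unlocks s->lock, yields, re-locks s->lock on wakeup. */
            qemu_co_queue_wait(&s->queue, &s->lock);
        }
        /* ... use the protected state; s->lock is held here ... */
        qemu_co_mutex_unlock(&s->lock);
    }

    /* Waker: update the state, then restart one queued coroutine. */
    static void coroutine_fn make_ready(MyState *s)
    {
        qemu_co_mutex_lock(&s->lock);
        s->is_ready = true;
        qemu_co_queue_next(&s->queue);
        qemu_co_mutex_unlock(&s->lock);
    }

Passing NULL for the mutex, as the bulk of the mechanical conversions above
do, keeps the old behaviour for callers that still rely on the AioContext
lock instead.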
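On the bs == base corner case handled in the block-status patch above: taken
together with the bdrv_is_allocated_above() rewrite in the next patch, a
caller such as img_rebase() may pass the same node as both top and base. A
hedged illustration of the expected outcome (the wrapper function is
hypothetical; the bdrv_is_allocated_above() signature is the one visible in
the diffs):

    /* Illustration only: query the empty half-open chain (base, top]
     * where top == base. */
    static int probe_empty_chain(BlockDriverState *bs, int64_t offset,
                                 int64_t bytes)
    {
        int64_t pnum;
        int ret = bdrv_is_allocated_above(bs, bs, false, offset, bytes, &pnum);

        /* Expected with the fix: ret == 0 and pnum == bytes.  The chain
         * between bs and itself contains no nodes, so nothing in the range
         * can be allocated there; without this patch, routing the query
         * through bdrv_common_block_status_above would have tripped the
         * removed assertion. */
        return ret;
    }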
From: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-13-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.h                 |  3 +++
 block/curl.c                |  2 ++
 block/io.c                  |  5 +++++
 block/iscsi.c               |  8 ++++++--
 block/null.c                |  4 ++++
 block/qed.c                 | 12 ++++++++++++
 block/throttle-groups.c     |  2 ++
 util/aio-posix.c            |  2 --
 util/aio-win32.c            |  2 --
 util/qemu-coroutine-sleep.c |  2 +-
 10 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ enum {
  */
 typedef void QEDFindClusterFunc(void *opaque, int ret, uint64_t offset, size_t len);
 
+void qed_acquire(BDRVQEDState *s);
+void qed_release(BDRVQEDState *s);
+
 /**
  * Generic callback for chaining async callbacks
  */
diff --git a/block/curl.c b/block/curl.c
index XXXXXXX..XXXXXXX 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -XXX,XX +XXX,XX @@ static void curl_multi_timeout_do(void *arg)
         return;
     }
 
+    aio_context_acquire(s->aio_context);
     curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
 
     curl_multi_check_completion(s);
+    aio_context_release(s->aio_context);
 #else
     abort();
 #endif
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ void bdrv_aio_cancel(BlockAIOCB *acb)
     if (acb->aiocb_info->get_aio_context) {
         aio_poll(acb->aiocb_info->get_aio_context(acb), true);
     } else if (acb->bs) {
+        /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
+         * assert that we're not using an I/O thread.  Thread-safe
+         * code should use bdrv_aio_cancel_async exclusively.
+         */
+        assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context());
         aio_poll(bdrv_get_aio_context(acb->bs), true);
     } else {
         abort();
diff --git a/block/iscsi.c b/block/iscsi.c
index XXXXXXX..XXXXXXX 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -XXX,XX +XXX,XX @@ static void iscsi_retry_timer_expired(void *opaque)
     struct IscsiTask *iTask = opaque;
     iTask->complete = 1;
     if (iTask->co) {
-        qemu_coroutine_enter(iTask->co);
+        aio_co_wake(iTask->co);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void iscsi_nop_timed_event(void *opaque)
 {
     IscsiLun *iscsilun = opaque;
 
+    aio_context_acquire(iscsilun->aio_context);
     if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
         error_report("iSCSI: NOP timeout. Reconnecting...");
         iscsilun->request_timed_out = true;
     } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) {
         error_report("iSCSI: failed to sent NOP-Out. Disabling NOP messages.");
-        return;
+        goto out;
     }
 
     timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
     iscsi_set_events(iscsilun);
+
+out:
+    aio_context_release(iscsilun->aio_context);
 }
 
 static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
diff --git a/block/null.c b/block/null.c
index XXXXXXX..XXXXXXX 100644
--- a/block/null.c
+++ b/block/null.c
@@ -XXX,XX +XXX,XX @@ static void null_bh_cb(void *opaque)
 static void null_timer_cb(void *opaque)
 {
     NullAIOCB *acb = opaque;
+    AioContext *ctx = bdrv_get_aio_context(acb->common.bs);
+
+    aio_context_acquire(ctx);
     acb->common.cb(acb->common.opaque, 0);
+    aio_context_release(ctx);
     timer_deinit(&acb->timer);
     qemu_aio_unref(acb);
 }
diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_need_check_timer_cb(void *opaque)
 
     trace_qed_need_check_timer_cb(s);
 
+    qed_acquire(s);
     qed_plug_allocating_write_reqs(s);
 
     /* Ensure writes are on disk before clearing flag */
     bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
+    qed_release(s);
+}
+
+void qed_acquire(BDRVQEDState *s)
+{
+    aio_context_acquire(bdrv_get_aio_context(s->bs));
+}
+
+void qed_release(BDRVQEDState *s)
+{
+    aio_context_release(bdrv_get_aio_context(s->bs));
 }
 
 static void qed_start_need_check_timer(BDRVQEDState *s)
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index XXXXXXX..XXXXXXX 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -XXX,XX +XXX,XX @@ static void timer_cb(BlockBackend *blk, bool is_write)
     qemu_mutex_unlock(&tg->lock);
 
     /* Run the request that was waiting for this timer */
+    aio_context_acquire(blk_get_aio_context(blk));
     empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]);
+    aio_context_release(blk_get_aio_context(blk));
 
     /* If the request queue was empty then we have to take care of
      * scheduling the next one */
diff --git a/util/aio-posix.c b/util/aio-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -XXX,XX +XXX,XX @@ bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
     }
 
     /* Run our timers */
-    aio_context_acquire(ctx);
     progress |= timerlistgroup_run_timers(&ctx->tlg);
-    aio_context_release(ctx);
 
     return progress;
 }
diff --git a/util/aio-win32.c b/util/aio-win32.c
index XXXXXXX..XXXXXXX 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking)
         progress |= aio_dispatch_handlers(ctx, event);
     } while (count > 0);
 
-    aio_context_acquire(ctx);
     progress |= timerlistgroup_run_timers(&ctx->tlg);
-    aio_context_release(ctx);
     return progress;
 }
 
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -XXX,XX +XXX,XX @@ static void co_sleep_cb(void *opaque)
 {
     CoSleepCB *sleep_cb = opaque;
 
-    qemu_coroutine_enter(sleep_cb->co);
+    aio_co_wake(sleep_cb->co);
 }
 
 void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
-- 
2.9.3

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED which is wrong, as on read the data is
generated on the level of short backing file (if all overlays have
unallocated areas at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies code
path.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
[Fix s/has/have/ as suggested by Eric Blake.  Fix s/area/areas/.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 43 +++++--------------------------------------
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
  * at 'offset + *pnum' may return the same allocation status (in other
  * words, the result is not necessarily the maximum possible range);
  * but 'pnum' will only be 0 when end of file is reached.
- *
  */
 int bdrv_is_allocated_above(BlockDriverState *top,
                             BlockDriverState *base,
                             bool include_base, int64_t offset,
                             int64_t bytes, int64_t *pnum)
 {
-    BlockDriverState *intermediate;
-    int ret;
-    int64_t n = bytes;
-
-    assert(base || !include_base);
-
-    intermediate = top;
-    while (include_base || intermediate != base) {
-        int64_t pnum_inter;
-        int64_t size_inter;
-
-        assert(intermediate);
-        ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
-        if (ret < 0) {
-            return ret;
-        }
-        if (ret) {
-            *pnum = pnum_inter;
-            return 1;
-        }
-
-        size_inter = bdrv_getlength(intermediate);
-        if (size_inter < 0) {
-            return size_inter;
-        }
-        if (n > pnum_inter &&
-            (intermediate == top || offset + pnum_inter < size_inter)) {
-            n = pnum_inter;
-        }
-
-        if (intermediate == base) {
-            break;
-        }
-
-        intermediate = bdrv_filter_or_cow_bs(intermediate);
+    int ret = bdrv_common_block_status_above(top, base, include_base, false,
+                                             offset, bytes, pnum, NULL, NULL);
+    if (ret < 0) {
+        return ret;
     }
 
-    *pnum = n;
-    return 0;
+    return !!(ret & BDRV_BLOCK_ALLOCATED);
 }
-- 
2.26.2
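
With bdrv_is_allocated_above() above now delegating to
bdrv_common_block_status_above(), a caller reads the unified result as
sketched below (a hypothetical helper; the after-EOF remark restates the
behavioural fix from the commit message):

    /* Illustration only. */
    static void check_chain_allocation(BlockDriverState *top,
                                       BlockDriverState *base,
                                       int64_t offset, int64_t bytes)
    {
        int64_t pnum;
        int ret = bdrv_is_allocated_above(top, base, true, offset, bytes,
                                          &pnum);

        if (ret < 0) {
            /* error from the block layer */
        } else if (ret) {
            /* The first pnum bytes at offset are allocated somewhere in
             * the [base, top] chain.  Space past the EOF of a short
             * backing file now counts here, since reads of that space are
             * generated at that level. */
        } else {
            /* The first pnum bytes at offset are allocated nowhere in
             * the chain. */
        }
    }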
From: Paolo Bonzini <pbonzini@redhat.com>

As a small step towards the introduction of multiqueue, we want
coroutines to remain on the same AioContext that started them,
unless they are moved explicitly with e.g. aio_co_schedule.  This patch
avoids that coroutines switch AioContext when they use a CoMutex.
For now it does not make much of a difference, because the CoMutex
is not thread-safe and the AioContext itself is used to protect the
CoMutex from concurrent access.  However, this is going to change.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-9-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/qemu-coroutine-lock.c | 5 ++---
 util/trace-events          | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/queue.h"
+#include "block/aio.h"
 #include "trace.h"
 
 void qemu_co_queue_init(CoQueue *queue)
@@ -XXX,XX +XXX,XX @@ void qemu_co_queue_run_restart(Coroutine *co)
 
 static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)
 {
-    Coroutine *self = qemu_coroutine_self();
     Coroutine *next;
 
     if (QSIMPLEQ_EMPTY(&queue->entries)) {
@@ -XXX,XX +XXX,XX @@ static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)
 
     while ((next = QSIMPLEQ_FIRST(&queue->entries)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&queue->entries, co_queue_next);
-        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, next, co_queue_next);
-        trace_qemu_co_queue_next(next);
+        aio_co_wake(next);
         if (single) {
             break;
         }
diff --git a/util/trace-events b/util/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -XXX,XX +XXX,XX @@ qemu_coroutine_terminate(void *co) "self %p"
 
 # util/qemu-coroutine-lock.c
 qemu_co_queue_run_restart(void *co) "co %p"
-qemu_co_queue_next(void *nxt) "next %p"
 qemu_co_mutex_lock_entry(void *mutex, void *self) "mutex %p self %p"
 qemu_co_mutex_lock_return(void *mutex, void *self) "mutex %p self %p"
 qemu_co_mutex_unlock_entry(void *mutex, void *self) "mutex %p self %p"
-- 
2.9.3

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

These cases are fixed by previous patches around block_status and
is_allocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/274     | 20 +++++++++++
 tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/274
+++ b/tests/qemu-iotests/274
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \
     iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
     iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
 
+    iotests.log('=== Testing qemu-img commit (top -> base) ===')
+
+    create_chain()
+    iotests.qemu_img_log('commit', '-b', base, top)
+    iotests.img_info_log(base)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
+
+    iotests.log('=== Testing QMP active commit (top -> base) ===')
+
+    create_chain()
+    with create_vm() as vm:
+        vm.launch()
+        vm.qmp_log('block-commit', device='top', base_node='base',
+                   job_id='job0', auto_dismiss=False)
+        vm.run_job('job0', wait=5)
+
+    iotests.img_info_log(mid)
+    iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
+    iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
 
     iotests.log('== Resize tests ==')
 
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/274.out
+++ b/tests/qemu-iotests/274.out
@@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+=== Testing qemu-img commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Image committed.
+
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 2 MiB (2097152 bytes)
+cluster_size: 65536
+Format specific information:
+    compat: 1.1
+    compression type: zlib
+    lazy refcounts: false
+    refcount bits: 16
+    corrupt: false
+    extended l2: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing QMP active commit (top -> base) ===
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
+
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
+{"return": {}}
+{"execute": "job-complete", "arguments": {"id": "job0"}}
+{"return": {}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
+{"return": {}}
+image: TEST_IMG
+file format: IMGFMT
+virtual size: 1 MiB (1048576 bytes)
+cluster_size: 65536
+backing file: TEST_DIR/PID-base
+backing file format: IMGFMT
+Format specific information:
+    compat: 1.1
+    compression type: zlib
+    lazy refcounts: false
+    refcount bits: 16
+    corrupt: false
+    extended l2: false
+
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == Resize tests ==
 === preallocation=off ===
 Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16
-- 
2.26.2
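
The aio_co_wake() substitution in the coroutine-lock patch above is the same
conversion applied earlier in the series to thread_pool_co_cb() and
co_sleep_cb(): rather than entering the coroutine directly in whatever
context the caller happens to run in, the coroutine is scheduled back onto
the AioContext it was running on. A minimal sketch of a wakeup callback
written in this style (MyWaiter and my_wakeup_cb are hypothetical):

    #include "qemu/osdep.h"
    #include "block/aio.h"

    /* Hypothetical: records the coroutine parked in qemu_coroutine_yield(). */
    typedef struct MyWaiter {
        Coroutine *co;
        int ret;
    } MyWaiter;

    static void my_wakeup_cb(void *opaque, int ret)
    {
        MyWaiter *w = opaque;

        w->ret = ret;
        /* Runs w->co on its own AioContext.  Unlike qemu_coroutine_enter(),
         * this is safe even when the callback fires in another thread. */
        aio_co_wake(w->co);
    }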