1
The following changes since commit ac793156f650ae2d77834932d72224175ee69086:
1
The following changes since commit 1214d55d1c41fbab3a9973a05085b8760647e411:
2
2
3
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100)
3
Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging (2021-02-09 13:24:37 +0000)
4
4
5
are available in the Git repository at:
5
are available in the Git repository at:
6
6
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
8
8
9
for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c:
9
for you to fetch changes up to eb847c42296497978942f738cd41dc29a35a49b2:
10
10
11
iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100)
11
docs: fix Parallels Image "dirty bitmap" section (2021-02-10 09:23:28 +0000)
12
12
13
----------------------------------------------------------------
13
----------------------------------------------------------------
14
Pull request
14
Pull request
15
15
16
v2:
16
v4:
17
* Fix format string issues on 32-bit hosts [Peter]
17
* Add PCI_EXPRESS Kconfig dependency to fix s390x in "multi-process: setup PCI
18
* Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric]
18
host bridge for remote device" [Philippe and Thomas]
19
* Fix missing eventfd.h header on macOS [Peter]
20
* Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter]
21
22
This pull request contains the vhost-user-blk server by Coiby Xu along with my
23
additions, block/nvme.c alignment and hardware error statistics by Philippe
24
Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir
25
Sementsov-Ogievskiy.
26
19
27
----------------------------------------------------------------
20
----------------------------------------------------------------
28
21
29
Coiby Xu (6):
22
Denis V. Lunev (1):
30
libvhost-user: Allow vu_message_read to be replaced
23
docs: fix Parallels Image "dirty bitmap" section
31
libvhost-user: remove watch for kick_fd when de-initialize vu-dev
32
util/vhost-user-server: generic vhost user server
33
block: move logical block size check function to a common utility
34
function
35
block/export: vhost-user block device backend server
36
MAINTAINERS: Add vhost-user block device backend server maintainer
37
24
38
Philippe Mathieu-Daudé (1):
25
Elena Ufimtseva (8):
39
block/nvme: Add driver statistics for access alignment and hw errors
26
multi-process: add configure and usage information
27
io: add qio_channel_writev_full_all helper
28
io: add qio_channel_readv_full_all_eof & qio_channel_readv_full_all
29
helpers
30
multi-process: define MPQemuMsg format and transmission functions
31
multi-process: introduce proxy object
32
multi-process: add proxy communication functions
33
multi-process: Forward PCI config space acceses to the remote process
34
multi-process: perform device reset in the remote process
40
35
41
Stefan Hajnoczi (16):
36
Jagannathan Raman (11):
42
util/vhost-user-server: s/fileds/fields/ typo fix
37
memory: alloc RAM from file at offset
43
util/vhost-user-server: drop unnecessary QOM cast
38
multi-process: Add config option for multi-process QEMU
44
util/vhost-user-server: drop unnecessary watch deletion
39
multi-process: setup PCI host bridge for remote device
45
block/export: consolidate request structs into VuBlockReq
40
multi-process: setup a machine object for remote device process
46
util/vhost-user-server: drop unused DevicePanicNotifier
41
multi-process: Initialize message handler in remote device
47
util/vhost-user-server: fix memory leak in vu_message_read()
42
multi-process: Associate fd of a PCIDevice with its object
48
util/vhost-user-server: check EOF when reading payload
43
multi-process: setup memory manager for remote device
49
util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
44
multi-process: PCI BAR read/write handling for proxy & remote
50
block/export: report flush errors
45
endpoints
51
block/export: convert vhost-user-blk server to block export API
46
multi-process: Synchronize remote memory
52
util/vhost-user-server: move header to include/
47
multi-process: create IOHUB object to handle irq
53
util/vhost-user-server: use static library in meson.build
48
multi-process: Retrieve PCI info from remote process
54
qemu-storage-daemon: avoid compiling blockdev_ss twice
55
block: move block exports to libblockdev
56
block/export: add iothread and fixed-iothread options
57
block/export: add vhost-user-blk multi-queue support
58
49
59
Vladimir Sementsov-Ogievskiy (5):
50
John G Johnson (1):
60
block/io: fix bdrv_co_block_status_above
51
multi-process: add the concept description to
61
block/io: bdrv_common_block_status_above: support include_base
52
docs/devel/qemu-multiprocess
62
block/io: bdrv_common_block_status_above: support bs == base
63
block/io: fix bdrv_is_allocated_above
64
iotests: add commit top->base cases to 274
65
53
66
MAINTAINERS | 9 +
54
Stefan Hajnoczi (6):
67
qapi/block-core.json | 24 +-
55
.github: point Repo Lockdown bot to GitLab repo
68
qapi/block-export.json | 36 +-
56
gitmodules: use GitLab repos instead of qemu.org
69
block/coroutines.h | 2 +
57
gitlab-ci: remove redundant GitLab repo URL command
70
block/export/vhost-user-blk-server.h | 19 +
58
docs: update README to use GitLab repo URLs
71
contrib/libvhost-user/libvhost-user.h | 21 +
59
pc-bios: update mirror URLs to GitLab
72
include/qemu/vhost-user-server.h | 65 +++
60
get_maintainer: update repo URL to GitLab
73
util/block-helpers.h | 19 +
61
74
block/export/export.c | 37 +-
62
MAINTAINERS | 24 +
75
block/export/vhost-user-blk-server.c | 431 ++++++++++++++++++++
63
README.rst | 4 +-
76
block/io.c | 132 +++---
64
docs/devel/index.rst | 1 +
77
block/nvme.c | 27 ++
65
docs/devel/multi-process.rst | 966 ++++++++++++++++++++++
78
block/qcow2.c | 16 +-
66
docs/system/index.rst | 1 +
79
contrib/libvhost-user/libvhost-user-glib.c | 2 +-
67
docs/system/multi-process.rst | 64 ++
80
contrib/libvhost-user/libvhost-user.c | 15 +-
68
docs/interop/parallels.txt | 2 +-
81
hw/core/qdev-properties-system.c | 31 +-
69
configure | 10 +
82
nbd/server.c | 2 -
70
meson.build | 5 +-
83
qemu-nbd.c | 21 +-
71
hw/remote/trace.h | 1 +
84
softmmu/vl.c | 4 +
72
include/exec/memory.h | 2 +
85
stubs/blk-exp-close-all.c | 7 +
73
include/exec/ram_addr.h | 4 +-
86
tests/vhost-user-bridge.c | 2 +
74
include/hw/pci-host/remote.h | 30 +
87
tools/virtiofsd/fuse_virtio.c | 4 +-
75
include/hw/pci/pci_ids.h | 3 +
88
util/block-helpers.c | 46 +++
76
include/hw/remote/iohub.h | 42 +
89
util/vhost-user-server.c | 446 +++++++++++++++++++++
77
include/hw/remote/machine.h | 38 +
90
block/export/meson.build | 3 +-
78
include/hw/remote/memory.h | 19 +
91
contrib/libvhost-user/meson.build | 1 +
79
include/hw/remote/mpqemu-link.h | 99 +++
92
meson.build | 22 +-
80
include/hw/remote/proxy-memory-listener.h | 28 +
93
nbd/meson.build | 2 +
81
include/hw/remote/proxy.h | 48 ++
94
storage-daemon/meson.build | 3 +-
82
include/io/channel.h | 78 ++
95
stubs/meson.build | 1 +
83
include/qemu/mmap-alloc.h | 4 +-
96
tests/qemu-iotests/274 | 20 +
84
include/sysemu/iothread.h | 6 +
97
tests/qemu-iotests/274.out | 68 ++++
85
backends/hostmem-memfd.c | 2 +-
98
util/meson.build | 4 +
86
hw/misc/ivshmem.c | 3 +-
99
33 files changed, 1420 insertions(+), 122 deletions(-)
87
hw/pci-host/remote.c | 75 ++
100
create mode 100644 block/export/vhost-user-blk-server.h
88
hw/remote/iohub.c | 119 +++
101
create mode 100644 include/qemu/vhost-user-server.h
89
hw/remote/machine.c | 80 ++
102
create mode 100644 util/block-helpers.h
90
hw/remote/memory.c | 65 ++
103
create mode 100644 block/export/vhost-user-blk-server.c
91
hw/remote/message.c | 230 ++++++
104
create mode 100644 stubs/blk-exp-close-all.c
92
hw/remote/mpqemu-link.c | 267 ++++++
105
create mode 100644 util/block-helpers.c
93
hw/remote/proxy-memory-listener.c | 227 +++++
106
create mode 100644 util/vhost-user-server.c
94
hw/remote/proxy.c | 379 +++++++++
95
hw/remote/remote-obj.c | 203 +++++
96
io/channel.c | 116 ++-
97
iothread.c | 6 +
98
softmmu/memory.c | 3 +-
99
softmmu/physmem.c | 12 +-
100
util/mmap-alloc.c | 8 +-
101
util/oslib-posix.c | 2 +-
102
.github/lockdown.yml | 8 +-
103
.gitlab-ci.yml | 1 -
104
.gitmodules | 44 +-
105
Kconfig.host | 4 +
106
hw/Kconfig | 1 +
107
hw/meson.build | 1 +
108
hw/pci-host/Kconfig | 3 +
109
hw/pci-host/meson.build | 1 +
110
hw/remote/Kconfig | 4 +
111
hw/remote/meson.build | 13 +
112
hw/remote/trace-events | 4 +
113
pc-bios/README | 4 +-
114
scripts/get_maintainer.pl | 2 +-
115
53 files changed, 3296 insertions(+), 70 deletions(-)
116
create mode 100644 docs/devel/multi-process.rst
117
create mode 100644 docs/system/multi-process.rst
118
create mode 100644 hw/remote/trace.h
119
create mode 100644 include/hw/pci-host/remote.h
120
create mode 100644 include/hw/remote/iohub.h
121
create mode 100644 include/hw/remote/machine.h
122
create mode 100644 include/hw/remote/memory.h
123
create mode 100644 include/hw/remote/mpqemu-link.h
124
create mode 100644 include/hw/remote/proxy-memory-listener.h
125
create mode 100644 include/hw/remote/proxy.h
126
create mode 100644 hw/pci-host/remote.c
127
create mode 100644 hw/remote/iohub.c
128
create mode 100644 hw/remote/machine.c
129
create mode 100644 hw/remote/memory.c
130
create mode 100644 hw/remote/message.c
131
create mode 100644 hw/remote/mpqemu-link.c
132
create mode 100644 hw/remote/proxy-memory-listener.c
133
create mode 100644 hw/remote/proxy.c
134
create mode 100644 hw/remote/remote-obj.c
135
create mode 100644 hw/remote/Kconfig
136
create mode 100644 hw/remote/meson.build
137
create mode 100644 hw/remote/trace-events
107
138
108
--
139
--
109
2.26.2
140
2.29.2
110
141
diff view generated by jsdifflib
1
Allow the number of queues to be configured using --export
1
Use the GitLab repo URL as the main repo location in order to reduce
2
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
2
load on qemu.org.
3
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
4
its own value if the vhost-user-blk backend offers fewer queues than
5
QEMU.
6
7
The vhost-user-blk-server.c code is already capable of multi-queue. All
8
virtqueue processing runs in the same AioContext. No new locking is
9
needed.
10
11
Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
12
Note that the feature bit only announces the presence of the num_queues
13
configuration space field. It does not promise that there is more than 1
14
virtqueue, so we can set it unconditionally.
15
16
I tested multi-queue by running a random read fio test with numjobs=4 on
17
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
18
file showed activity on all 4 virtio-blk MSI-X vectors. The /sys/block/vda/mq/
19
directory shows that Linux blk-mq has 4 queues configured.
20
21
An automated test is included in the next commit.
22
3
23
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
24
Acked-by: Markus Armbruster <armbru@redhat.com>
5
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
25
Message-id: 20201001144604.559733-2-stefanha@redhat.com
6
Reviewed-by: Thomas Huth <thuth@redhat.com>
26
[Fixed accidental tab characters as suggested by Markus Armbruster
7
Message-id: 20210111115017.156802-2-stefanha@redhat.com
27
--Stefan]
28
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
29
---
9
---
30
qapi/block-export.json | 10 +++++++---
10
.github/lockdown.yml | 8 ++++----
31
block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------
11
1 file changed, 4 insertions(+), 4 deletions(-)
32
2 files changed, 25 insertions(+), 9 deletions(-)
33
12
34
diff --git a/qapi/block-export.json b/qapi/block-export.json
13
diff --git a/.github/lockdown.yml b/.github/lockdown.yml
35
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
36
--- a/qapi/block-export.json
15
--- a/.github/lockdown.yml
37
+++ b/qapi/block-export.json
16
+++ b/.github/lockdown.yml
38
@@ -XXX,XX +XXX,XX @@
17
@@ -XXX,XX +XXX,XX @@ issues:
39
# SocketAddress types are supported. Passed fds must be UNIX domain
18
comment: |
40
# sockets.
19
Thank you for your interest in the QEMU project.
41
# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
20
42
+# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults
21
- This repository is a read-only mirror of the project's master
43
+# to 1.
22
- repostories hosted on https://git.qemu.org/git/qemu.git.
44
#
23
+ This repository is a read-only mirror of the project's repostories hosted
45
# Since: 5.2
24
+ at https://gitlab.com/qemu-project/qemu.git.
46
##
25
The project does not process issues filed on GitHub.
47
{ 'struct': 'BlockExportOptionsVhostUserBlk',
26
48
- 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
27
The project issues are tracked on Launchpad:
49
+ 'data': { 'addr': 'SocketAddress',
28
@@ -XXX,XX +XXX,XX @@ pulls:
50
+     '*logical-block-size': 'size',
29
comment: |
51
+ '*num-queues': 'uint16'} }
30
Thank you for your interest in the QEMU project.
52
31
53
##
32
- This repository is a read-only mirror of the project's master
54
# @NbdServerAddOptions:
33
- repostories hosted on https://git.qemu.org/git/qemu.git.
55
@@ -XXX,XX +XXX,XX @@
34
+ This repository is a read-only mirror of the project's repostories hosted
56
{ 'union': 'BlockExportOptions',
35
+ on https://gitlab.com/qemu-project/qemu.git.
57
'base': { 'type': 'BlockExportType',
36
The project does not process merge requests filed on GitHub.
58
'id': 'str',
37
59
-     '*fixed-iothread': 'bool',
38
QEMU welcomes contributions of code (either fixing bugs or adding new
60
-     '*iothread': 'str',
61
+ '*fixed-iothread': 'bool',
62
+ '*iothread': 'str',
63
'node-name': 'str',
64
'*writable': 'bool',
65
'*writethrough': 'bool' },
66
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/block/export/vhost-user-blk-server.c
69
+++ b/block/export/vhost-user-blk-server.c
70
@@ -XXX,XX +XXX,XX @@
71
#include "util/block-helpers.h"
72
73
enum {
74
- VHOST_USER_BLK_MAX_QUEUES = 1,
75
+ VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1,
76
};
77
struct virtio_blk_inhdr {
78
unsigned char status;
79
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev)
80
1ull << VIRTIO_BLK_F_DISCARD |
81
1ull << VIRTIO_BLK_F_WRITE_ZEROES |
82
1ull << VIRTIO_BLK_F_CONFIG_WCE |
83
+ 1ull << VIRTIO_BLK_F_MQ |
84
1ull << VIRTIO_F_VERSION_1 |
85
1ull << VIRTIO_RING_F_INDIRECT_DESC |
86
1ull << VIRTIO_RING_F_EVENT_IDX |
87
@@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque)
88
89
static void
90
vu_blk_initialize_config(BlockDriverState *bs,
91
- struct virtio_blk_config *config, uint32_t blk_size)
92
+ struct virtio_blk_config *config,
93
+ uint32_t blk_size,
94
+ uint16_t num_queues)
95
{
96
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
97
config->blk_size = blk_size;
98
@@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs,
99
config->seg_max = 128 - 2;
100
config->min_io_size = 1;
101
config->opt_io_size = 1;
102
- config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
103
+ config->num_queues = num_queues;
104
config->max_discard_sectors = 32768;
105
config->max_discard_seg = 1;
106
config->discard_sector_alignment = config->blk_size >> 9;
107
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
108
BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
109
Error *local_err = NULL;
110
uint64_t logical_block_size;
111
+ uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT;
112
113
vexp->writable = opts->writable;
114
vexp->blkcfg.wce = 0;
115
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
116
}
117
vexp->blk_size = logical_block_size;
118
blk_set_guest_block_size(exp->blk, logical_block_size);
119
+
120
+ if (vu_opts->has_num_queues) {
121
+ num_queues = vu_opts->num_queues;
122
+ }
123
+ if (num_queues == 0) {
124
+ error_setg(errp, "num-queues must be greater than 0");
125
+ return -EINVAL;
126
+ }
127
+
128
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
129
- logical_block_size);
130
+ logical_block_size, num_queues);
131
132
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
133
vexp);
134
135
if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
136
- VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
137
- errp)) {
138
+ num_queues, &vu_blk_iface, errp)) {
139
blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
140
blk_aio_detach, vexp);
141
return -EADDRNOTAVAIL;
142
--
39
--
143
2.26.2
40
2.29.2
144
41
1
Propagate the flush return value since errors are possible.
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
2
4
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-11-stefanha@redhat.com
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-3-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
11
---
7
block/export/vhost-user-blk-server.c | 11 +++++++----
12
.gitmodules | 44 ++++++++++++++++++++++----------------------
8
1 file changed, 7 insertions(+), 4 deletions(-)
13
1 file changed, 22 insertions(+), 22 deletions(-)
9
14
10
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
15
diff --git a/.gitmodules b/.gitmodules
11
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
12
--- a/block/export/vhost-user-blk-server.c
17
--- a/.gitmodules
13
+++ b/block/export/vhost-user-blk-server.c
18
+++ b/.gitmodules
14
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
19
@@ -XXX,XX +XXX,XX @@
15
return -EINVAL;
20
[submodule "roms/seabios"]
16
}
21
    path = roms/seabios
17
22
-    url = https://git.qemu.org/git/seabios.git/
18
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
23
+    url = https://gitlab.com/qemu-project/seabios.git/
19
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
24
[submodule "roms/SLOF"]
20
{
25
    path = roms/SLOF
21
VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
26
-    url = https://git.qemu.org/git/SLOF.git
22
BlockBackend *backend = vdev_blk->backend;
27
+    url = https://gitlab.com/qemu-project/SLOF.git
23
- blk_co_flush(backend);
28
[submodule "roms/ipxe"]
24
+ return blk_co_flush(backend);
29
    path = roms/ipxe
25
}
30
-    url = https://git.qemu.org/git/ipxe.git
26
31
+    url = https://gitlab.com/qemu-project/ipxe.git
27
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
32
[submodule "roms/openbios"]
28
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
33
    path = roms/openbios
29
break;
34
-    url = https://git.qemu.org/git/openbios.git
30
}
35
+    url = https://gitlab.com/qemu-project/openbios.git
31
case VIRTIO_BLK_T_FLUSH:
36
[submodule "roms/qemu-palcode"]
32
- vu_block_flush(req);
37
    path = roms/qemu-palcode
33
- req->in->status = VIRTIO_BLK_S_OK;
38
-    url = https://git.qemu.org/git/qemu-palcode.git
34
+ if (vu_block_flush(req) == 0) {
39
+    url = https://gitlab.com/qemu-project/qemu-palcode.git
35
+ req->in->status = VIRTIO_BLK_S_OK;
40
[submodule "roms/sgabios"]
36
+ } else {
41
    path = roms/sgabios
37
+ req->in->status = VIRTIO_BLK_S_IOERR;
42
-    url = https://git.qemu.org/git/sgabios.git
38
+ }
43
+    url = https://gitlab.com/qemu-project/sgabios.git
39
break;
44
[submodule "dtc"]
40
case VIRTIO_BLK_T_GET_ID: {
45
    path = dtc
41
size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
46
-    url = https://git.qemu.org/git/dtc.git
47
+    url = https://gitlab.com/qemu-project/dtc.git
48
[submodule "roms/u-boot"]
49
    path = roms/u-boot
50
-    url = https://git.qemu.org/git/u-boot.git
51
+    url = https://gitlab.com/qemu-project/u-boot.git
52
[submodule "roms/skiboot"]
53
    path = roms/skiboot
54
-    url = https://git.qemu.org/git/skiboot.git
55
+    url = https://gitlab.com/qemu-project/skiboot.git
56
[submodule "roms/QemuMacDrivers"]
57
    path = roms/QemuMacDrivers
58
-    url = https://git.qemu.org/git/QemuMacDrivers.git
59
+    url = https://gitlab.com/qemu-project/QemuMacDrivers.git
60
[submodule "ui/keycodemapdb"]
61
    path = ui/keycodemapdb
62
-    url = https://git.qemu.org/git/keycodemapdb.git
63
+    url = https://gitlab.com/qemu-project/keycodemapdb.git
64
[submodule "capstone"]
65
    path = capstone
66
-    url = https://git.qemu.org/git/capstone.git
67
+    url = https://gitlab.com/qemu-project/capstone.git
68
[submodule "roms/seabios-hppa"]
69
    path = roms/seabios-hppa
70
-    url = https://git.qemu.org/git/seabios-hppa.git
71
+    url = https://gitlab.com/qemu-project/seabios-hppa.git
72
[submodule "roms/u-boot-sam460ex"]
73
    path = roms/u-boot-sam460ex
74
-    url = https://git.qemu.org/git/u-boot-sam460ex.git
75
+    url = https://gitlab.com/qemu-project/u-boot-sam460ex.git
76
[submodule "tests/fp/berkeley-testfloat-3"]
77
    path = tests/fp/berkeley-testfloat-3
78
-    url = https://git.qemu.org/git/berkeley-testfloat-3.git
79
+    url = https://gitlab.com/qemu-project/berkeley-testfloat-3.git
80
[submodule "tests/fp/berkeley-softfloat-3"]
81
    path = tests/fp/berkeley-softfloat-3
82
-    url = https://git.qemu.org/git/berkeley-softfloat-3.git
83
+    url = https://gitlab.com/qemu-project/berkeley-softfloat-3.git
84
[submodule "roms/edk2"]
85
    path = roms/edk2
86
-    url = https://git.qemu.org/git/edk2.git
87
+    url = https://gitlab.com/qemu-project/edk2.git
88
[submodule "slirp"]
89
    path = slirp
90
-    url = https://git.qemu.org/git/libslirp.git
91
+    url = https://gitlab.com/qemu-project/libslirp.git
92
[submodule "roms/opensbi"]
93
    path = roms/opensbi
94
-    url =     https://git.qemu.org/git/opensbi.git
95
+    url =     https://gitlab.com/qemu-project/opensbi.git
96
[submodule "roms/qboot"]
97
    path = roms/qboot
98
-    url = https://git.qemu.org/git/qboot.git
99
+    url = https://gitlab.com/qemu-project/qboot.git
100
[submodule "meson"]
101
    path = meson
102
-    url = https://git.qemu.org/git/meson.git
103
+    url = https://gitlab.com/qemu-project/meson.git
104
[submodule "roms/vbootrom"]
105
    path = roms/vbootrom
106
-    url = https://git.qemu.org/git/vbootrom.git
107
+    url = https://gitlab.com/qemu-project/vbootrom.git
42
--
108
--
43
2.26.2
109
2.29.2
44
110
1
fds[] is leaked when qio_channel_readv_full() fails.
1
It is no longer necessary to point .gitmodules at GitLab repos when
2
2
running in GitLab CI since they are now used all the time.
3
Use vmsg->fds[] instead of keeping a local fds[] array. Then we can
4
reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the
5
loop to make this safe.
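
The control flow this patch moves to (keep reading until the full header
has arrived, treat a zero-byte read as a closed socket, and route every
failure through one label) can be sketched with plain POSIX read(). This
is an assumption-laden illustration rather than the QEMU implementation:
example_read_full() is a hypothetical helper, it uses read() in place of
qio_channel_readv_full(), and it leaves out file descriptor passing
entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly len bytes from fd into buf, tolerating short reads.
 * Returns false on error or if the peer closes before len bytes arrive.
 */
static bool example_read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);

    while (done != len) {
        ssize_t rc = read(fd, p + done, len - done);

        if (rc < 0) {
            if (errno == EINTR) {
                continue;   /* interrupted before any data, retry */
            }
            goto fail;
        }
        if (rc == 0) {      /* end of file: peer closed early */
            goto fail;
        }
        done += (size_t)rc;
    }
    return true;

fail:
    /* Single exit for failures; a real caller would release resources here. */
    return false;
}
```

Keeping all state that needs cleanup reachable from the fail label is what
lets the patch drop the temporary fds array without leaking on error.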
6
3
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Message-id: 20200924151549.913737-8-stefanha@redhat.com
5
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
6
Reviewed-by: Thomas Huth <thuth@redhat.com>
7
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
8
Message-id: 20210111115017.156802-4-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
10
---
11
util/vhost-user-server.c | 50 ++++++++++++++++++----------------------
11
.gitlab-ci.yml | 1 -
12
1 file changed, 23 insertions(+), 27 deletions(-)
12
1 file changed, 1 deletion(-)
13
13
14
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
14
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
15
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
16
--- a/util/vhost-user-server.c
16
--- a/.gitlab-ci.yml
17
+++ b/util/vhost-user-server.c
17
+++ b/.gitlab-ci.yml
18
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
18
@@ -XXX,XX +XXX,XX @@ include:
19
};
19
image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
20
int rc, read_bytes = 0;
20
before_script:
21
Error *local_err = NULL;
21
- JOBS=$(expr $(nproc) + 1)
22
- /*
22
- - sed -i s,git.qemu.org/git,gitlab.com/qemu-project, .gitmodules
23
- * Store fds/nfds returned from qio_channel_readv_full into
23
script:
24
- * temporary variables.
24
- mkdir build
25
- *
25
- cd build
26
- * VhostUserMsg is a packed structure, gcc will complain about passing
27
- * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
28
- * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
29
- * thus two temporary variables nfds and fds are used here.
30
- */
31
- size_t nfds = 0, nfds_t = 0;
32
const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
33
- int *fds_t = NULL;
34
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
35
QIOChannel *ioc = server->ioc;
36
37
+ vmsg->fd_num = 0;
38
if (!ioc) {
39
error_report_err(local_err);
40
goto fail;
41
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
42
43
assert(qemu_in_coroutine());
44
do {
45
+ size_t nfds = 0;
46
+ int *fds = NULL;
47
+
48
/*
49
* qio_channel_readv_full may have short reads, keeping calling it
50
* until getting VHOST_USER_HDR_SIZE or 0 bytes in total
51
*/
52
- rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
53
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
54
if (rc < 0) {
55
if (rc == QIO_CHANNEL_ERR_BLOCK) {
56
+ assert(local_err == NULL);
57
qio_channel_yield(ioc, G_IO_IN);
58
continue;
59
} else {
60
error_report_err(local_err);
61
- return false;
62
+ goto fail;
63
}
64
}
65
- read_bytes += rc;
66
- if (nfds_t > 0) {
67
- if (nfds + nfds_t > max_fds) {
68
+
69
+ if (nfds > 0) {
70
+ if (vmsg->fd_num + nfds > max_fds) {
71
error_report("A maximum of %zu fds are allowed, "
72
"however got %zu fds now",
73
- max_fds, nfds + nfds_t);
74
+ max_fds, vmsg->fd_num + nfds);
75
+ g_free(fds);
76
goto fail;
77
}
78
- memcpy(vmsg->fds + nfds, fds_t,
79
- nfds_t *sizeof(vmsg->fds[0]));
80
- nfds += nfds_t;
81
- g_free(fds_t);
82
+ memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
83
+ vmsg->fd_num += nfds;
84
+ g_free(fds);
85
}
86
- if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
87
- break;
88
+
89
+ if (rc == 0) { /* socket closed */
90
+ goto fail;
91
}
92
- iov.iov_base = (char *)vmsg + read_bytes;
93
- iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
94
- } while (true);
95
96
- vmsg->fd_num = nfds;
97
+ iov.iov_base += rc;
98
+ iov.iov_len -= rc;
99
+ read_bytes += rc;
100
+ } while (read_bytes != VHOST_USER_HDR_SIZE);
101
+
102
/* qio_channel_readv_full will make socket fds blocking, unblock them */
103
vmsg_unblock_fds(vmsg);
104
if (vmsg->size > sizeof(vmsg->payload)) {
105
--
26
--
106
2.26.2
27
2.29.2
107
28
1
The device panic notifier callback is not used. Drop it.
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
2
4
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
Message-id: 20200924151549.913737-7-stefanha@redhat.com
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-5-stefanha@redhat.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
11
---
7
util/vhost-user-server.h | 3 ---
12
README.rst | 4 ++--
8
block/export/vhost-user-blk-server.c | 3 +--
13
1 file changed, 2 insertions(+), 2 deletions(-)
9
util/vhost-user-server.c | 6 ------
10
3 files changed, 1 insertion(+), 11 deletions(-)
11
14
12
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
15
diff --git a/README.rst b/README.rst
13
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.h
17
--- a/README.rst
15
+++ b/util/vhost-user-server.h
18
+++ b/README.rst
16
@@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch {
19
@@ -XXX,XX +XXX,XX @@ The QEMU source code is maintained under the GIT version control system.
17
} VuFdWatch;
20
18
21
.. code-block:: shell
19
typedef struct VuServer VuServer;
22
20
-typedef void DevicePanicNotifierFn(VuServer *server);
23
- git clone https://git.qemu.org/git/qemu.git
21
24
+ git clone https://gitlab.com/qemu-project/qemu.git
22
struct VuServer {
25
23
QIONetListener *listener;
26
When submitting patches, one common approach is to use 'git
24
AioContext *ctx;
27
format-patch' and/or 'git send-email' to format & send the mail to the
25
- DevicePanicNotifierFn *device_panic_notifier;
28
@@ -XXX,XX +XXX,XX @@ The QEMU website is also maintained under source control.
26
int max_queues;
29
27
const VuDevIface *vu_iface;
30
.. code-block:: shell
28
VuDev vu_dev;
31
29
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
32
- git clone https://git.qemu.org/git/qemu-web.git
30
SocketAddress *unix_socket,
33
+ git clone https://gitlab.com/qemu-project/qemu-web.git
31
AioContext *ctx,
34
32
uint16_t max_queues,
35
* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_
33
- DevicePanicNotifierFn *device_panic_notifier,
36
34
const VuDevIface *vu_iface,
35
Error **errp);
36
37
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/block/export/vhost-user-blk-server.c
40
+++ b/block/export/vhost-user-blk-server.c
41
@@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
42
ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
43
44
if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
45
- VHOST_USER_BLK_MAX_QUEUES,
46
- NULL, &vu_block_iface,
47
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
48
errp)) {
49
goto error;
50
}
51
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/util/vhost-user-server.c
54
+++ b/util/vhost-user-server.c
55
@@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf)
56
close_client(server);
57
}
58
59
- if (server->device_panic_notifier) {
60
- server->device_panic_notifier(server);
61
- }
62
-
63
/*
64
* Set the callback function for network listener so another
65
* vhost-user client can connect to this server
66
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
67
SocketAddress *socket_addr,
68
AioContext *ctx,
69
uint16_t max_queues,
70
- DevicePanicNotifierFn *device_panic_notifier,
71
const VuDevIface *vu_iface,
72
Error **errp)
73
{
74
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
75
.vu_iface = vu_iface,
76
.max_queues = max_queues,
77
.ctx = ctx,
78
- .device_panic_notifier = device_panic_notifier,
79
};
80
81
qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
82
--
37
--
83
2.26.2
38
2.29.2
84
39
1
Explicitly deleting watches is not necessary since libvhost-user calls
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
remove_watch() during vu_deinit(). Add an assertion to check this
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
though.
3
(they will become mirrors).
4
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Message-id: 20200924151549.913737-5-stefanha@redhat.com
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-6-stefanha@redhat.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
11
---
9
util/vhost-user-server.c | 19 ++++---------------
12
pc-bios/README | 4 ++--
10
1 file changed, 4 insertions(+), 15 deletions(-)
13
1 file changed, 2 insertions(+), 2 deletions(-)
11
14
12
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
15
diff --git a/pc-bios/README b/pc-bios/README
13
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
14
--- a/util/vhost-user-server.c
17
--- a/pc-bios/README
15
+++ b/util/vhost-user-server.c
18
+++ b/pc-bios/README
16
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
19
@@ -XXX,XX +XXX,XX @@
17
/* When this is set vu_client_trip will stop new processing vhost-user message */
20
legacy x86 software to communicate with an attached serial console as
18
server->sioc = NULL;
21
if a video card were attached. The master sources reside in a subversion
19
22
repository at http://sgabios.googlecode.com/svn/trunk. A git mirror is
20
- VuFdWatch *vu_fd_watch, *next;
23
- available at https://git.qemu.org/git/sgabios.git.
21
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
24
+ available at https://gitlab.com/qemu-project/sgabios.git.
22
- aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
25
23
- NULL, NULL, NULL);
26
- The PXE roms come from the iPXE project. Built with BANNER_TIME 0.
24
- }
27
Sources available at http://ipxe.org. Vendor:Device ID -> ROM mapping:
25
-
28
@@ -XXX,XX +XXX,XX @@
26
- while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
29
27
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
30
- The u-boot binary for e500 comes from the upstream denx u-boot project where
28
- if (!vu_fd_watch->processing) {
31
it was compiled using the qemu-ppce500 target.
29
- QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
32
- A git mirror is available at: https://git.qemu.org/git/u-boot.git
30
- g_free(vu_fd_watch);
33
+ A git mirror is available at: https://gitlab.com/qemu-project/u-boot.git
31
- }
34
The hash used to compile the current version is: 2072e72
32
- }
35
33
- }
36
- Skiboot (https://github.com/open-power/skiboot/) is an OPAL
34
-
35
while (server->processing_msg) {
36
if (server->ioc->read_coroutine) {
37
server->ioc->read_coroutine = NULL;
38
@@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server)
39
}
40
41
vu_deinit(&server->vu_dev);
42
+
43
+ /* vu_deinit() should have called remove_watch() */
44
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
45
+
46
object_unref(OBJECT(sioc));
47
object_unref(OBJECT(server->ioc));
48
}
49
--
37
--
50
2.26.2
38
2.29.2
51
39
1
We already have access to the value with the correct type (ioc and sioc
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
are the same QIOChannel).
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
3
4
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Message-id: 20200924151549.913737-4-stefanha@redhat.com
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-7-stefanha@redhat.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
11
---
8
util/vhost-user-server.c | 2 +-
12
scripts/get_maintainer.pl | 2 +-
9
1 file changed, 1 insertion(+), 1 deletion(-)
13
1 file changed, 1 insertion(+), 1 deletion(-)
10
14
11
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
15
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
12
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100755
13
--- a/util/vhost-user-server.c
17
--- a/scripts/get_maintainer.pl
14
+++ b/util/vhost-user-server.c
18
+++ b/scripts/get_maintainer.pl
15
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
19
@@ -XXX,XX +XXX,XX @@ sub vcs_exists {
16
server->ioc = QIO_CHANNEL(sioc);
20
    warn("$P: No supported VCS found. Add --nogit to options?\n");
17
object_ref(OBJECT(server->ioc));
21
    warn("Using a git repository produces better results.\n");
18
qio_channel_attach_aio_context(server->ioc, server->ctx);
22
    warn("Try latest git repository using:\n");
19
- qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
23
-    warn("git clone https://git.qemu.org/git/qemu.git\n");
20
+ qio_channel_set_blocking(server->ioc, false, NULL);
24
+    warn("git clone https://gitlab.com/qemu-project/qemu.git\n");
21
vu_client_start(server);
25
    $printed_novcs = 1;
22
}
26
}
23
27
return 0;
24
--
28
--
25
2.26.2
29
2.29.2
26
30
1
Headers used by other subsystems are located in include/. Also add the
1
From: John G Johnson <john.g.johnson@oracle.com>
2
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.
3
2
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
3
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
5
Message-id: 20200924151549.913737-13-stefanha@redhat.com
4
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
5
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Message-id: 02a68adef99f5df6a380bf8fd7b90948777e411c.1611938319.git.jag.raman@oracle.com
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
---
9
---
8
MAINTAINERS | 4 +++-
10
MAINTAINERS | 7 +
9
{util => include/qemu}/vhost-user-server.h | 0
11
docs/devel/index.rst | 1 +
10
block/export/vhost-user-blk-server.c | 2 +-
12
docs/devel/multi-process.rst | 966 +++++++++++++++++++++++++++++++++++
11
util/vhost-user-server.c | 2 +-
13
3 files changed, 974 insertions(+)
12
4 files changed, 5 insertions(+), 3 deletions(-)
14
create mode 100644 docs/devel/multi-process.rst
13
rename {util => include/qemu}/vhost-user-server.h (100%)
14
15
15
diff --git a/MAINTAINERS b/MAINTAINERS
16
diff --git a/MAINTAINERS b/MAINTAINERS
16
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
17
--- a/MAINTAINERS
18
--- a/MAINTAINERS
18
+++ b/MAINTAINERS
19
+++ b/MAINTAINERS
19
@@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server
20
@@ -XXX,XX +XXX,XX @@ S: Maintained
20
M: Coiby Xu <Coiby.Xu@gmail.com>
21
F: hw/semihosting/
21
S: Maintained
22
F: include/hw/semihosting/
22
F: block/export/vhost-user-blk-server.c
23
23
-F: util/vhost-user-server.c
24
+Multi-process QEMU
24
+F: block/export/vhost-user-blk-server.h
25
+M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
25
+F: include/qemu/vhost-user-server.h
26
+M: Jagannathan Raman <jag.raman@oracle.com>
26
F: tests/qtest/libqos/vhost-user-blk.c
27
+M: John G Johnson <john.g.johnson@oracle.com>
27
+F: util/vhost-user-server.c
28
+S: Maintained
28
29
+F: docs/devel/multi-process.rst
29
Replication
30
+
30
M: Wen Congyang <wencongyang2@huawei.com>
31
Build and test automation
31
diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h
32
-------------------------
32
similarity index 100%
33
Build and test automation
33
rename from util/vhost-user-server.h
34
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
34
rename to include/qemu/vhost-user-server.h
35
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
36
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
37
--- a/block/export/vhost-user-blk-server.c
36
--- a/docs/devel/index.rst
38
+++ b/block/export/vhost-user-blk-server.c
37
+++ b/docs/devel/index.rst
38
@@ -XXX,XX +XXX,XX @@ Contents:
39
clocks
40
qom
41
block-coroutine-wrapper
42
+ multi-process
43
diff --git a/docs/devel/multi-process.rst b/docs/devel/multi-process.rst
44
new file mode 100644
45
index XXXXXXX..XXXXXXX
46
--- /dev/null
47
+++ b/docs/devel/multi-process.rst
39
@@ -XXX,XX +XXX,XX @@
48
@@ -XXX,XX +XXX,XX @@
40
#include "block/block.h"
49
+This is the design document for multi-process QEMU. It does not
41
#include "contrib/libvhost-user/libvhost-user.h"
50
+necessarily reflect the status of the current implementation, which
42
#include "standard-headers/linux/virtio_blk.h"
51
+may lack features or be considerably different from what is described
43
-#include "util/vhost-user-server.h"
52
+in this document. This document is still useful as a description of
44
+#include "qemu/vhost-user-server.h"
53
+the goals and general direction of this feature.
45
#include "vhost-user-blk-server.h"
54
+
46
#include "qapi/error.h"
55
+Please refer to the following wiki for latest details:
47
#include "qom/object_interfaces.h"
56
+https://wiki.qemu.org/Features/MultiProcessQEMU
48
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
57
+
49
index XXXXXXX..XXXXXXX 100644
58
+Multi-process QEMU
50
--- a/util/vhost-user-server.c
59
+===================
51
+++ b/util/vhost-user-server.c
60
+
52
@@ -XXX,XX +XXX,XX @@
61
+QEMU is often used as the hypervisor for virtual machines running in the
53
*/
62
+Oracle cloud. Since one of the advantages of cloud computing is the
54
#include "qemu/osdep.h"
63
+ability to run many VMs from different tenants in the same cloud
55
#include "qemu/main-loop.h"
64
+infrastructure, a guest that compromised its hypervisor could
56
+#include "qemu/vhost-user-server.h"
65
+potentially use the hypervisor's access privileges to access data it is
57
#include "block/aio-wait.h"
66
+not authorized for.
58
-#include "vhost-user-server.h"
67
+
59
68
+QEMU can be susceptible to security attacks because it is a large,
60
/*
69
+monolithic program that provides many features to the VMs it services.
61
* Theory of operation:
70
+Many of these features can be configured out of QEMU, but even a reduced
71
+configuration QEMU has a large amount of code a guest can potentially
72
+attack. Separating QEMU reduces the attack surface by aiding to
73
+limit each component in the system to only access the resources that
74
+it needs to perform its job.
75
+
76
+QEMU services
77
+-------------
78
+
79
+QEMU can be broadly described as providing three main services. One is a
80
+VM control point, where VMs can be created, migrated, re-configured, and
81
+destroyed. A second is to emulate the CPU instructions within the VM,
82
+often accelerated by HW virtualization features such as Intel's VT
83
+extensions. Finally, it provides IO services to the VM by emulating HW
84
+IO devices, such as disk and network devices.
85
+
86
+A multi-process QEMU
87
+~~~~~~~~~~~~~~~~~~~~
88
+
89
+A multi-process QEMU involves separating QEMU services into separate
90
+host processes. Each of these processes can be given only the privileges
91
+it needs to provide its service, e.g., a disk service could be given
92
+access only to the disk images it provides, and not be allowed to
93
+access other files, or any network devices. An attacker who compromised
94
+this service would not be able to use this exploit to access files or
95
+devices beyond what the disk service was given access to.
96
+
97
+A QEMU control process would remain, but in multi-process mode, will
98
+have no direct interfaces to the VM. During VM execution, it would still
99
+provide the user interface to hot-plug devices or live migrate the VM.
100
+
101
+A first step in creating a multi-process QEMU is to separate IO services
102
+from the main QEMU program, which would continue to provide CPU
103
+emulation, i.e., the control process would also be the CPU emulation
104
+process. In a later phase, CPU emulation could be separated from the
105
+control process.
106
+
107
+Separating IO services
108
+----------------------
109
+
110
+Separating IO services into individual host processes is a good place to
111
+begin for a couple of reasons. One is that the sheer number of IO devices QEMU
112
+can emulate provides a large surface of interfaces which could potentially
113
+be exploited, and, indeed, have been a source of exploits in the past.
114
+Another is that the modular nature of QEMU device emulation code provides
115
+interface points where the QEMU functions that perform device emulation
116
+can be separated from the QEMU functions that manage the emulation of
117
+guest CPU instructions. The devices emulated in the separate process are
118
+referred to as remote devices.
119
+
120
+QEMU device emulation
121
+~~~~~~~~~~~~~~~~~~~~~
122
+
123
+QEMU uses an object oriented SW architecture for device emulation code.
124
+Configured objects are all compiled into the QEMU binary, then objects
125
+are instantiated by name when used by the guest VM. For example, the
126
+code to emulate a device named "foo" is always present in QEMU, but its
127
+instantiation code is only run when the device is included in the target
128
+VM. (e.g., via the QEMU command line as *-device foo*)
129
+
130
+The object model is hierarchical, so device emulation code names its
131
+parent object (such as "pci-device" for a PCI device) and QEMU will
132
+instantiate a parent object before calling the device's instantiation
133
+code.
134
+
135
+Current separation models
136
+~~~~~~~~~~~~~~~~~~~~~~~~~
137
+
138
+In order to separate the device emulation code from the CPU emulation
139
+code, the device object code must run in a different process. There are
140
+a couple of existing QEMU features that can run emulation code
141
+separately from the main QEMU process. These are examined below.
142
+
143
+vhost user model
144
+^^^^^^^^^^^^^^^^
145
+
146
+Virtio guest device drivers can be connected to vhost user applications
147
+in order to perform their IO operations. This model uses special virtio
148
+device drivers in the guest and vhost user device objects in QEMU, but
149
+once the QEMU vhost user code has configured the vhost user application,
150
+mission-mode IO is performed by the application. The vhost user
151
+application is a daemon process that can be contacted via a known UNIX
152
+domain socket.
153
+
154
+vhost socket
155
+''''''''''''
156
+
157
+As mentioned above, one of the tasks of the vhost device object within
158
+QEMU is to contact the vhost application and send it configuration
159
+information about this device instance. As part of the configuration
160
+process, the application can also be sent other file descriptors over
161
+the socket, which then can be used by the vhost user application in
162
+various ways, some of which are described below.
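
The descriptor passing described above relies on the standard SCM_RIGHTS ancillary message of UNIX domain sockets. The following is a minimal sketch of that mechanism only; the helper names `send_fd()`, `recv_fd()` and `fd_passing_demo()` are hypothetical and are not part of QEMU or libvhost-user.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} FdControl;

/* Hypothetical helper: send one descriptor plus a one-byte payload. */
int send_fd(int sock, int fd)
{
    char payload = 'F';
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    FdControl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor, or return -1 on failure. */
int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    FdControl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/*
 * Pass the write end of a pipe across a socketpair, then check that data
 * written through the received descriptor arrives at the pipe's read end.
 */
int fd_passing_demo(void)
{
    int sv[2], p[2], got;
    char c = 0;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    got = send_fd(sv[0], p[1]) == 0 ? recv_fd(sv[1]) : -1;
    ok = got >= 0 && write(got, "x", 1) == 1 &&
         read(p[0], &c, 1) == 1 && c == 'x';
    if (got >= 0) {
        close(got);
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is why the application can use it for eventfd notifications or `mmap()` without further help from QEMU.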
163
+
164
+vhost MMIO store acceleration
165
+'''''''''''''''''''''''''''''
166
+
167
+VMs are often run using HW virtualization features via the KVM kernel
168
+driver. This driver allows QEMU to accelerate the emulation of guest CPU
169
+instructions by running the guest in a virtual HW mode. When the guest
170
+executes instructions that cannot be executed by virtual HW mode,
171
+execution returns to the KVM driver so it can inform QEMU to emulate the
172
+instructions in SW.
173
+
174
+One of the events that can cause a return to QEMU is when a guest device
175
+driver accesses an IO location. QEMU then dispatches the memory
176
+operation to the corresponding QEMU device object. In the case of a
177
+vhost user device, the memory operation would need to be sent over a
178
+socket to the vhost application. This path is accelerated by the QEMU
179
+virtio code by setting up an eventfd file descriptor that the vhost
180
+application can directly receive MMIO store notifications from the KVM
181
+driver, instead of needing them to be sent to the QEMU process first.
182
+
183
+vhost interrupt acceleration
184
+''''''''''''''''''''''''''''
185
+
186
+Another optimization used by the vhost application is the ability to
187
+directly inject interrupts into the VM via the KVM driver, again,
188
+bypassing the need to send the interrupt back to the QEMU process first.
189
+The QEMU virtio setup code configures the KVM driver with an eventfd
190
+that triggers the device interrupt in the guest when the eventfd is
191
+written. This irqfd file descriptor is then passed to the vhost user
192
+application program.
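
Both the MMIO store path and the interrupt path rest on the Linux eventfd primitive: a 64-bit counter that one side increments with `write()` and the other side consumes with `read()`. The sketch below shows only that primitive and leaves out the KVM registration of the descriptor; `kick_notify()`, `kick_drain()` and `eventfd_demo()` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd: each write adds the given value (here 1) to its counter. */
int kick_notify(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume pending notifications. In the default (non-semaphore) mode the
 * read returns the accumulated counter and resets it to zero.
 */
int64_t kick_drain(int efd)
{
    uint64_t count;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}

/* Two notifications followed by one drain: the drain reports both kicks. */
int64_t eventfd_demo(void)
{
    int efd = eventfd(0, 0);
    int64_t pending = -1;

    if (efd < 0) {
        return -1;
    }
    if (kick_notify(efd) == 0 && kick_notify(efd) == 0) {
        pending = kick_drain(efd);
    }
    close(efd);
    return pending;
}
```

Because notifications coalesce into a counter, the receiver learns that at least one event occurred rather than getting one wakeup per event, which suits doorbell-style signalling.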
193
+
194
+vhost access to guest memory
195
+''''''''''''''''''''''''''''
196
+
197
+The vhost application is also allowed to directly access guest memory,
198
+instead of needing to send the data as messages to QEMU. This is also
199
+done with file descriptors sent to the vhost user application by QEMU.
200
+These descriptors can be passed to ``mmap()`` by the vhost application
201
+to map the guest address space into the vhost application.
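
As a rough illustration of the shared mapping described above, the sketch below maps one file descriptor twice with `MAP_SHARED`, standing in for the QEMU view and the application view of the same guest RAM. A temporary file replaces real guest memory, and `map_guest_ram()` and `shared_mapping_demo()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Map a RAM-backing descriptor shared, as an application would after receiving it. */
void *map_guest_ram(int fd, size_t size)
{
    void *p;

    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* A store through one mapping must be visible through the other. */
int shared_mapping_demo(void)
{
    FILE *f = tmpfile();   /* stand-in for the guest RAM backing file */
    char *qemu_view = NULL, *app_view = NULL;
    int fd, ok = 0;

    if (!f) {
        return -1;
    }
    fd = fileno(f);
    if (ftruncate(fd, REGION_SIZE) == 0) {
        qemu_view = map_guest_ram(fd, REGION_SIZE);
        app_view = map_guest_ram(fd, REGION_SIZE);
    }
    if (qemu_view && app_view) {
        strcpy(app_view + 128, "dma");              /* application writes */
        ok = strcmp(qemu_view + 128, "dma") == 0;   /* QEMU side sees it */
    }
    if (qemu_view) {
        munmap(qemu_view, REGION_SIZE);
    }
    if (app_view) {
        munmap(app_view, REGION_SIZE);
    }
    fclose(f);
    return ok ? 0 : -1;
}
```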
202
+
203
+IOMMUs introduce another level of complexity, since the address given to
204
+the guest virtio device to DMA to or from is not a guest physical
205
+address. This case is handled by having vhost code within QEMU register
206
+as a listener for IOMMU mapping changes. The vhost application maintains
207
+a cache of IOMMU translations: sending translation requests back to
208
+QEMU on cache misses, and in turn receiving flush requests from QEMU
209
+when mappings are purged.
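
The translation cache described above could look roughly like the following direct-mapped table. This is a sketch under stated assumptions, not the vhost implementation: the type and function names, the 4 KiB page size and the 16-slot table are all hypothetical. A miss is where the application would send a translation request to QEMU, and `iotlb_flush()` is what a flush request from QEMU would trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IOTLB_SLOTS      16   /* hypothetical cache size */
#define IOTLB_PAGE_SHIFT 12   /* assumes 4 KiB translation granules */

typedef struct {
    bool valid;
    uint64_t iova_page;   /* IO virtual page number */
    uint64_t gpa_page;    /* guest physical page number */
} IotlbEntry;

typedef struct {
    IotlbEntry slot[IOTLB_SLOTS];
} IotlbCache;

void iotlb_init(IotlbCache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Hit: store the translated address and return true. Miss: return false. */
bool iotlb_lookup(const IotlbCache *c, uint64_t iova, uint64_t *gpa)
{
    uint64_t page = iova >> IOTLB_PAGE_SHIFT;
    const IotlbEntry *e = &c->slot[page % IOTLB_SLOTS];

    if (!e->valid || e->iova_page != page) {
        return false;   /* caller would now ask QEMU for the translation */
    }
    *gpa = (e->gpa_page << IOTLB_PAGE_SHIFT) |
           (iova & ((1ULL << IOTLB_PAGE_SHIFT) - 1));
    return true;
}

/* Record a translation returned by QEMU, replacing whatever used the slot. */
void iotlb_insert(IotlbCache *c, uint64_t iova, uint64_t gpa)
{
    uint64_t page = iova >> IOTLB_PAGE_SHIFT;
    IotlbEntry *e = &c->slot[page % IOTLB_SLOTS];

    e->valid = true;
    e->iova_page = page;
    e->gpa_page = gpa >> IOTLB_PAGE_SHIFT;
}

/* Drop every cached page that overlaps [iova, iova + len). */
void iotlb_flush(IotlbCache *c, uint64_t iova, uint64_t len)
{
    assert(len > 0);
    for (int i = 0; i < IOTLB_SLOTS; i++) {
        uint64_t start = c->slot[i].iova_page << IOTLB_PAGE_SHIFT;
        uint64_t end = start + (1ULL << IOTLB_PAGE_SHIFT);

        if (c->slot[i].valid && start < iova + len && end > iova) {
            c->slot[i].valid = false;
        }
    }
}
```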
210
+
211
+applicability to device separation
212
+''''''''''''''''''''''''''''''''''
213
+
214
+Much of the vhost model can be re-used by separated device emulation. In
215
+particular, the ideas of using a socket between QEMU and the device
216
+emulation application, using a file descriptor to inject interrupts into
217
+the VM via KVM, and allowing the application to ``mmap()`` the guest
218
+memory should be re-used.
219
+
220
+There are, however, some notable differences between how a vhost
221
+application works and the needs of separated device emulation. The most
222
+basic is that vhost uses custom virtio device drivers which always
223
+trigger IO with MMIO stores. A separated device emulation model must
224
+work with existing IO device models and guest device drivers. MMIO loads
225
+break vhost store acceleration since they are synchronous - guest
226
+progress cannot continue until the load has been emulated. By contrast,
227
+stores are asynchronous: the guest can continue after the store event
228
+has been sent to the vhost application.
229
+
230
+Another difference is that in the vhost user model, a single daemon can
231
+support multiple QEMU instances. This is contrary to the security regime
232
+desired, in which the emulation application should only be allowed to
233
+access the files or devices the VM it's running on behalf of can access.
234
+
+qemu-io model
+^^^^^^^^^^^^^
235
+
236
+Qemu-io is a test harness used to test changes to the QEMU block backend
237
+object code. (e.g., the code that implements disk images for disk driver
238
+emulation) Qemu-io is not a device emulation application per se, but it
239
+does compile the QEMU block objects into a separate binary from the main
240
+QEMU one. This could be useful for disk device emulation, since its
241
+emulation applications will need to include the QEMU block objects.
242
+
243
+New separation model based on proxy objects
244
+-------------------------------------------
245
+
246
+A different model based on proxy objects in the QEMU program
247
+communicating with remote emulation programs could provide separation
248
+while minimizing the changes needed to the device emulation code. The
249
+rest of this section is a discussion of how a proxy object model would
250
+work.
251
+
252
+Remote emulation processes
253
+~~~~~~~~~~~~~~~~~~~~~~~~~~
254
+
255
+The remote emulation process will run the QEMU object hierarchy without
256
+modification. The device emulation objects will also be based on the
257
+QEMU code, because for anything but the simplest device, it would not be
258
+tractable to re-implement both the object model and the many device
259
+backends that QEMU has.
260
+
261
+The processes will communicate with the QEMU process over UNIX domain
262
+sockets. The processes can be executed either as standalone processes,
263
+or be executed by QEMU. In both cases, the host backends the emulation
264
+processes will provide are specified on its command line, as they would
265
+be for QEMU. For example:
266
+
267
+::
268
+
269
+ disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0 \
270
+ -blockdev driver=qcow2,node-name=drive0,file=file0
271
+
272
+would indicate process *disk-proc* uses a qcow2 emulated disk named
273
+*file0* as its backend.
274
+
275
+Emulation processes may emulate more than one guest controller. A common
276
+configuration might be to put all controllers of the same device class
277
+(e.g., disk, network, etc.) in a single process, so that all backends of
278
+the same type can be managed by a single QMP monitor.
279
+
280
+communication with QEMU
281
+^^^^^^^^^^^^^^^^^^^^^^^
282
+
283
+The first argument to the remote emulation process will be a Unix domain
284
+socket that connects with the Proxy object. This is a required argument.
285
+
286
+::
287
+
288
+ disk-proc <socket number> <backend list>
289
+
290
+remote process QMP monitor
291
+^^^^^^^^^^^^^^^^^^^^^^^^^^
292
+
293
+Remote emulation processes can be monitored via QMP, similar to QEMU
294
+itself. The QMP monitor socket is specified the same as for a QEMU
295
+process:
296
+
297
+::
298
+
299
+ disk-proc -qmp unix:/tmp/disk-mon,server
300
+
301
+can be monitored over the UNIX socket path */tmp/disk-mon*.
302
+
303
+QEMU command line
304
+~~~~~~~~~~~~~~~~~
305
+
306
+Each remote device emulated in a remote process on the host is
307
+represented as a *-device* of type *pci-proxy-dev*. A socket
308
+sub-option to this option specifies the Unix socket that connects
309
+to the remote process. An *id* sub-option is required, and it should
310
+be the same id as used in the remote process.
311
+
312
+::
313
+
314
+ qemu-system-x86_64 ... -device pci-proxy-dev,id=lsi0,socket=3
315
+
316
+can be used to add a device emulated in a remote process.
317
+
318
+
319
+QEMU management of remote processes
320
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
321
+
322
+QEMU is not aware of the type of the remote PCI device. It is
323
+a pass through device as far as QEMU is concerned.
324
+
325
+communication with emulation process
326
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
327
+
328
+primary channel
329
+'''''''''''''''
330
+
331
+The primary channel (referred to as com in the code) is used to bootstrap
332
+the remote process. It is also used to pass on device-agnostic commands
333
+like reset.
334
+
335
+per-device channels
336
+'''''''''''''''''''
337
+
338
+Each remote device communicates with QEMU using a dedicated communication
339
+channel. The proxy object sets up this channel using the primary
340
+channel during its initialization.
341
+
342
+QEMU device proxy objects
343
+~~~~~~~~~~~~~~~~~~~~~~~~~
344
+
345
+QEMU has an object model based on sub-classes inherited from the
346
+"object" super-class. The sub-classes that are of interest here are the
347
+"device" and "bus" sub-classes whose child sub-classes make up the
348
+device tree of a QEMU emulated system.
349
+
350
+The proxy object model will use device proxy objects to replace the
351
+device emulation code within the QEMU process. These objects will live
352
+in the same place in the object and bus hierarchies as the objects they
353
+replace. i.e., the proxy object for an LSI SCSI controller will be a
354
+sub-class of the "pci-device" class, and will have the same PCI bus
355
+parent and the same SCSI bus child objects as the LSI controller object
356
+it replaces.
357
+
358
+It is worth noting that the same proxy object is used to mediate with
359
+all types of remote PCI devices.
360
+
361
+object initialization
362
+^^^^^^^^^^^^^^^^^^^^^
363
+
364
+The Proxy device objects are initialized in the exact same manner in
365
+which any other QEMU device would be initialized.
366
+
367
+In addition, the Proxy objects perform the following two tasks:
368
+
+- Parse the "socket" sub-option and connect to the remote process
369
+  using this channel
370
+- Use the "id" sub-option to connect to the emulated device on the
371
+  separate process
372
+
373
+class\_init
374
+'''''''''''
375
+
376
+The ``class_init()`` method of a proxy object will, in general, behave
377
+similarly to the object it replaces, including setting any static
378
+properties and methods needed by the proxy.
379
+
380
+instance\_init / realize
381
+''''''''''''''''''''''''
382
+
383
+The ``instance_init()`` and ``realize()`` functions would only need to
384
+perform tasks related to being a proxy, such as registering its own
385
+MMIO handlers, or creating a child bus that other proxy devices can be
386
+attached to later.
387
+
388
+Other tasks will be device-specific. For example, PCI device objects
389
+will initialize the PCI config space in order to make a valid PCI device
390
+tree within the QEMU process.
391
+
392
+address space registration
393
+^^^^^^^^^^^^^^^^^^^^^^^^^^
394
+
395
+Most devices are driven by guest device driver accesses to IO addresses
396
+or ports. The QEMU device emulation code uses QEMU's memory region
397
+function calls (such as ``memory_region_init_io()``) to add callback
398
+functions that QEMU will invoke when the guest accesses the device's
399
+areas of the IO address space. When a guest driver does access the
400
+device, the VM will exit HW virtualization mode and return to QEMU,
401
+which will then look up and execute the corresponding callback function.
402
+
403
+A proxy object would need to mirror the memory region calls the actual
404
+device emulator would perform in its initialization code, but with its
405
+own callbacks. When invoked by QEMU as a result of a guest IO operation,
406
+they will forward the operation to the device emulation process.
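
One way to picture the forwarding callbacks is as a small request/reply protocol over the socket between the proxy and the emulation process. The sketch below is illustrative only: the `ProxyMsg` layout, the command values and every function name are hypothetical and do not reflect the message format of the actual implementation, and the remote side is reduced to a toy register file of 64-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire format for forwarded MMIO accesses. */
enum { PROXY_MMIO_WRITE = 1, PROXY_MMIO_READ = 2 };

typedef struct {
    uint32_t cmd;    /* PROXY_MMIO_WRITE or PROXY_MMIO_READ */
    uint32_t size;   /* access width in bytes (not interpreted by the toy remote) */
    uint64_t addr;   /* offset within the device's MMIO region */
    uint64_t val;    /* store data, or load result in a reply */
} ProxyMsg;

/* Proxy side: what an MMIO callback would do instead of emulating locally. */
int proxy_send(int sock, uint32_t cmd, uint64_t addr, uint64_t val,
               uint32_t size)
{
    ProxyMsg m = { .cmd = cmd, .size = size, .addr = addr, .val = val };

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return write(sock, &m, sizeof(m)) == (ssize_t)sizeof(m) ? 0 : -1;
}

/* Proxy side: loads are synchronous, so the proxy waits for the reply. */
int proxy_recv_reply(int sock, uint64_t *val)
{
    ProxyMsg m;

    if (read(sock, &m, sizeof(m)) != (ssize_t)sizeof(m)) {
        return -1;
    }
    *val = m.val;
    return 0;
}

/* Remote side: apply one forwarded access to the toy register file. */
int remote_handle_one(int sock, uint64_t *regs, size_t nregs)
{
    ProxyMsg m;
    size_t idx;

    if (read(sock, &m, sizeof(m)) != (ssize_t)sizeof(m)) {
        return -1;
    }
    idx = (size_t)(m.addr / sizeof(uint64_t));
    if (idx >= nregs) {
        return -1;
    }
    if (m.cmd == PROXY_MMIO_WRITE) {
        regs[idx] = m.val;
        return 0;
    }
    if (m.cmd == PROXY_MMIO_READ) {
        m.val = regs[idx];
        return write(sock, &m, sizeof(m)) == (ssize_t)sizeof(m) ? 0 : -1;
    }
    return -1;
}

/* Store a value through the proxy, then load it back; returns it or -1. */
int64_t mmio_forward_demo(void)
{
    int sv[2];
    uint64_t regs[4] = { 0 };
    uint64_t val = 0;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    ok = proxy_send(sv[0], PROXY_MMIO_WRITE, 8, 0xabcd, 8) == 0 &&
         remote_handle_one(sv[1], regs, 4) == 0 &&
         proxy_send(sv[0], PROXY_MMIO_READ, 8, 0, 8) == 0 &&
         remote_handle_one(sv[1], regs, 4) == 0 &&
         proxy_recv_reply(sv[0], &val) == 0;
    close(sv[0]);
    close(sv[1]);
    return ok ? (int64_t)val : -1;
}
```

Stores need no reply in this sketch, while loads block the proxy until the emulation process answers, mirroring the synchronous/asynchronous distinction drawn earlier for vhost.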
407
+
408
+PCI config space
409
+^^^^^^^^^^^^^^^^
410
+
411
+PCI devices also have a configuration space that can be accessed by the
412
+guest driver. Guest accesses to this space are not handled by the device
413
+emulation object, but by its PCI parent object. Much of this space is
414
+read-only, but certain registers (especially BAR and MSI-related ones)
415
+need to be propagated to the emulation process.
416
+
417
+PCI parent proxy
418
+''''''''''''''''
419
+
420
+One way to propagate guest PCI config accesses is to create a
421
+"pci-device-proxy" class that can serve as the parent of a PCI device
422
+proxy object. This class's parent would be "pci-device" and it would
423
+override the PCI parent's ``config_read()`` and ``config_write()``
424
+methods with ones that forward these operations to the emulation
425
+program.
426
+
427
+interrupt receipt
428
+^^^^^^^^^^^^^^^^^
429
+
430
+A proxy for a device that generates interrupts will need to create a
431
+socket to receive interrupt indications from the emulation process. An
432
+incoming interrupt indication would then be sent up to its bus parent to
433
+be injected into the guest. For example, a PCI device object may use
434
+``pci_set_irq()``.
435
+
436
+live migration
437
+^^^^^^^^^^^^^^
438
+
439
+The proxy will register to save and restore any *vmstate* it needs over
440
+a live migration event. The device proxy does not need to manage the
441
+remote device's *vmstate*; that will be handled by the remote process
442
+proxy (see below).
443
+
444
+QEMU remote device operation
445
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
446
+
447
+Generic device operations, such as DMA, will be performed by the remote
448
+process proxy by sending messages to the remote process.
449
+
450
+DMA operations
451
+^^^^^^^^^^^^^^
452
+
453
+DMA operations would be handled much like vhost applications do. One of
454
+the initial messages sent to the emulation process is a guest memory
455
+table. Each entry in this table consists of a file descriptor and size
456
+that the emulation process can ``mmap()`` to directly access guest
457
+memory, similar to ``vhost_user_set_mem_table()``. Note guest memory
458
+must be backed by file descriptors, such as when QEMU is given the
459
+*-mem-path* command line option.
460
+
461
+IOMMU operations
462
+^^^^^^^^^^^^^^^^
463
+
464
+When the emulated system includes an IOMMU, the remote process proxy in
465
+QEMU will need to create a socket for IOMMU requests from the emulation
466
+process. It will handle those requests with an
467
+``address_space_get_iotlb_entry()`` call. In order to handle IOMMU
468
+unmaps, the remote process proxy will also register as a listener on the
469
+device's DMA address space. When an IOMMU memory region is created
470
+within the DMA address space, an IOMMU notifier for unmaps will be added
471
+to the memory region that will forward unmaps to the emulation process
472
+over the IOMMU socket.
473
+
474
+device hot-plug via QMP
475
+^^^^^^^^^^^^^^^^^^^^^^^
476
+
477
+A QMP "device\_add" command can add a device emulated by a remote
478
+process. It will also have a "rid" option to the command, just as the
479
+*-device* command line option does. The remote process may either be one
480
+started at QEMU startup, or be one added by the "add-process" QMP
481
+command described above. In either case, the remote process proxy will
482
+forward the new device's JSON description to the corresponding emulation
483
+process.
484
+
485
+live migration
486
+^^^^^^^^^^^^^^
487
+
488
+The remote process proxy will also register for live migration
489
+notifications with ``vmstate_register()``. When called to save state,
490
+the proxy will send the remote process a secondary socket file
491
+descriptor to save the remote process's device *vmstate* over. The
492
+incoming byte stream length and data will be saved as the proxy's
493
+*vmstate*. When the proxy is resumed on its new host, this *vmstate*
494
+will be extracted, and a secondary socket file descriptor will be sent
495
+to the new remote process through which it receives the *vmstate* in
496
+order to restore the devices there.
497
+
498
+device emulation in remote process
499
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
500
+
501
+The parts of QEMU that the emulation program will need include the
502
+object model; the memory emulation objects; the device emulation objects
503
+of the targeted device, and any dependent devices; and, the device's
504
+backends. It will also need code to set up the machine environment,
505
+handle requests from the QEMU process, and route machine-level requests
506
+(such as interrupts or IOMMU mappings) back to the QEMU process.
507
+
508
+initialization
509
+^^^^^^^^^^^^^^
510
+
511
+The process initialization sequence will follow the same sequence
512
+followed by QEMU. It will first initialize the backend objects, then
513
+device emulation objects. The JSON descriptions sent by the QEMU process
514
+will drive which objects need to be created.
515
+
516
+- address spaces
517
+
518
+Before the device objects are created, the initial address spaces and
519
+memory regions must be configured with ``memory_map_init()``. This
520
+creates a RAM memory region object (*system\_memory*) and an IO memory
521
+region object (*system\_io*).
522
+
523
+- RAM
524
+
525
+RAM memory region creation will follow how ``pc_memory_init()`` creates
526
+them, but must use ``memory_region_init_ram_from_fd()`` instead of
527
+``memory_region_allocate_system_memory()``. The file descriptors needed
528
+will be supplied by the guest memory table from above. Those RAM regions
529
+would then be added to the *system\_memory* memory region with
530
+``memory_region_add_subregion()``.
531
+
532
+- PCI
533
+
534
+IO initialization will be driven by the JSON descriptions sent from the
535
+QEMU process. For a PCI device, a PCI bus will need to be created with
536
+``pci_root_bus_new()``, and a PCI memory region will need to be created
537
+and added to the *system\_memory* memory region with
538
+``memory_region_add_subregion_overlap()``. The overlap version is
539
+required for architectures where PCI memory overlaps with RAM memory.
540
+
541
+MMIO handling
542
+^^^^^^^^^^^^^
543
+
544
+The device emulation objects will use ``memory_region_init_io()`` to
545
+install their MMIO handlers, and ``pci_register_bar()`` to associate
546
+those handlers with a PCI BAR, as they do within QEMU currently.
547
+
548
+In order to use ``address_space_rw()`` in the emulation process to
549
+handle MMIO requests from QEMU, the PCI physical addresses must be the
550
+same in the QEMU process and the device emulation process. In order to
551
+accomplish that, guest BAR programming must also be forwarded from QEMU
552
+to the emulation process.
553
+
554
+interrupt injection
555
+^^^^^^^^^^^^^^^^^^^
556
+
557
+When device emulation wants to inject an interrupt into the VM, the
558
+request climbs the device's bus object hierarchy until the point where a
559
+bus object knows how to signal the interrupt to the guest. The details
560
+depend on the type of interrupt being raised.
561
+
562
+- PCI pin interrupts
563
+
564
+On x86 systems, there is an emulated IOAPIC object attached to the root
565
+PCI bus object, and the root PCI object forwards interrupt requests to
566
+it. The IOAPIC object, in turn, calls the KVM driver to inject the
567
+corresponding interrupt into the VM. The simplest way to handle this in
568
+an emulation process would be to set up the root PCI bus driver (via
569
+``pci_bus_irqs()``) to send an interrupt request back to the QEMU
570
+process, and have the device proxy object reflect it up the PCI tree
571
+there.
572
+
573
+- PCI MSI/X interrupts
574
+
575
+PCI MSI/X interrupts are implemented in HW as DMA writes to a
576
+CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
577
+these DMA writes, then calls into the KVM driver to inject the interrupt
578
+into the VM. A simple emulation process implementation would be to send
579
+the MSI DMA address from QEMU as a message at initialization, then
580
+install an address space handler at that address which forwards the MSI
581
+message back to QEMU.
582
+
583
+DMA operations
584
+^^^^^^^^^^^^^^
585
+
586
+When an emulation object wants to DMA into or out of guest memory, it
587
+first must use ``dma_memory_map()`` to convert the DMA address to a local
588
+virtual address. The emulation process memory region objects set up above
589
+will be used to translate the DMA address to a local virtual address the
590
+device emulation code can access.
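A minimal sketch of the address conversion described above, assuming the emulation process keeps a flat table of the RAM regions it mapped from the guest memory table. `RamRegion` and `dma_to_local()` are hypothetical names for illustration; in QEMU itself this work is done by ``dma_memory_map()`` and the memory region objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One RAM region mapped into the emulation process (hypothetical layout). */
typedef struct {
    uint64_t gpa;    /* guest physical base of the region */
    uint64_t len;    /* region length in bytes */
    uint8_t *host;   /* where the region is mapped in this process */
} RamRegion;

/*
 * Return a local pointer for the DMA range [addr, addr + len), or NULL if
 * the range is not wholly contained in a single mapped region.
 */
static void *dma_to_local(const RamRegion *map, size_t n,
                          uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        const RamRegion *r = &map[i];

        /* Written to avoid overflow in addr + len. */
        if (addr >= r->gpa && len <= r->len &&
            addr - r->gpa <= r->len - len) {
            return r->host + (addr - r->gpa);
        }
    }
    return NULL;
}
```

A range that straddles two regions is rejected here; a fuller version would split the transfer or fall back to a bounce buffer.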
591
+
592
+IOMMU
593
+^^^^^
594
+
595
+When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
596
+regions to translate the DMA address to a guest physical address before
597
+that physical address can be translated to a local virtual address. The
598
+emulation process will need similar functionality.
599
+
600
+- IOTLB cache
601
+
602
+The emulation process will maintain a cache of recent IOMMU translations
603
+(the IOTLB). When the translate() callback of an IOMMU memory region is
604
+invoked, the IOTLB cache will be searched for an entry that will map the
605
+DMA address to a guest PA. On a cache miss, a message will be sent back
606
+to QEMU requesting the corresponding translation entry, which will both be
607
+used to return a guest address and be added to the cache.
608
+
609
+- IOTLB purge
610
+
611
+The IOMMU emulation will also need to act on unmap requests from QEMU.
612
+These happen when the guest IOMMU driver purges an entry from the
613
+guest's translation table.
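The lookup, miss, and purge paths above can be sketched as a small direct-mapped cache. Everything here is an assumption made for illustration: the 4 KiB granule, the 64-slot size, the `Iotlb` type, and the `ask_qemu` hook that stands in for the message round trip to QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOTLB_SLOTS      64   /* hypothetical cache size */
#define IOTLB_PAGE_SHIFT 12   /* assumes 4 KiB translation granules */

typedef struct {
    bool valid;
    uint64_t iova_page;   /* DMA address >> IOTLB_PAGE_SHIFT */
    uint64_t gpa_page;    /* guest PA >> IOTLB_PAGE_SHIFT */
} IotlbEntry;

typedef struct {
    IotlbEntry slot[IOTLB_SLOTS];
    /* Hypothetical hook: asks QEMU for a translation on a cache miss. */
    bool (*ask_qemu)(uint64_t iova_page, uint64_t *gpa_page);
    unsigned misses;
} Iotlb;

/* Translate a DMA address to a guest PA, filling the cache on a miss. */
static bool iotlb_translate(Iotlb *t, uint64_t iova, uint64_t *gpa)
{
    uint64_t page = iova >> IOTLB_PAGE_SHIFT;
    IotlbEntry *e = &t->slot[page % IOTLB_SLOTS];

    if (!e->valid || e->iova_page != page) {
        uint64_t gpa_page;

        t->misses++;
        if (!t->ask_qemu(page, &gpa_page)) {
            return false;   /* no mapping: the DMA must fail */
        }
        e->valid = true;
        e->iova_page = page;
        e->gpa_page = gpa_page;
    }
    *gpa = (e->gpa_page << IOTLB_PAGE_SHIFT) |
           (iova & ((1ull << IOTLB_PAGE_SHIFT) - 1));
    return true;
}

/* Act on an unmap request from QEMU by dropping the cached entry. */
static void iotlb_purge(Iotlb *t, uint64_t iova)
{
    uint64_t page = iova >> IOTLB_PAGE_SHIFT;
    IotlbEntry *e = &t->slot[page % IOTLB_SLOTS];

    if (e->valid && e->iova_page == page) {
        e->valid = false;
    }
}

/* Toy backend for demonstration: IOVA page n maps to guest page n + 0x100. */
static bool demo_ask_qemu(uint64_t iova_page, uint64_t *gpa_page)
{
    *gpa_page = iova_page + 0x100;
    return true;
}
```

A direct-mapped layout keeps the sketch short; a real cache would also need to handle ranged unmaps and concurrent access.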
614
+
615
+live migration
616
+^^^^^^^^^^^^^^
617
+
618
+When a remote process receives a live migration indication from QEMU, it
619
+will set up a channel using the received file descriptor with
620
+``qio_channel_socket_new_fd()``. This channel will be used to create a
621
+*QEMUfile* that can be passed to ``qemu_save_device_state()`` to send
622
+the process's device state back to QEMU. This method will be reversed on
623
+restore - the channel will be passed to ``qemu_loadvm_state()`` to
624
+restore the device state.
625
+
626
+Accelerating device emulation
627
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
628
+
629
+The messages that are required to be sent between QEMU and the emulation
630
+process can add considerable latency to IO operations. The optimizations
631
+described below attempt to ameliorate this effect by allowing the
632
+emulation process to communicate directly with the kernel KVM driver.
633
+The KVM file descriptors created would be passed to the emulation process
634
+via initialization messages, much like the guest memory table is done.
635
+
+MMIO acceleration
+^^^^^^^^^^^^^^^^^
636
+
637
+Vhost user applications can receive guest virtio driver stores directly
638
+from KVM. The issue with the eventfd mechanism used by vhost user is
639
+that it does not pass any data with the event indication, so it cannot
640
+handle guest loads or guest stores that carry store data. This concept
641
+could, however, be expanded to cover more cases.
642
+
643
+The expanded idea would require a new type of KVM device:
644
+*KVM\_DEV\_TYPE\_USER*. This device has two file descriptors: a master
645
+descriptor that QEMU can use for configuration, and a slave descriptor
646
+that the emulation process can use to receive MMIO notifications. QEMU
647
+would create both descriptors using the KVM driver, and pass the slave
648
+descriptor to the emulation process via an initialization message.
649
+
650
+data structures
651
+^^^^^^^^^^^^^^^
652
+
653
+- guest physical range
654
+
655
+The guest physical range structure describes the address range that a
656
+device will respond to. It includes the base and length of the range, as
657
+well as which bus the range resides on (e.g., on an x86 machine, it can
658
+specify whether the range refers to memory or IO addresses).
659
+
660
+A device can have multiple physical address ranges it responds to (e.g.,
661
+a PCI device can have multiple BARs), so the structure will also include
662
+an enumerated identifier to specify which of the device's ranges is
663
+being referred to.
664
+
665
++--------+----------------------------+
+| Name   | Description                |
++========+============================+
+| addr   | range base address         |
++--------+----------------------------+
+| len    | range length               |
++--------+----------------------------+
+| bus    | addr type (memory or IO)   |
++--------+----------------------------+
+| id     | range ID (e.g., PCI BAR)   |
++--------+----------------------------+
676
+
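As a rough illustration, the guest physical range could be expressed in C as below. The field names follow the table; the field widths, the `RangeBus` enum, and the `range_contains()` helper are assumptions, not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address type of a range; the enum and its values are assumed. */
typedef enum { RANGE_BUS_MEMORY, RANGE_BUS_IO } RangeBus;

typedef struct {
    uint64_t addr;   /* range base address */
    uint64_t len;    /* range length */
    RangeBus bus;    /* addr type (memory or IO) */
    uint32_t id;     /* range ID (e.g., PCI BAR) */
} GuestPhysRange;

/* True if an access of `size` bytes at `addr` on `bus` lies wholly in `r`. */
static bool range_contains(const GuestPhysRange *r, RangeBus bus,
                           uint64_t addr, uint64_t size)
{
    if (bus != r->bus || addr < r->addr || size > r->len) {
        return false;
    }
    return addr - r->addr <= r->len - size;
}
```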
677
+- MMIO request structure
678
+
679
+This structure describes an MMIO operation. It includes which guest
680
+physical range the MMIO was within, the offset within that range, the
681
+MMIO type (e.g., load or store), and its length and data. It also
682
+includes a sequence number that can be used to reply to the MMIO, and
683
+the CPU that issued the MMIO.
684
+
685
++----------+------------------------+
+| Name     | Description            |
++==========+========================+
+| rid      | range MMIO is within   |
++----------+------------------------+
+| offset   | offset within *rid*    |
++----------+------------------------+
+| type     | e.g., load or store    |
++----------+------------------------+
+| len      | MMIO length            |
++----------+------------------------+
+| data     | store data             |
++----------+------------------------+
+| seq      | sequence ID            |
++----------+------------------------+
700
+
701
+- MMIO request queues
702
+
703
+MMIO request queues are FIFO arrays of MMIO request structures. There
704
+are two queues: the pending queue is for MMIOs that haven't been read by the
705
+emulation program, and the sent queue is for MMIOs that haven't been
706
+acknowledged. The main use of the second queue is to validate MMIO
707
+replies from the emulation program.
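The two queues can be sketched as fixed-size ring buffers. This is only an illustration under stated assumptions: the queue depth, the C field types, and the helper names (`queue_push()`, `mmio_deliver()`, `mmio_acknowledge()`) are hypothetical, and locking and thread wake-ups are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8   /* hypothetical capacity */

typedef enum { MMIO_LOAD, MMIO_STORE } MmioType;

typedef struct {
    uint32_t rid;      /* range the MMIO is within */
    uint64_t offset;   /* offset within rid */
    MmioType type;     /* load or store */
    uint32_t len;      /* MMIO length in bytes */
    uint64_t data;     /* store data */
    uint64_t seq;      /* sequence ID used to match the reply */
} MmioRequest;

typedef struct {
    MmioRequest slot[QUEUE_DEPTH];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of queued entries */
} MmioQueue;

static bool queue_push(MmioQueue *q, const MmioRequest *r)
{
    if (q->count == QUEUE_DEPTH) {
        return false;   /* full: the caller has to wait for space */
    }
    q->slot[(q->head + q->count) % QUEUE_DEPTH] = *r;
    q->count++;
    return true;
}

static bool queue_pop(MmioQueue *q, MmioRequest *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}

/* Emulation program reads: move the oldest pending request to "sent". */
static bool mmio_deliver(MmioQueue *pending, MmioQueue *sent,
                         MmioRequest *out)
{
    if (sent->count == QUEUE_DEPTH || !queue_pop(pending, out)) {
        return false;
    }
    return queue_push(sent, out);
}

/* Emulation program replies: accept only a seq that is outstanding. */
static bool mmio_acknowledge(MmioQueue *sent, uint64_t seq)
{
    for (size_t i = 0; i < sent->count; i++) {
        if (sent->slot[(sent->head + i) % QUEUE_DEPTH].seq != seq) {
            continue;
        }
        /* Close the gap by shifting the younger entries forward. */
        for (size_t j = i; j + 1 < sent->count; j++) {
            sent->slot[(sent->head + j) % QUEUE_DEPTH] =
                sent->slot[(sent->head + j + 1) % QUEUE_DEPTH];
        }
        sent->count--;
        return true;
    }
    return false;   /* unknown reply: reject it */
}
```

Rejecting a reply whose sequence ID is not in the sent queue is the validation role the text assigns to the second queue.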
708
+
709
+- scoreboard
710
+
711
+Each CPU in the VM is emulated in QEMU by a separate thread, so multiple
712
+MMIOs may be waiting to be consumed by an emulation program and multiple
713
+threads may be waiting for MMIO replies. The scoreboard would contain a
714
+wait queue and sequence number for the per-CPU threads, allowing them to
715
+be individually woken when the MMIO reply is received from the emulation
716
+program. It also tracks the number of posted MMIO stores to the device
717
+that haven't been replied to, in order to satisfy the PCI constraint
718
+that a load to a device will not complete until all previous stores to
719
+that device have been completed.
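The posted-store accounting can be illustrated with a single counter. This sketch models only that part of the scoreboard; the wait queue and sequence number are omitted, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Scoreboard entry; only the posted-store counter is modelled here. */
typedef struct {
    unsigned posted_stores;   /* stores sent but not yet replied to */
} Scoreboard;

/* A store was queued for the emulation program. */
static void store_posted(Scoreboard *sb)
{
    sb->posted_stores++;
}

/* The emulation program replied to one posted store. */
static void store_acknowledged(Scoreboard *sb)
{
    assert(sb->posted_stores > 0);   /* a reply without a store is a bug */
    sb->posted_stores--;
}

/* PCI ordering: a load may complete only after all earlier stores. */
static bool load_may_complete(const Scoreboard *sb)
{
    return sb->posted_stores == 0;
}
```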
720
+
721
+- device shadow memory
722
+
723
+Some MMIO loads do not have device side-effects. These MMIOs can be
724
+completed without sending a MMIO request to the emulation program if the
725
+emulation program shares a shadow image of the device's memory image
726
+with the KVM driver.
727
+
728
+The emulation program will ask the KVM driver to allocate memory for the
729
+shadow image, and will then use ``mmap()`` to directly access it. The
730
+emulation program can control KVM access to the shadow image by sending
731
+KVM an access map telling it which areas of the image have no
732
+side-effects (and can be completed immediately), and which require a
733
+MMIO request to the emulation program. The access map can also inform
734
+the KVM driver which access sizes are allowed to the image.
735
+
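One way to picture the access map is as a per-byte "no side-effects" flag plus a mask of permitted load sizes. The layout below is purely an assumption for illustration (the text does not define the map format), as are the `Shadow` type, the 64-byte image size, and the little-endian register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SIZE 64   /* hypothetical device memory size in bytes */

typedef struct {
    uint8_t image[SHADOW_SIZE];         /* shadow copy of device memory */
    bool no_side_effect[SHADOW_SIZE];   /* access map, one flag per byte */
    uint8_t size_mask;                  /* bit n set: 2^n-byte loads allowed */
} Shadow;

/*
 * Try to complete a load from the shadow image. Returns false when the
 * load must instead be forwarded to the emulation program.
 */
static bool shadow_load(const Shadow *s, size_t off, size_t len,
                        unsigned posted_stores, uint64_t *val)
{
    unsigned log2len = 0;

    if (posted_stores != 0) {
        return false;   /* PCI ordering: older stores must finish first */
    }
    if (len == 0 || len > 8 || (len & (len - 1)) != 0) {
        return false;   /* only power-of-two sizes up to 8 bytes */
    }
    while (((size_t)1 << log2len) < len) {
        log2len++;
    }
    if (!(s->size_mask & (1u << log2len)) || off > SHADOW_SIZE - len) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        if (!s->no_side_effect[off + i]) {
            return false;
        }
    }
    /* Assemble the value assuming little-endian device registers. */
    *val = 0;
    for (size_t i = 0; i < len; i++) {
        *val |= (uint64_t)s->image[off + i] << (8 * i);
    }
    return true;
}
```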
736
+master descriptor
737
+^^^^^^^^^^^^^^^^^
738
+
739
+The master descriptor is used by QEMU to configure the new KVM device.
740
+The descriptor would be returned by the KVM driver when QEMU issues a
741
+*KVM\_CREATE\_DEVICE* ``ioctl()`` with a *KVM\_DEV\_TYPE\_USER* type.
742
+
743
+KVM\_DEV\_TYPE\_USER device ops
744
+
745
+
746
+The *KVM\_DEV\_TYPE\_USER* operations vector will be registered by a
747
+``kvm_register_device_ops()`` call when the KVM system is initialized by
748
+``kvm_init()``. These device ops are called by the KVM driver when QEMU
749
+executes certain ``ioctl()`` operations on its KVM file descriptor. They
750
+include:
751
+
752
+- create
753
+
754
+This routine is called when QEMU issues a *KVM\_CREATE\_DEVICE*
755
+``ioctl()`` on its per-VM file descriptor. It will allocate and
756
+initialize a KVM user device specific data structure, and assign the
757
+*kvm\_device* private field to it.
758
+
759
+- ioctl
760
+
761
+This routine is invoked when QEMU issues an ``ioctl()`` on the master
762
+descriptor. The ``ioctl()`` commands supported are defined by the KVM
763
+device type. *KVM\_DEV\_TYPE\_USER* devices will need several commands:
764
+
765
+*KVM\_DEV\_USER\_SLAVE\_FD* creates the slave file descriptor that will
766
+be passed to the device emulation program. Only one slave can be created
767
+by each master descriptor. The file operations performed by this
768
+descriptor are described below.
769
+
770
+The *KVM\_DEV\_USER\_PA\_RANGE* command configures a guest physical
771
+address range that the slave descriptor will receive MMIO notifications
772
+for. The range is specified by a guest physical range structure
773
+argument. For buses that assign addresses to devices dynamically, this
774
+command can be executed while the guest is running, such as the case
775
+when a guest changes a device's PCI BAR registers.
776
+
777
+*KVM\_DEV\_USER\_PA\_RANGE* will use ``kvm_io_bus_register_dev()`` to
778
+register *kvm\_io\_device\_ops* callbacks to be invoked when the guest
779
+performs a MMIO operation within the range. When a range is changed,
780
+``kvm_io_bus_unregister_dev()`` is used to remove the previous
781
+instantiation.
782
+
783
+*KVM\_DEV\_USER\_TIMEOUT* will configure a timeout value that specifies
784
+how long KVM will wait for the emulation process to respond to a MMIO
785
+indication.
786
+
787
+- destroy
788
+
789
+This routine is called when the VM instance is destroyed. It will need
790
+to destroy the slave descriptor and free any memory allocated by the
791
+driver, as well as the *kvm\_device* structure itself.
792
+
793
+slave descriptor
794
+^^^^^^^^^^^^^^^^
795
+
796
+The slave descriptor will have its own file operations vector, which
797
+responds to system calls on the descriptor performed by the device
798
+emulation program.
799
+
800
+- read
801
+
802
+A read returns any pending MMIO requests from the KVM driver as MMIO
803
+request structures. Multiple structures can be returned if there are
804
+multiple MMIO operations pending. The MMIO requests are moved from the
805
+pending queue to the sent queue, and if there are threads waiting for
806
+space in the pending queue to add new MMIO operations, they will be woken
807
+here.
808
+
809
+- write
810
+
811
+A write also consists of a set of MMIO requests. They are compared to
812
+the MMIO requests in the sent queue. Matches are removed from the sent
813
+queue, and any threads waiting for the reply are woken. If a store is
814
+removed, then the number of posted stores in the per-CPU scoreboard is
815
+decremented. When the number is zero, and a non side-effect load was
816
+waiting for posted stores to complete, the load is continued.
817
+
818
+- ioctl
819
+
820
+There are several ioctl()s that can be performed on the slave
821
+descriptor.
822
+
823
+A *KVM\_DEV\_USER\_SHADOW\_SIZE* ``ioctl()`` causes the KVM driver to
824
+allocate memory for the shadow image. This memory can later be
825
+``mmap()``\ ed by the emulation process to share the emulation's view of
826
+device memory with the KVM driver.
827
+
828
+A *KVM\_DEV\_USER\_SHADOW\_CTRL* ``ioctl()`` controls access to the
829
+shadow image. It will send the KVM driver a shadow control map, which
830
+specifies which areas of the image can complete guest loads without
831
+sending the load request to the emulation program. It will also specify
832
+the size of load operations that are allowed.
833
+
834
+- poll
835
+
836
+An emulation program will use the ``poll()`` call with a *POLLIN* flag
837
+to determine if there are MMIO requests waiting to be read. It will
838
+return if the pending MMIO request queue is not empty.
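From the emulation program's side, the readiness check is an ordinary ``poll()`` call. The sketch below assumes the slave descriptor behaves like any pollable file descriptor; `mmio_requests_pending()` is a hypothetical helper name.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Return true if `fd` has data ready to read. A timeout of 0 makes the
 * check non-blocking; a negative timeout blocks until data arrives.
 */
static bool mmio_requests_pending(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    return n == 1 && (pfd.revents & POLLIN) != 0;
}
```

Since the proposed KVM device does not exist, the helper can be exercised with any readable descriptor, such as the read end of a pipe.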
839
+
840
+- mmap
841
+
842
+This call allows the emulation program to directly access the shadow
843
+image allocated by the KVM driver. As device emulation updates device
844
+memory, changes with no side-effects will be reflected in the shadow,
845
+and the KVM driver can satisfy guest loads from the shadow image without
846
+needing to wait for the emulation program.
847
+
848
+kvm\_io\_device ops
849
+^^^^^^^^^^^^^^^^^^^
850
+
851
+Each KVM per-CPU thread can handle MMIO operations on behalf of the guest
852
+VM. KVM will use the MMIO's guest physical address to search for a
853
+matching *kvm\_io\_device* to see if the MMIO can be handled by the KVM
854
+driver instead of exiting back to QEMU. If a match is found, the
855
+corresponding callback will be invoked.
856
+
857
+- read
858
+
859
+This callback is invoked when the guest performs a load to the device.
860
+Loads with side-effects must be handled synchronously, with the KVM
861
+driver putting the QEMU thread to sleep waiting for the emulation
862
+process reply before re-starting the guest. Loads that do not have
863
+side-effects may be optimized by satisfying them from the shadow image,
864
+if there are no outstanding stores to the device by this CPU. PCI memory
865
+ordering demands that a load cannot complete before all older stores to
866
+the same device have been completed.
867
+
868
+- write
869
+
870
+Stores can be handled asynchronously unless the pending MMIO request
871
+queue is full. In this case, the QEMU thread must sleep waiting for
872
+space in the queue. Stores will increment the number of posted stores in
873
+the per-CPU scoreboard, in order to implement the PCI ordering
874
+constraint above.
875
+
876
+interrupt acceleration
877
+^^^^^^^^^^^^^^^^^^^^^^
878
+
879
+This performance optimization would work much like a vhost user
880
+application does, where the QEMU process sets up *eventfds* that cause
881
+the device's corresponding interrupt to be triggered by the KVM driver.
882
+These irq file descriptors are sent to the emulation process at
883
+initialization, and are used when the emulation code raises a device
884
+interrupt.
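The *eventfd* signalling that this scheme relies on can be tried in isolation on Linux. In the sketch below, `irqfd_raise()` is what the emulation code would do to raise the interrupt, while `irqfd_drain()` merely stands in for the consumer (KVM, in the real design); both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd once by adding 1 to its counter. */
static int irqfd_raise(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the eventfd counter. Returns the number of signals
 * since the last drain, or 0 if none (with EFD_NONBLOCK set).
 */
static uint64_t irqfd_drain(int fd)
{
    uint64_t n = 0;

    if (read(fd, &n, sizeof(n)) != (ssize_t)sizeof(n)) {
        return 0;
    }
    return n;
}
```

Note that an eventfd carries only a count, not data, which is the same limitation the MMIO acceleration section points out for vhost-user.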
885
+
886
+intx acceleration
887
+'''''''''''''''''
888
+
889
+Traditional PCI pin interrupts are level based, so, in addition to an
890
+irq file descriptor, a re-sampling file descriptor needs to be sent to
891
+the emulation program. This second file descriptor allows multiple
892
+devices sharing an irq to be notified when the interrupt has been
893
+acknowledged by the guest, so they can re-trigger the interrupt if their
894
+device has not de-asserted its interrupt.
895
+
896
+intx irq descriptor
897
+
898
+
899
+The irq descriptors are created by the proxy object
900
+using ``event_notifier_init()`` to create the irq and re-sampling
901
+*eventfds*, and ``kvm_vm_ioctl(KVM_IRQFD)`` to bind them to an interrupt.
902
+The interrupt route can be found with
903
+``pci_device_route_intx_to_irq()``.
904
+
905
+intx routing changes
906
+
907
+
908
+Intx routing can be changed when the guest programs the APIC the device
909
+pin is connected to. The proxy object in QEMU will use
910
+``pci_device_set_intx_routing_notifier()`` to be informed of any guest
911
+changes to the route. This handler will broadly follow the VFIO
912
+interrupt logic to change the route: de-assigning the existing irq
913
+descriptor from its route, then assigning it the new route. (see
914
+``vfio_intx_update()``)
915
+
916
+MSI/X acceleration
917
+''''''''''''''''''
918
+
919
+MSI/X interrupts are sent as DMA transactions to the host. The interrupt
920
+data contains a vector that is programmed by the guest. A device may have
921
+multiple MSI interrupts associated with it, so multiple irq descriptors
922
+may need to be sent to the emulation program.
923
+
924
+MSI/X irq descriptor
925
+
926
+
927
+This case will also follow the VFIO example. For each MSI/X interrupt,
928
+an *eventfd* is created, a virtual interrupt is allocated by
929
+``kvm_irqchip_add_msi_route()``, and the virtual interrupt is bound to
930
+the eventfd with ``kvm_irqchip_add_irqfd_notifier()``.
931
+
932
+MSI/X config space changes
933
+
934
+
935
+The guest may dynamically update several MSI-related tables in the
936
+device's PCI config space. These include per-MSI interrupt enables and
937
+vector data. Additionally, MSI-X tables exist in device memory space, not
938
+config space. Much like the BAR case above, the proxy object must look
939
+at guest config space programming to keep the MSI interrupt state
940
+consistent between QEMU and the emulation program.
941
+
942
+--------------
943
+
944
+Disaggregated CPU emulation
945
+---------------------------
946
+
947
+After IO services have been disaggregated, a second phase would be to
948
+separate a process to handle CPU instruction emulation from the main
949
+QEMU control function. There are no object separation points for this
950
+code, so the first task would be to create one.
951
+
952
+Host access controls
953
+--------------------
954
+
955
+Separating QEMU relies on the host OS's access restriction mechanisms to
956
+enforce that the differing processes can only access the objects they
957
+are entitled to. There are a couple of types of mechanisms usually provided
958
+by general purpose OSs.
959
+
960
+Discretionary access control
961
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
962
+
963
+Discretionary access control allows each user to control who can access
964
+their files. In Linux, this type of control is usually too coarse for
965
+QEMU separation, since it only provides three separate access controls:
966
+one for the same user ID, the second for user IDs with the same group
967
+ID, and the third for all other user IDs. Each device instance would
968
+need a separate user ID to provide access control, which is likely to be
969
+unwieldy for dynamically created VMs.
970
+
971
+Mandatory access control
972
+~~~~~~~~~~~~~~~~~~~~~~~~
973
+
974
+Mandatory access control allows the OS to add an additional set of
975
+controls on top of discretionary access for the OS to control. It also
976
+adds other attributes to processes and files such as types, roles, and
977
+categories, and can establish rules for how processes and files can
978
+interact.
979
+
980
+Type enforcement
981
+^^^^^^^^^^^^^^^^
982
+
983
+Type enforcement assigns a *type* attribute to processes and files, and
984
+allows rules to be written on what operations a process with a given
985
+type can perform on a file with a given type. QEMU separation could take
986
+advantage of type enforcement by running the emulation processes with
987
+different types, both from the main QEMU process, and from the emulation
988
+processes of different classes of devices.
989
+
990
+For example, guest disk images and disk emulation processes could have
991
+types separate from the main QEMU process and non-disk emulation
992
+processes, and the type rules could prevent processes other than disk
993
+emulation ones from accessing guest disk images. Similarly, network
994
+emulation processes can have a type separate from the main QEMU process
995
+and non-network emulation processes, and only that type can access the
996
+host tun/tap device used to provide guest networking.
997
+
998
+Category enforcement
999
+^^^^^^^^^^^^^^^^^^^^
1000
+
1001
+Category enforcement assigns a set of numbers within a given range to
1002
+the process or file. The process is granted access to the file if the
1003
+process's set is a superset of the file's set. This enforcement can be
1004
+used to separate multiple instances of devices in the same class.
1005
+
1006
+For example, if there are multiple disk devices provided to a guest,
1007
+each device emulation process could be provisioned with a separate
1008
+category. The different device emulation processes would not be able to
1009
+access each other's backing disk images.
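The superset rule can be written as a one-line bitmask test. This is a sketch only: representing a category set as a 64-bit mask is an assumption made here, and `category_access_allowed()` is a hypothetical name rather than any real MAC API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means category n is in the set; 64 categories assumed. */
typedef uint64_t CategorySet;

/* Access is granted when the process's set is a superset of the file's. */
static bool category_access_allowed(CategorySet process, CategorySet file)
{
    return (file & ~process) == 0;
}
```

With one category per disk, the emulation process for the first disk passes the check for its own image and fails it for the second disk's image.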
1010
+
1011
+Alternatively, categories could be used in lieu of the type enforcement
1012
+scheme described above. In this scenario, different categories would be
1013
+used to prevent device emulation processes in different classes from
1014
+accessing resources assigned to other classes.
62
--
1015
--
63
2.26.2
1016
2.29.2
64
1017
diff view generated by jsdifflib
1
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
They are not used by other programs and are not otherwise needed in
3
libblock.
4
2
5
Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
3
Adds documentation explaining the command-line arguments needed
6
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
4
to use multi-process.
7
(libblockdev), a stub function is required.
8
5
9
Make qemu-nbd.c use signal handling utility functions instead of
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
10
duplicating the code. This helps because os-posix.c is in libblockdev
7
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
11
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
12
Once we use the signal handling utility functions we also end up
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
providing the necessary symbol.
10
Message-id: 49f757a84e5dd6fae14b22544897d1124c5fdbad.1611938319.git.jag.raman@oracle.com
11
12
[Move orphan docs/multi-process.rst document into docs/system/ and add
13
it to index.rst to prevent Sphinx "document isn't included in any
14
toctree" error.
15
--Stefan]
14
16
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
17
Reviewed-by: Eric Blake <eblake@redhat.com>
18
Message-id: 20200929125516.186715-4-stefanha@redhat.com
19
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
20
--Stefan]
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
22
---
18
---
23
qemu-nbd.c | 21 ++++++++-------------
19
MAINTAINERS | 1 +
24
stubs/blk-exp-close-all.c | 7 +++++++
20
docs/system/index.rst | 1 +
25
block/export/meson.build | 4 ++--
21
docs/system/multi-process.rst | 64 +++++++++++++++++++++++++++++++++++
26
meson.build | 4 ++--
22
3 files changed, 66 insertions(+)
27
nbd/meson.build | 2 ++
23
create mode 100644 docs/system/multi-process.rst
28
stubs/meson.build | 1 +
29
6 files changed, 22 insertions(+), 17 deletions(-)
30
create mode 100644 stubs/blk-exp-close-all.c
31
24
32
diff --git a/qemu-nbd.c b/qemu-nbd.c
25
diff --git a/MAINTAINERS b/MAINTAINERS
33
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
34
--- a/qemu-nbd.c
27
--- a/MAINTAINERS
35
+++ b/qemu-nbd.c
28
+++ b/MAINTAINERS
36
@@ -XXX,XX +XXX,XX @@
29
@@ -XXX,XX +XXX,XX @@ M: Jagannathan Raman <jag.raman@oracle.com>
37
#include "qapi/error.h"
30
M: John G Johnson <john.g.johnson@oracle.com>
38
#include "qemu/cutils.h"
31
S: Maintained
39
#include "sysemu/block-backend.h"
32
F: docs/devel/multi-process.rst
40
+#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */
33
+F: docs/system/multi-process.rst
41
#include "block/block_int.h"
34
42
#include "block/nbd.h"
35
Build and test automation
43
#include "qemu/main-loop.h"
36
-------------------------
44
@@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n"
37
diff --git a/docs/system/index.rst b/docs/system/index.rst
45
}
38
index XXXXXXX..XXXXXXX 100644
46
39
--- a/docs/system/index.rst
47
#ifdef CONFIG_POSIX
40
+++ b/docs/system/index.rst
48
-static void termsig_handler(int signum)
41
@@ -XXX,XX +XXX,XX @@ Contents:
49
+/*
42
pr-manager
50
+ * The client thread uses SIGTERM to interrupt the server. A signal
43
targets
51
+ * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
44
security
52
+ */
45
+ multi-process
53
+void qemu_system_killed(int signum, pid_t pid)
46
deprecated
54
{
47
removed-features
55
qatomic_cmpxchg(&state, RUNNING, TERMINATE);
48
build-platforms
56
qemu_notify_event();
49
diff --git a/docs/system/multi-process.rst b/docs/system/multi-process.rst
57
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
58
BlockExportOptions *export_opts;
59
60
#ifdef CONFIG_POSIX
61
- /*
62
- * Exit gracefully on various signals, which includes SIGTERM used
63
- * by 'qemu-nbd -v -c'.
64
- */
65
- struct sigaction sa_sigterm;
66
- memset(&sa_sigterm, 0, sizeof(sa_sigterm));
67
- sa_sigterm.sa_handler = termsig_handler;
68
- sigaction(SIGTERM, &sa_sigterm, NULL);
69
- sigaction(SIGINT, &sa_sigterm, NULL);
70
- sigaction(SIGHUP, &sa_sigterm, NULL);
71
-
72
- signal(SIGPIPE, SIG_IGN);
73
+ os_setup_early_signal_handling();
74
+ os_setup_signal_handling();
75
#endif
76
77
socket_init();
78
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
79
new file mode 100644
50
new file mode 100644
80
index XXXXXXX..XXXXXXX
51
index XXXXXXX..XXXXXXX
81
--- /dev/null
52
--- /dev/null
82
+++ b/stubs/blk-exp-close-all.c
53
+++ b/docs/system/multi-process.rst
83
@@ -XXX,XX +XXX,XX @@
54
@@ -XXX,XX +XXX,XX @@
84
+#include "qemu/osdep.h"
55
+Multi-process QEMU
85
+#include "block/export.h"
56
+==================
86
+
57
+
87
+/* Only used in programs that support block exports (libblockdev.fa) */
58
+This document describes how to configure and use multi-process qemu.
88
+void blk_exp_close_all(void)
59
+For the design document refer to docs/devel/multi-process.rst.
89
+{
60
+
90
+}
61
+1) Configuration
91
diff --git a/block/export/meson.build b/block/export/meson.build
62
+----------------
92
index XXXXXXX..XXXXXXX 100644
63
+
93
--- a/block/export/meson.build
64
+multi-process is enabled by default for targets that enable KVM
94
+++ b/block/export/meson.build
65
+
95
@@ -XXX,XX +XXX,XX @@
66
+
96
-block_ss.add(files('export.c'))
67
+2) Usage
97
-block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
68
+--------
98
+blockdev_ss.add(files('export.c'))
69
+
99
+blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
70
+Multi-process QEMU requires an orchestrator to launch.
100
diff --git a/meson.build b/meson.build
71
+
101
index XXXXXXX..XXXXXXX 100644
72
+The following describes the command line used to launch mpqemu.
102
--- a/meson.build
73
+
103
+++ b/meson.build
74
+* Orchestrator:
104
@@ -XXX,XX +XXX,XX @@ subdir('dump')
75
+
105
76
+ - The Orchestrator creates a unix socketpair
106
block_ss.add(files(
77
+
107
'block.c',
78
+ - It launches the remote process and passes one of the
108
- 'blockdev-nbd.c',
79
+ sockets to it via command-line.
109
'blockjob.c',
80
+
110
'job.c',
81
+ - It then launches QEMU and specifies the other socket as an option
111
'qemu-io-cmds.c',
82
+ to the Proxy device object
112
@@ -XXX,XX +XXX,XX @@ subdir('block')
83
+
113
84
+* Remote Process:
114
blockdev_ss.add(files(
85
+
115
'blockdev.c',
86
+ - QEMU can enter remote process mode by using the "remote" machine
116
+ 'blockdev-nbd.c',
87
+ option.
117
'iothread.c',
88
+
118
'job-qmp.c',
89
+ - The orchestrator creates a "remote-object" with details about
119
))
90
+ the device and the file descriptor for the device
120
@@ -XXX,XX +XXX,XX @@ if have_tools
91
+
121
qemu_io = executable('qemu-io', files('qemu-io.c'),
92
+ - The remaining options are no different from how one launches QEMU with
122
dependencies: [block, qemuutil], install: true)
93
+ devices.
123
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
94
+
124
- dependencies: [block, qemuutil], install: true)
95
+ - Example command-line for the remote process is as follows:
125
+ dependencies: [blockdev, qemuutil], install: true)
96
+
126
97
+ /usr/bin/qemu-system-x86_64 \
127
subdir('storage-daemon')
98
+ -machine x-remote \
128
subdir('contrib/rdmacm-mux')
99
+ -device lsi53c895a,id=lsi0 \
129
diff --git a/nbd/meson.build b/nbd/meson.build
100
+ -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2 \
130
index XXXXXXX..XXXXXXX 100644
101
+ -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0 \
131
--- a/nbd/meson.build
102
+ -object x-remote-object,id=robj1,devid=lsi0,fd=4
132
+++ b/nbd/meson.build
103
+
133
@@ -XXX,XX +XXX,XX @@
104
+* QEMU:
134
block_ss.add(files(
105
+
135
'client.c',
106
+ - Since parts of the RAM are shared between QEMU & remote process, a
136
'common.c',
107
+ memory-backend-memfd is required to facilitate this, as follows:
137
+))
108
+
138
+blockdev_ss.add(files(
109
+ -object memory-backend-memfd,id=mem,size=2G
139
'server.c',
110
+
140
))
111
+ - A "x-pci-proxy-dev" device is created for each of the PCI devices emulated
141
diff --git a/stubs/meson.build b/stubs/meson.build
112
+ in the remote process. A "socket" sub-option specifies the other end of
142
index XXXXXXX..XXXXXXX 100644
113
+ unix channel created by orchestrator. The "id" sub-option must be specified
143
--- a/stubs/meson.build
114
+ and should be the same as the "id" specified for the remote PCI device.
144
+++ b/stubs/meson.build
115
+
145
@@ -XXX,XX +XXX,XX @@
116
+ - Example command-line for QEMU is as follows:
146
stub_ss.add(files('arch_type.c'))
117
+
147
stub_ss.add(files('bdrv-next-monitor-owned.c'))
118
+ -device x-pci-proxy-dev,id=lsi0,socket=3
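The orchestrator's first steps can be sketched in C as follows. This is a hedged illustration, not an actual orchestrator: the helper names are hypothetical, the option strings simply mirror the example command lines in this document, and process launching (fork/exec with the descriptors inherited) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create the unix socketpair; one end goes to each process. */
static int make_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Format the -object value that hands `fd` to the remote process. */
static int remote_object_opt(char *buf, size_t n, const char *devid, int fd)
{
    return snprintf(buf, n, "x-remote-object,id=robj1,devid=%s,fd=%d",
                    devid, fd);
}

/* Format the -device value that hands the other end to QEMU's proxy. */
static int proxy_dev_opt(char *buf, size_t n, const char *id, int fd)
{
    return snprintf(buf, n, "x-pci-proxy-dev,id=%s,socket=%d", id, fd);
}
```

The descriptor numbers passed on the command line must be the numbers the descriptors have inside each child process, so an orchestrator would typically `dup2()` them into place before `exec()`.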
148
stub_ss.add(files('blk-commit-all.c'))
149
+stub_ss.add(files('blk-exp-close-all.c'))
150
stub_ss.add(files('blockdev-close-all-bdrv-states.c'))
151
stub_ss.add(files('change-state-handler.c'))
152
stub_ss.add(files('cmos.c'))
153
--
119
--
154
2.26.2
120
2.29.2
155
121
diff view generated by jsdifflib
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Allow vu_message_read to be replaced by one which will make use of the
3
Allow RAM MemoryRegion to be created from an offset in a file, instead
4
QIOChannel functions. Thus reading vhost-user message won't stall the
4
of allocating at offset of 0 by default. This is needed to synchronize
5
guest. For slave channel, we still use the default vu_message_read.
5
RAM between QEMU & remote process.
6
6
7
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
7
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
11
Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
---
13
contrib/libvhost-user/libvhost-user.h | 21 +++++++++++++++++++++
14
include/exec/memory.h | 2 ++
14
contrib/libvhost-user/libvhost-user-glib.c | 2 +-
15
include/exec/ram_addr.h | 4 ++--
15
contrib/libvhost-user/libvhost-user.c | 14 +++++++-------
16
include/qemu/mmap-alloc.h | 4 +++-
16
tests/vhost-user-bridge.c | 2 ++
17
backends/hostmem-memfd.c | 2 +-
17
tools/virtiofsd/fuse_virtio.c | 4 ++--
18
hw/misc/ivshmem.c | 3 ++-
18
5 files changed, 33 insertions(+), 10 deletions(-)
19
softmmu/memory.c | 3 ++-
19
20
softmmu/physmem.c | 12 +++++++-----
20
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
21
util/mmap-alloc.c | 8 +++++---
21
index XXXXXXX..XXXXXXX 100644
22
util/oslib-posix.c | 2 +-
22
--- a/contrib/libvhost-user/libvhost-user.h
23
9 files changed, 25 insertions(+), 15 deletions(-)
23
+++ b/contrib/libvhost-user/libvhost-user.h
24
24
@@ -XXX,XX +XXX,XX @@
25
diff --git a/include/exec/memory.h b/include/exec/memory.h
25
*/
26
index XXXXXXX..XXXXXXX 100644
26
#define VHOST_USER_MAX_RAM_SLOTS 32
27
--- a/include/exec/memory.h
27
28
+++ b/include/exec/memory.h
28
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
29
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
29
+
30
* @size: size of the region.
30
typedef enum VhostSetConfigType {
31
* @share: %true if memory must be mmaped with the MAP_SHARED flag
31
VHOST_SET_CONFIG_TYPE_MASTER = 0,
32
* @fd: the fd to mmap.
32
VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
33
+ * @offset: offset within the file referenced by fd
33
@@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
34
* @errp: pointer to Error*, to store an error if it happens.
34
typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
35
*
35
typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
36
* Note that this function does not do anything to cause the data in the
36
int *do_reply);
37
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
37
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
38
uint64_t size,
38
typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
39
bool share,
39
typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
40
int fd,
40
typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
41
+ ram_addr_t offset,
41
@@ -XXX,XX +XXX,XX @@ struct VuDev {
42
Error **errp);
42
bool broken;
43
#endif
43
uint16_t max_queues;
44
44
45
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
45
+ /* @read_msg: custom method to read vhost-user message
46
index XXXXXXX..XXXXXXX 100644
46
+ *
47
--- a/include/exec/ram_addr.h
47
+ * Read data from vhost_user socket fd and fill up
48
+++ b/include/exec/ram_addr.h
48
+ * the passed VhostUserMsg *vmsg struct.
49
@@ -XXX,XX +XXX,XX @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
49
+ *
50
uint32_t ram_flags, const char *mem_path,
50
+ * If reading fails, it should close the received set of file
51
bool readonly, Error **errp);
51
+ * descriptors as socket message's auxiliary data.
52
RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
52
+ *
53
- uint32_t ram_flags, int fd, bool readonly,
53
+ * For the details, please refer to vu_message_read in libvhost-user.c
54
- Error **errp);
54
+ * which will be used by default if no custom method is provided when
55
+ uint32_t ram_flags, int fd, off_t offset,
55
+ * calling vu_init
56
+ bool readonly, Error **errp);
56
+ *
57
57
+ * Returns: true if vhost-user message successfully received,
58
RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
58
+ * otherwise return false.
59
MemoryRegion *mr, Error **errp);
59
+ *
60
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
60
+ */
61
index XXXXXXX..XXXXXXX 100644
61
+ vu_read_msg_cb read_msg;
62
--- a/include/qemu/mmap-alloc.h
62
/* @set_watch: add or update the given fd to the watch set,
63
+++ b/include/qemu/mmap-alloc.h
63
* call cb when condition is met */
64
@@ -XXX,XX +XXX,XX @@ size_t qemu_mempath_getpagesize(const char *mem_path);
64
vu_set_watch_cb set_watch;
65
* @readonly: true for a read-only mapping, false for read/write.
65
@@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev,
66
* @shared: map has RAM_SHARED flag.
66
uint16_t max_queues,
67
* @is_pmem: map has RAM_PMEM flag.
67
int socket,
68
+ * @map_offset: map starts at offset of map_offset from the start of fd
68
vu_panic_cb panic,
69
*
69
+ vu_read_msg_cb read_msg,
70
* Return:
70
vu_set_watch_cb set_watch,
71
* On success, return a pointer to the mapped area.
71
vu_remove_watch_cb remove_watch,
72
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
72
const VuDevIface *iface);
73
size_t align,
73
diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
74
bool readonly,
74
index XXXXXXX..XXXXXXX 100644
75
bool shared,
75
--- a/contrib/libvhost-user/libvhost-user-glib.c
76
- bool is_pmem);
76
+++ b/contrib/libvhost-user/libvhost-user-glib.c
77
+ bool is_pmem,
77
@@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket,
78
+ off_t map_offset);
78
g_assert(dev);
79
79
g_assert(iface);
80
void qemu_ram_munmap(int fd, void *ptr, size_t size);
80
81
81
- if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
82
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
82
+ if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch,
83
index XXXXXXX..XXXXXXX 100644
83
remove_watch, iface)) {
84
--- a/backends/hostmem-memfd.c
84
return false;
85
+++ b/backends/hostmem-memfd.c
86
@@ -XXX,XX +XXX,XX @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
87
name = host_memory_backend_get_name(backend);
88
memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
89
name, backend->size,
90
- backend->share, fd, errp);
91
+ backend->share, fd, 0, errp);
92
g_free(name);
93
}
94
95
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
96
index XXXXXXX..XXXXXXX 100644
97
--- a/hw/misc/ivshmem.c
98
+++ b/hw/misc/ivshmem.c
99
@@ -XXX,XX +XXX,XX @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
100
101
/* mmap the region and map into the BAR2 */
102
memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
103
- "ivshmem.bar2", size, true, fd, &local_err);
104
+ "ivshmem.bar2", size, true, fd, 0,
105
+ &local_err);
106
if (local_err) {
107
error_propagate(errp, local_err);
108
return;
109
diff --git a/softmmu/memory.c b/softmmu/memory.c
110
index XXXXXXX..XXXXXXX 100644
111
--- a/softmmu/memory.c
112
+++ b/softmmu/memory.c
113
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
114
uint64_t size,
115
bool share,
116
int fd,
117
+ ram_addr_t offset,
118
Error **errp)
119
{
120
Error *err = NULL;
121
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
122
mr->destructor = memory_region_destructor_ram;
123
mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
124
share ? RAM_SHARED : 0,
125
- fd, false, &err);
126
+ fd, offset, false, &err);
127
if (err) {
128
mr->size = int128_zero();
129
object_unparent(OBJECT(mr));
130
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
131
index XXXXXXX..XXXXXXX 100644
132
--- a/softmmu/physmem.c
133
+++ b/softmmu/physmem.c
134
@@ -XXX,XX +XXX,XX @@ static void *file_ram_alloc(RAMBlock *block,
135
int fd,
136
bool readonly,
137
bool truncate,
138
+ off_t offset,
139
Error **errp)
140
{
141
void *area;
142
@@ -XXX,XX +XXX,XX @@ static void *file_ram_alloc(RAMBlock *block,
85
}
143
}
86
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
144
87
index XXXXXXX..XXXXXXX 100644
145
area = qemu_ram_mmap(fd, memory, block->mr->align, readonly,
88
--- a/contrib/libvhost-user/libvhost-user.c
146
- block->flags & RAM_SHARED, block->flags & RAM_PMEM);
89
+++ b/contrib/libvhost-user/libvhost-user.c
147
+ block->flags & RAM_SHARED, block->flags & RAM_PMEM,
90
@@ -XXX,XX +XXX,XX @@
148
+ offset);
91
/* The version of inflight buffer */
149
if (area == MAP_FAILED) {
92
#define INFLIGHT_VERSION 1
150
error_setg_errno(errp, errno,
93
151
"unable to map backing store for guest RAM");
94
-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
152
@@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
95
-
153
96
/* The version of the protocol we support */
154
#ifdef CONFIG_POSIX
97
#define VHOST_USER_VERSION 1
155
RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
98
#define LIBVHOST_USER_DEBUG 0
156
- uint32_t ram_flags, int fd, bool readonly,
99
@@ -XXX,XX +XXX,XX @@ have_userfault(void)
157
- Error **errp)
100
}
158
+ uint32_t ram_flags, int fd, off_t offset,
101
159
+ bool readonly, Error **errp)
102
static bool
160
{
103
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
161
RAMBlock *new_block;
104
+vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
162
Error *local_err = NULL;
105
{
163
@@ -XXX,XX +XXX,XX @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
106
char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
164
new_block->max_length = size;
107
struct iovec iov = {
165
new_block->flags = ram_flags;
108
@@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
166
new_block->host = file_ram_alloc(new_block, size, fd, readonly,
109
goto out;
167
- !file_size, errp);
168
+ !file_size, offset, errp);
169
if (!new_block->host) {
170
g_free(new_block);
171
return NULL;
172
@@ -XXX,XX +XXX,XX @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
173
return NULL;
110
}
174
}
111
175
112
- if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
176
- block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, readonly, errp);
113
+ if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) {
177
+ block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, readonly, errp);
114
goto out;
178
if (!block) {
179
if (created) {
180
unlink(mem_path);
181
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
182
index XXXXXXX..XXXXXXX 100644
183
--- a/util/mmap-alloc.c
184
+++ b/util/mmap-alloc.c
185
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
186
size_t align,
187
bool readonly,
188
bool shared,
189
- bool is_pmem)
190
+ bool is_pmem,
191
+ off_t map_offset)
192
{
193
int prot;
194
int flags;
195
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
196
197
prot = PROT_READ | (readonly ? 0 : PROT_WRITE);
198
199
- ptr = mmap(guardptr + offset, size, prot, flags | map_sync_flags, fd, 0);
200
+ ptr = mmap(guardptr + offset, size, prot,
201
+ flags | map_sync_flags, fd, map_offset);
202
203
if (ptr == MAP_FAILED && map_sync_flags) {
204
if (errno == ENOTSUP) {
205
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
206
* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
207
* we will remove these flags to handle compatibility.
208
*/
209
- ptr = mmap(guardptr + offset, size, prot, flags, fd, 0);
210
+ ptr = mmap(guardptr + offset, size, prot, flags, fd, map_offset);
115
}
211
}
116
212
117
@@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
213
if (ptr == MAP_FAILED) {
118
/* Wait for QEMU to confirm that it's registered the handler for the
214
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
119
* faults.
215
index XXXXXXX..XXXXXXX 100644
120
*/
216
--- a/util/oslib-posix.c
121
- if (!vu_message_read(dev, dev->sock, vmsg) ||
217
+++ b/util/oslib-posix.c
122
+ if (!dev->read_msg(dev, dev->sock, vmsg) ||
218
@@ -XXX,XX +XXX,XX @@ void *qemu_memalign(size_t alignment, size_t size)
123
vmsg->size != sizeof(vmsg->payload.u64) ||
219
void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
124
vmsg->payload.u64 != 0) {
220
{
125
vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
221
size_t align = QEMU_VMALLOC_ALIGN;
126
@@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev)
222
- void *ptr = qemu_ram_mmap(-1, size, align, false, shared, false);
127
int reply_requested;
223
+ void *ptr = qemu_ram_mmap(-1, size, align, false, shared, false, 0);
128
bool need_reply, success = false;
224
129
225
if (ptr == MAP_FAILED) {
130
- if (!vu_message_read(dev, dev->sock, &vmsg)) {
226
return NULL;
131
+ if (!dev->read_msg(dev, dev->sock, &vmsg)) {
132
goto end;
133
}
134
135
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
136
uint16_t max_queues,
137
int socket,
138
vu_panic_cb panic,
139
+ vu_read_msg_cb read_msg,
140
vu_set_watch_cb set_watch,
141
vu_remove_watch_cb remove_watch,
142
const VuDevIface *iface)
143
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
144
145
dev->sock = socket;
146
dev->panic = panic;
147
+ dev->read_msg = read_msg ? read_msg : vu_message_read_default;
148
dev->set_watch = set_watch;
149
dev->remove_watch = remove_watch;
150
dev->iface = iface;
151
@@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
152
153
vu_message_write(dev, dev->slave_fd, &vmsg);
154
if (ack) {
155
- vu_message_read(dev, dev->slave_fd, &vmsg);
156
+ vu_message_read_default(dev, dev->slave_fd, &vmsg);
157
}
158
return;
159
}
160
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
161
index XXXXXXX..XXXXXXX 100644
162
--- a/tests/vhost-user-bridge.c
163
+++ b/tests/vhost-user-bridge.c
164
@@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx)
165
VHOST_USER_BRIDGE_MAX_QUEUES,
166
conn_fd,
167
vubr_panic,
168
+ NULL,
169
vubr_set_watch,
170
vubr_remove_watch,
171
&vuiface)) {
172
@@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client)
173
VHOST_USER_BRIDGE_MAX_QUEUES,
174
dev->sock,
175
vubr_panic,
176
+ NULL,
177
vubr_set_watch,
178
vubr_remove_watch,
179
&vuiface)) {
180
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
181
index XXXXXXX..XXXXXXX 100644
182
--- a/tools/virtiofsd/fuse_virtio.c
183
+++ b/tools/virtiofsd/fuse_virtio.c
184
@@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se)
185
se->vu_socketfd = data_sock;
186
se->virtio_dev->se = se;
187
pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
188
- vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
189
- fv_remove_watch, &fv_iface);
190
+ vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
191
+ fv_set_watch, fv_remove_watch, &fv_iface);
192
193
return 0;
194
}
195
--
227
--
196
2.26.2
228
2.29.2
197
229
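The mmap-alloc.c hunks above pass map_offset through to mmap() so that a RAM block can start part-way into a file instead of at offset 0. A minimal standalone sketch of that behaviour follows. It is an illustration under stated assumptions, not QEMU code: demo_map_at_offset and demo_first_byte_of_second_page are hypothetical helpers, the temporary file path is arbitrary, and the guard-page and alignment handling of qemu_ram_mmap() is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `size` bytes of `fd` starting at `map_offset`, shared and read/write.
 * Hypothetical helper that mirrors only the final mmap() call of the patch.
 * map_offset must be a multiple of the page size.
 * Returns MAP_FAILED on error.
 */
static void *demo_map_at_offset(int fd, size_t size, off_t map_offset)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, map_offset);
}

/*
 * Create a two-page temporary file with page 0 filled with 'A' and page 1
 * filled with 'B', map only the second page, and return the first byte seen
 * through the mapping. Returns -1 on any failure.
 */
static int demo_first_byte_of_second_page(void)
{
    char path[] = "/tmp/mmap-offset-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;
    int fd;
    char *buf;
    char *map;

    assert(page > 0);
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);  /* the open descriptor keeps the file alive */

    buf = malloc(2 * (size_t)page);
    if (!buf) {
        close(fd);
        return -1;
    }
    memset(buf, 'A', (size_t)page);
    memset(buf + page, 'B', (size_t)page);

    if (write(fd, buf, 2 * (size_t)page) == (ssize_t)(2 * page)) {
        map = demo_map_at_offset(fd, (size_t)page, (off_t)page);
        if (map != MAP_FAILED) {
            result = map[0];
            munmap(map, (size_t)page);
        }
    }

    free(buf);
    close(fd);
    return result;
}
```

With a non-zero offset the mapping exposes the second page only, which is the property the commit message relies on for sharing a sub-range of a file between QEMU and the remote process. Callers that do not need this pass 0, as the hostmem-memfd, ivshmem and qemu_anon_ram_alloc hunks do.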
1
Introduce libblockdev.fa to avoid compiling blockdev_ss twice.
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
3
Add configuration options to enable or disable multiprocess QEMU code
4
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
6
Message-id: 20200929125516.186715-3-stefanha@redhat.com
6
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
7
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Message-id: 6cc37253e35418ebd7b675a31a3df6e3c7a12dc1.1611938319.git.jag.raman@oracle.com
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
---
11
---
9
meson.build | 12 ++++++++++--
12
configure | 10 ++++++++++
10
storage-daemon/meson.build | 3 +--
13
meson.build | 4 +++-
11
2 files changed, 11 insertions(+), 4 deletions(-)
14
Kconfig.host | 4 ++++
15
hw/Kconfig | 1 +
16
hw/remote/Kconfig | 3 +++
17
5 files changed, 21 insertions(+), 1 deletion(-)
18
create mode 100644 hw/remote/Kconfig
12
19
20
diff --git a/configure b/configure
21
index XXXXXXX..XXXXXXX 100755
22
--- a/configure
23
+++ b/configure
24
@@ -XXX,XX +XXX,XX @@ skip_meson=no
25
gettext="auto"
26
fuse="auto"
27
fuse_lseek="auto"
28
+multiprocess="no"
29
30
malloc_trim="auto"
31
32
@@ -XXX,XX +XXX,XX @@ Linux)
33
linux="yes"
34
linux_user="yes"
35
vhost_user=${default_feature:-yes}
36
+ multiprocess=${default_feature:-yes}
37
;;
38
esac
39
40
@@ -XXX,XX +XXX,XX @@ for opt do
41
;;
42
--disable-fuse-lseek) fuse_lseek="disabled"
43
;;
44
+ --enable-multiprocess) multiprocess="yes"
45
+ ;;
46
+ --disable-multiprocess) multiprocess="no"
47
+ ;;
48
*)
49
echo "ERROR: unknown option $opt"
50
echo "Try '$0 --help' for more information"
51
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available
52
libdaxctl libdaxctl support
53
fuse FUSE block device export
54
fuse-lseek SEEK_HOLE/SEEK_DATA support for FUSE exports
55
+ multiprocess Multiprocess QEMU support
56
57
NOTE: The object files are built at the place where configure is launched
58
EOF
59
@@ -XXX,XX +XXX,XX @@ fi
60
if test "$have_mlockall" = "yes" ; then
61
echo "HAVE_MLOCKALL=y" >> $config_host_mak
62
fi
63
+if test "$multiprocess" = "yes" ; then
64
+ echo "CONFIG_MULTIPROCESS_ALLOWED=y" >> $config_host_mak
65
+fi
66
if test "$fuzzing" = "yes" ; then
67
# If LIB_FUZZING_ENGINE is set, assume we are running on OSS-Fuzz, and the
68
# needed CFLAGS have already been provided
13
diff --git a/meson.build b/meson.build
69
diff --git a/meson.build b/meson.build
14
index XXXXXXX..XXXXXXX 100644
70
index XXXXXXX..XXXXXXX 100644
15
--- a/meson.build
71
--- a/meson.build
16
+++ b/meson.build
72
+++ b/meson.build
17
@@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files(
73
@@ -XXX,XX +XXX,XX @@ host_kconfig = \
18
# os-win32.c does not
74
('CONFIG_VHOST_KERNEL' in config_host ? ['CONFIG_VHOST_KERNEL=y'] : []) + \
19
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
75
(have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
20
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
76
('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
21
-softmmu_ss.add_all(blockdev_ss)
77
- ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : [])
22
78
+ ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
23
common_ss.add(files('cpus-common.c'))
79
+ ('CONFIG_MULTIPROCESS_ALLOWED' in config_host ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
24
80
25
@@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock],
81
ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
26
link_args: '@block.syms',
82
27
dependencies: [crypto, io])
83
@@ -XXX,XX +XXX,XX @@ summary_info += {'libpmem support': config_host.has_key('CONFIG_LIBPMEM')}
28
84
summary_info += {'libdaxctl support': config_host.has_key('CONFIG_LIBDAXCTL')}
29
+blockdev_ss = blockdev_ss.apply(config_host, strict: false)
85
summary_info += {'libudev': libudev.found()}
30
+libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
86
summary_info += {'FUSE lseek': fuse_lseek.found()}
31
+ dependencies: blockdev_ss.dependencies(),
87
+summary_info += {'Multiprocess QEMU': config_host.has_key('CONFIG_MULTIPROCESS_ALLOWED')}
32
+ name_suffix: 'fa',
88
summary(summary_info, bool_yn: true, section: 'Dependencies')
33
+ build_by_default: false)
89
90
if not supported_cpus.contains(cpu)
91
diff --git a/Kconfig.host b/Kconfig.host
92
index XXXXXXX..XXXXXXX 100644
93
--- a/Kconfig.host
94
+++ b/Kconfig.host
95
@@ -XXX,XX +XXX,XX @@ config VIRTFS
96
97
config PVRDMA
98
bool
34
+
99
+
35
+blockdev = declare_dependency(link_whole: [libblockdev],
100
+config MULTIPROCESS_ALLOWED
36
+ dependencies: [block])
101
+ bool
37
+
102
+ imply MULTIPROCESS
38
qmp_ss = qmp_ss.apply(config_host, strict: false)
103
diff --git a/hw/Kconfig b/hw/Kconfig
39
libqmp = static_library('qmp', qmp_ss.sources() + genh,
40
dependencies: qmp_ss.dependencies(),
41
@@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods
42
install_dir: config_host['qemu_moddir'])
43
endforeach
44
45
-softmmu_ss.add(authz, block, chardev, crypto, io, qmp)
46
+softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp)
47
common_ss.add(qom, qemuutil)
48
49
common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss])
50
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
51
index XXXXXXX..XXXXXXX 100644
104
index XXXXXXX..XXXXXXX 100644
52
--- a/storage-daemon/meson.build
105
--- a/hw/Kconfig
53
+++ b/storage-daemon/meson.build
106
+++ b/hw/Kconfig
107
@@ -XXX,XX +XXX,XX @@ source pci-host/Kconfig
108
source pcmcia/Kconfig
109
source pci/Kconfig
110
source rdma/Kconfig
111
+source remote/Kconfig
112
source rtc/Kconfig
113
source scsi/Kconfig
114
source sd/Kconfig
115
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
116
new file mode 100644
117
index XXXXXXX..XXXXXXX
118
--- /dev/null
119
+++ b/hw/remote/Kconfig
54
@@ -XXX,XX +XXX,XX @@
120
@@ -XXX,XX +XXX,XX @@
55
qsd_ss = ss.source_set()
121
+config MULTIPROCESS
56
qsd_ss.add(files('qemu-storage-daemon.c'))
122
+ bool
57
-qsd_ss.add(block, chardev, qmp, qom, qemuutil)
123
+ depends on PCI && KVM
58
-qsd_ss.add_all(blockdev_ss)
59
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)
60
61
subdir('qapi')
62
63
--
124
--
64
2.26.2
125
2.29.2
65
126
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Sharing QEMU devices via vhost-user protocol.
3
PCI host bridge is set up for the remote device process. It is
4
implemented using remote-pcihost object. It is an extension of the PCI
5
host bridge set up by QEMU.
6
Remote-pcihost configures a PCI bus which could be used by the remote
7
PCI device to latch on to.
4
8
5
Only one vhost-user client can connect to the server at a time.
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
11
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 0871ba857abb2eafacde07e7fe66a3f12415bfb2.1611938319.git.jag.raman@oracle.com
6
14
7
Suggested-by: Kevin Wolf <kwolf@redhat.com>
15
[Added PCI_EXPRESS condition in hw/remote/Kconfig since remote-pcihost
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
needs PCIe. This solves "make check" failure on s390x. Fix suggested by
9
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
17
Philippe Mathieu-Daudé <philmd@redhat.com> and Thomas Huth
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
18
<thuth@redhat.com>.
11
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
12
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
13
[Fixed size_t %lu -> %zu format string compiler error.
14
--Stefan]
19
--Stefan]
20
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
21
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
16
---
22
---
17
util/vhost-user-server.h | 65 ++++++
23
MAINTAINERS | 2 +
18
util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++
24
include/hw/pci-host/remote.h | 29 ++++++++++++++
19
util/meson.build | 1 +
25
hw/pci-host/remote.c | 75 ++++++++++++++++++++++++++++++++++++
20
3 files changed, 494 insertions(+)
26
hw/pci-host/Kconfig | 3 ++
21
create mode 100644 util/vhost-user-server.h
27
hw/pci-host/meson.build | 1 +
22
create mode 100644 util/vhost-user-server.c
28
hw/remote/Kconfig | 3 +-
29
6 files changed, 112 insertions(+), 1 deletion(-)
30
create mode 100644 include/hw/pci-host/remote.h
31
create mode 100644 hw/pci-host/remote.c
23
32
24
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
33
diff --git a/MAINTAINERS b/MAINTAINERS
34
index XXXXXXX..XXXXXXX 100644
35
--- a/MAINTAINERS
36
+++ b/MAINTAINERS
37
@@ -XXX,XX +XXX,XX @@ M: John G Johnson <john.g.johnson@oracle.com>
38
S: Maintained
39
F: docs/devel/multi-process.rst
40
F: docs/system/multi-process.rst
41
+F: hw/pci-host/remote.c
42
+F: include/hw/pci-host/remote.h
43
44
Build and test automation
45
-------------------------
46
diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
25
new file mode 100644
47
new file mode 100644
26
index XXXXXXX..XXXXXXX
48
index XXXXXXX..XXXXXXX
27
--- /dev/null
49
--- /dev/null
28
+++ b/util/vhost-user-server.h
50
+++ b/include/hw/pci-host/remote.h
29
@@ -XXX,XX +XXX,XX @@
51
@@ -XXX,XX +XXX,XX @@
30
+/*
52
+/*
31
+ * Sharing QEMU devices via vhost-user protocol
53
+ * PCI Host for remote device
32
+ *
54
+ *
33
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
55
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
34
+ * Copyright (c) 2020 Red Hat, Inc.
35
+ *
56
+ *
36
+ * This work is licensed under the terms of the GNU GPL, version 2 or
57
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
37
+ * later. See the COPYING file in the top-level directory.
58
+ * See the COPYING file in the top-level directory.
59
+ *
38
+ */
60
+ */
39
+
61
+
40
+#ifndef VHOST_USER_SERVER_H
62
+#ifndef REMOTE_PCIHOST_H
41
+#define VHOST_USER_SERVER_H
63
+#define REMOTE_PCIHOST_H
42
+
64
+
43
+#include "contrib/libvhost-user/libvhost-user.h"
65
+#include "exec/memory.h"
44
+#include "io/channel-socket.h"
66
+#include "hw/pci/pcie_host.h"
45
+#include "io/channel-file.h"
46
+#include "io/net-listener.h"
47
+#include "qemu/error-report.h"
48
+#include "qapi/error.h"
49
+#include "standard-headers/linux/virtio_blk.h"
50
+
67
+
51
+typedef struct VuFdWatch {
68
+#define TYPE_REMOTE_PCIHOST "remote-pcihost"
52
+ VuDev *vu_dev;
69
+OBJECT_DECLARE_SIMPLE_TYPE(RemotePCIHost, REMOTE_PCIHOST)
53
+ int fd; /*kick fd*/
54
+ void *pvt;
55
+ vu_watch_cb cb;
56
+ bool processing;
57
+ QTAILQ_ENTRY(VuFdWatch) next;
58
+} VuFdWatch;
59
+
70
+
60
+typedef struct VuServer VuServer;
71
+struct RemotePCIHost {
61
+typedef void DevicePanicNotifierFn(VuServer *server);
72
+ /*< private >*/
73
+ PCIExpressHost parent_obj;
74
+ /*< public >*/
62
+
75
+
63
+struct VuServer {
76
+ MemoryRegion *mr_pci_mem;
64
+ QIONetListener *listener;
77
+ MemoryRegion *mr_sys_io;
65
+ AioContext *ctx;
66
+ DevicePanicNotifierFn *device_panic_notifier;
67
+ int max_queues;
68
+ const VuDevIface *vu_iface;
69
+ VuDev vu_dev;
70
+ QIOChannel *ioc; /* The I/O channel with the client */
71
+ QIOChannelSocket *sioc; /* The underlying data channel with the client */
72
+ /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
73
+ QIOChannel *ioc_slave;
74
+ QIOChannelSocket *sioc_slave;
75
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
76
+ QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
77
+ /* restart coroutine co_trip if AIOContext is changed */
78
+ bool aio_context_changed;
79
+ bool processing_msg;
80
+};
78
+};
81
+
79
+
82
+bool vhost_user_server_start(VuServer *server,
80
+#endif
83
+ SocketAddress *unix_socket,
81
diff --git a/hw/pci-host/remote.c b/hw/pci-host/remote.c
84
+ AioContext *ctx,
85
+ uint16_t max_queues,
86
+ DevicePanicNotifierFn *device_panic_notifier,
87
+ const VuDevIface *vu_iface,
88
+ Error **errp);
89
+
90
+void vhost_user_server_stop(VuServer *server);
91
+
92
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
93
+
94
+#endif /* VHOST_USER_SERVER_H */
95
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
96
new file mode 100644
82
new file mode 100644
97
index XXXXXXX..XXXXXXX
83
index XXXXXXX..XXXXXXX
98
--- /dev/null
84
--- /dev/null
99
+++ b/util/vhost-user-server.c
85
+++ b/hw/pci-host/remote.c
100
@@ -XXX,XX +XXX,XX @@
86
@@ -XXX,XX +XXX,XX @@
101
+/*
87
+/*
102
+ * Sharing QEMU devices via vhost-user protocol
88
+ * Remote PCI host device
103
+ *
89
+ *
104
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
90
+ * Unlike PCI host devices that model physical hardware, the purpose
105
+ * Copyright (c) 2020 Red Hat, Inc.
91
+ * of this PCI host is to host multi-process QEMU devices.
106
+ *
92
+ *
107
+ * This work is licensed under the terms of the GNU GPL, version 2 or
93
+ * Multi-process QEMU extends the PCI host of a QEMU machine into a
108
+ * later. See the COPYING file in the top-level directory.
94
+ * remote process. Any PCI device attached to the remote process is
95
+ * visible in the QEMU guest. This allows existing QEMU device models
96
+ * to be reused in the remote process.
97
+ *
98
+ * This PCI host is purely a container for PCI devices. It's fake in the
99
+ * sense that the guest never sees this PCI host and has no way of
100
+ * accessing it. Its job is just to provide the environment that QEMU
101
+ * PCI device models need when running in a remote process.
102
+ *
103
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
104
+ *
105
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
106
+ * See the COPYING file in the top-level directory.
107
+ *
109
+ */
108
+ */
109
+
110
+#include "qemu/osdep.h"
110
+#include "qemu/osdep.h"
111
+#include "qemu/main-loop.h"
111
+#include "qemu-common.h"
112
+#include "vhost-user-server.h"
113
+
112
+
114
+static void vmsg_close_fds(VhostUserMsg *vmsg)
113
+#include "hw/pci/pci.h"
114
+#include "hw/pci/pci_host.h"
115
+#include "hw/pci/pcie_host.h"
116
+#include "hw/qdev-properties.h"
117
+#include "hw/pci-host/remote.h"
118
+#include "exec/memory.h"
119
+
120
+static const char *remote_pcihost_root_bus_path(PCIHostState *host_bridge,
121
+ PCIBus *rootbus)
115
+{
122
+{
116
+ int i;
123
+ return "0000:00";
117
+ for (i = 0; i < vmsg->fd_num; i++) {
118
+ close(vmsg->fds[i]);
119
+ }
120
+}
124
+}
121
+
125
+
122
+static void vmsg_unblock_fds(VhostUserMsg *vmsg)
126
+static void remote_pcihost_realize(DeviceState *dev, Error **errp)
123
+{
127
+{
124
+ int i;
128
+ PCIHostState *pci = PCI_HOST_BRIDGE(dev);
125
+ for (i = 0; i < vmsg->fd_num; i++) {
129
+ RemotePCIHost *s = REMOTE_PCIHOST(dev);
126
+ qemu_set_nonblock(vmsg->fds[i]);
130
+
127
+ }
131
+ pci->bus = pci_root_bus_new(DEVICE(s), "remote-pci",
132
+ s->mr_pci_mem, s->mr_sys_io,
133
+ 0, TYPE_PCIE_BUS);
128
+}
134
+}
129
+
135
+
130
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
136
+static void remote_pcihost_class_init(ObjectClass *klass, void *data)
131
+ gpointer opaque);
137
+{
138
+ DeviceClass *dc = DEVICE_CLASS(klass);
139
+ PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
132
+
140
+
133
+static void close_client(VuServer *server)
141
+ hc->root_bus_path = remote_pcihost_root_bus_path;
134
+{
142
+ dc->realize = remote_pcihost_realize;
135
+ /*
136
+ * Before closing the client
137
+ *
138
+ * 1. Let vu_client_trip stop processing new vhost-user msg
139
+ *
140
+ * 2. remove kick_handler
141
+ *
142
+ * 3. wait for the kick handler to be finished
143
+ *
144
+ * 4. wait for the current vhost-user msg to be finished processing
145
+ */
146
+
143
+
147
+ QIOChannelSocket *sioc = server->sioc;
144
+ dc->user_creatable = false;
148
+ /* When this is set, vu_client_trip will stop processing new vhost-user messages */
145
+ set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
149
+ server->sioc = NULL;
146
+ dc->fw_name = "pci";
150
+
151
+ VuFdWatch *vu_fd_watch, *next;
152
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
153
+ aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
154
+ NULL, NULL, NULL);
155
+ }
156
+
157
+ while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
158
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
159
+ if (!vu_fd_watch->processing) {
160
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
161
+ g_free(vu_fd_watch);
162
+ }
163
+ }
164
+ }
165
+
166
+ while (server->processing_msg) {
167
+ if (server->ioc->read_coroutine) {
168
+ server->ioc->read_coroutine = NULL;
169
+ qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
170
+ NULL, server->ioc);
171
+ server->processing_msg = false;
172
+ }
173
+ }
174
+
175
+ vu_deinit(&server->vu_dev);
176
+ object_unref(OBJECT(sioc));
177
+ object_unref(OBJECT(server->ioc));
178
+}
147
+}
179
+
148
+
180
+static void panic_cb(VuDev *vu_dev, const char *buf)
149
+static const TypeInfo remote_pcihost_info = {
150
+ .name = TYPE_REMOTE_PCIHOST,
151
+ .parent = TYPE_PCIE_HOST_BRIDGE,
152
+ .instance_size = sizeof(RemotePCIHost),
153
+ .class_init = remote_pcihost_class_init,
154
+};
155
+
156
+static void remote_pcihost_register(void)
181
+{
157
+{
182
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
158
+ type_register_static(&remote_pcihost_info);
183
+
184
+ /* avoid while loop in close_client */
185
+ server->processing_msg = false;
186
+
187
+ if (buf) {
188
+ error_report("vu_panic: %s", buf);
189
+ }
190
+
191
+ if (server->sioc) {
192
+ close_client(server);
193
+ }
194
+
195
+ if (server->device_panic_notifier) {
196
+ server->device_panic_notifier(server);
197
+ }
198
+
199
+ /*
200
+ * Set the callback function for network listener so another
201
+ * vhost-user client can connect to this server
202
+ */
203
+ qio_net_listener_set_client_func(server->listener,
204
+ vu_accept,
205
+ server,
206
+ NULL);
207
+}
159
+}
208
+
160
+
209
+static bool coroutine_fn
161
+type_init(remote_pcihost_register)
210
+vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
162
diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
211
+{
163
index XXXXXXX..XXXXXXX 100644
212
+ struct iovec iov = {
164
--- a/hw/pci-host/Kconfig
213
+ .iov_base = (char *)vmsg,
165
+++ b/hw/pci-host/Kconfig
214
+ .iov_len = VHOST_USER_HDR_SIZE,
166
@@ -XXX,XX +XXX,XX @@ config PCI_POWERNV
215
+ };
167
select PCI_EXPRESS
216
+ int rc, read_bytes = 0;
168
select MSI_NONBROKEN
217
+ Error *local_err = NULL;
169
select PCIE_PORT
218
+ /*
219
+ * Store fds/nfds returned from qio_channel_readv_full into
220
+ * temporary variables.
221
+ *
222
+ * VhostUserMsg is a packed structure, gcc will complain about passing
223
+ * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
224
+ * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
225
+ * thus two temporary variables nfds and fds are used here.
226
+ */
227
+ size_t nfds = 0, nfds_t = 0;
228
+ const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
229
+ int *fds_t = NULL;
230
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
231
+ QIOChannel *ioc = server->ioc;
232
+
170
+
233
+ if (!ioc) {
171
+config REMOTE_PCIHOST
234
+ error_report_err(local_err);
172
+ bool
235
+ goto fail;
173
diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
236
+ }
237
+
238
+ assert(qemu_in_coroutine());
239
+ do {
240
+ /*
241
+ * qio_channel_readv_full may have short reads, keep calling it
242
+ * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
243
+ */
244
+ rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
245
+ if (rc < 0) {
246
+ if (rc == QIO_CHANNEL_ERR_BLOCK) {
247
+ qio_channel_yield(ioc, G_IO_IN);
248
+ continue;
249
+ } else {
250
+ error_report_err(local_err);
251
+ return false;
252
+ }
253
+ }
254
+ read_bytes += rc;
255
+ if (nfds_t > 0) {
256
+ if (nfds + nfds_t > max_fds) {
257
+ error_report("A maximum of %zu fds are allowed, "
258
+ "however got %zu fds now",
259
+ max_fds, nfds + nfds_t);
260
+ goto fail;
261
+ }
262
+ memcpy(vmsg->fds + nfds, fds_t,
263
+ nfds_t * sizeof(vmsg->fds[0]));
264
+ nfds += nfds_t;
265
+ g_free(fds_t);
266
+ }
267
+ if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
268
+ break;
269
+ }
270
+ iov.iov_base = (char *)vmsg + read_bytes;
271
+ iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
272
+ } while (true);
273
+
274
+ vmsg->fd_num = nfds;
275
+ /* qio_channel_readv_full will make socket fds blocking, unblock them */
276
+ vmsg_unblock_fds(vmsg);
277
+ if (vmsg->size > sizeof(vmsg->payload)) {
278
+ error_report("Error: too big message request: %d, "
279
+ "size: vmsg->size: %u, "
280
+ "while sizeof(vmsg->payload) = %zu",
281
+ vmsg->request, vmsg->size, sizeof(vmsg->payload));
282
+ goto fail;
283
+ }
284
+
285
+ struct iovec iov_payload = {
286
+ .iov_base = (char *)&vmsg->payload,
287
+ .iov_len = vmsg->size,
288
+ };
289
+ if (vmsg->size) {
290
+ rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
291
+ if (rc == -1) {
292
+ error_report_err(local_err);
293
+ goto fail;
294
+ }
295
+ }
296
+
297
+ return true;
298
+
299
+fail:
300
+ vmsg_close_fds(vmsg);
301
+
302
+ return false;
303
+}
304
+
305
+
306
+static void vu_client_start(VuServer *server);
307
+static coroutine_fn void vu_client_trip(void *opaque)
308
+{
309
+ VuServer *server = opaque;
310
+
311
+ while (!server->aio_context_changed && server->sioc) {
312
+ server->processing_msg = true;
313
+ vu_dispatch(&server->vu_dev);
314
+ server->processing_msg = false;
315
+ }
316
+
317
+ if (server->aio_context_changed && server->sioc) {
318
+ server->aio_context_changed = false;
319
+ vu_client_start(server);
320
+ }
321
+}
322
+
323
+static void vu_client_start(VuServer *server)
324
+{
325
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
326
+ aio_co_enter(server->ctx, server->co_trip);
327
+}
328
+
329
+/*
330
+ * a wrapper for vu_kick_cb
331
+ *
332
+ * since aio_dispatch can only pass one user data pointer to the
333
+ * callback function, pack VuDev and pvt into a struct. Then unpack it
334
+ * and pass them to vu_kick_cb
335
+ */
336
+static void kick_handler(void *opaque)
337
+{
338
+ VuFdWatch *vu_fd_watch = opaque;
339
+ vu_fd_watch->processing = true;
340
+ vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
341
+ vu_fd_watch->processing = false;
342
+}
343
+
344
+
345
+static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
346
+{
347
+
348
+ VuFdWatch *vu_fd_watch, *next;
349
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
350
+ if (vu_fd_watch->fd == fd) {
351
+ return vu_fd_watch;
352
+ }
353
+ }
354
+ return NULL;
355
+}
356
+
357
+static void
358
+set_watch(VuDev *vu_dev, int fd, int vu_evt,
359
+ vu_watch_cb cb, void *pvt)
360
+{
361
+
362
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
363
+ g_assert(vu_dev);
364
+ g_assert(fd >= 0);
365
+ g_assert(cb);
366
+
367
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
368
+
369
+ if (!vu_fd_watch) {
370
+ VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1);
371
+
372
+ QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next);
373
+
374
+ vu_fd_watch->fd = fd;
375
+ vu_fd_watch->cb = cb;
376
+ qemu_set_nonblock(fd);
377
+ aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
378
+ NULL, NULL, vu_fd_watch);
379
+ vu_fd_watch->vu_dev = vu_dev;
380
+ vu_fd_watch->pvt = pvt;
381
+ }
382
+}
383
+
384
+
385
+static void remove_watch(VuDev *vu_dev, int fd)
386
+{
387
+ VuServer *server;
388
+ g_assert(vu_dev);
389
+ g_assert(fd >= 0);
390
+
391
+ server = container_of(vu_dev, VuServer, vu_dev);
392
+
393
+ VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd);
394
+
395
+ if (!vu_fd_watch) {
396
+ return;
397
+ }
398
+ aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL);
399
+
400
+ QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
401
+ g_free(vu_fd_watch);
402
+}
403
+
404
+
405
+static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
406
+ gpointer opaque)
407
+{
408
+ VuServer *server = opaque;
409
+
410
+ if (server->sioc) {
411
+ warn_report("Only one vhost-user client is allowed to "
412
+ "connect the server one time");
413
+ return;
414
+ }
415
+
416
+ if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb,
417
+ vu_message_read, set_watch, remove_watch, server->vu_iface)) {
418
+ error_report("Failed to initialize libvhost-user");
419
+ return;
420
+ }
421
+
422
+ /*
423
+ * Unset the callback function for network listener to make another
424
+ * vhost-user client keep waiting until this client disconnects
425
+ */
426
+ qio_net_listener_set_client_func(server->listener,
427
+ NULL,
428
+ NULL,
429
+ NULL);
430
+ server->sioc = sioc;
431
+ /*
432
+ * Increase the object reference, so sioc will not be freed by
433
+ * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc))
434
+ */
435
+ object_ref(OBJECT(server->sioc));
436
+ qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
437
+ server->ioc = QIO_CHANNEL(sioc);
438
+ object_ref(OBJECT(server->ioc));
439
+ qio_channel_attach_aio_context(server->ioc, server->ctx);
440
+ qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
441
+ vu_client_start(server);
442
+}
443
+
444
+
445
+void vhost_user_server_stop(VuServer *server)
446
+{
447
+ if (server->sioc) {
448
+ close_client(server);
449
+ }
450
+
451
+ if (server->listener) {
452
+ qio_net_listener_disconnect(server->listener);
453
+ object_unref(OBJECT(server->listener));
454
+ }
455
+
456
+}
457
+
458
+void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
459
+{
460
+ VuFdWatch *vu_fd_watch, *next;
461
+ void *opaque = NULL;
462
+ IOHandler *io_read = NULL;
463
+ bool attach;
464
+
465
+ server->ctx = ctx ? ctx : qemu_get_aio_context();
466
+
467
+ if (!server->sioc) {
468
+ /* not yet serving any client */
469
+ return;
470
+ }
471
+
472
+ if (ctx) {
473
+ qio_channel_attach_aio_context(server->ioc, ctx);
474
+ server->aio_context_changed = true;
475
+ io_read = kick_handler;
476
+ attach = true;
477
+ } else {
478
+ qio_channel_detach_aio_context(server->ioc);
479
+ /* server->ioc->ctx keeps the old AioContext */
480
+ ctx = server->ioc->ctx;
481
+ attach = false;
482
+ }
483
+
484
+ QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
485
+ if (vu_fd_watch->cb) {
486
+ opaque = attach ? vu_fd_watch : NULL;
487
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
488
+ io_read, NULL, NULL,
489
+ opaque);
490
+ }
491
+ }
492
+}
493
+
494
+
495
+bool vhost_user_server_start(VuServer *server,
496
+ SocketAddress *socket_addr,
497
+ AioContext *ctx,
498
+ uint16_t max_queues,
499
+ DevicePanicNotifierFn *device_panic_notifier,
500
+ const VuDevIface *vu_iface,
501
+ Error **errp)
502
+{
503
+ QIONetListener *listener = qio_net_listener_new();
504
+ if (qio_net_listener_open_sync(listener, socket_addr, 1,
505
+ errp) < 0) {
506
+ object_unref(OBJECT(listener));
507
+ return false;
508
+ }
509
+
510
+ /* zero out unspecified fields */
511
+ *server = (VuServer) {
512
+ .listener = listener,
513
+ .vu_iface = vu_iface,
514
+ .max_queues = max_queues,
515
+ .ctx = ctx,
516
+ .device_panic_notifier = device_panic_notifier,
517
+ };
518
+
519
+ qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
520
+
521
+ qio_net_listener_set_client_func(server->listener,
522
+ vu_accept,
523
+ server,
524
+ NULL);
525
+
526
+ QTAILQ_INIT(&server->vu_fd_watches);
527
+ return true;
528
+}
529
diff --git a/util/meson.build b/util/meson.build
530
index XXXXXXX..XXXXXXX 100644
174
index XXXXXXX..XXXXXXX 100644
531
--- a/util/meson.build
175
--- a/hw/pci-host/meson.build
532
+++ b/util/meson.build
176
+++ b/hw/pci-host/meson.build
533
@@ -XXX,XX +XXX,XX @@ if have_block
177
@@ -XXX,XX +XXX,XX @@ pci_ss.add(when: 'CONFIG_PCI_EXPRESS_XILINX', if_true: files('xilinx-pcie.c'))
534
util_ss.add(files('main-loop.c'))
178
pci_ss.add(when: 'CONFIG_PCI_I440FX', if_true: files('i440fx.c'))
535
util_ss.add(files('nvdimm-utils.c'))
179
pci_ss.add(when: 'CONFIG_PCI_SABRE', if_true: files('sabre.c'))
536
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
180
pci_ss.add(when: 'CONFIG_XEN_IGD_PASSTHROUGH', if_true: files('xen_igd_pt.c'))
537
+ util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
181
+pci_ss.add(when: 'CONFIG_REMOTE_PCIHOST', if_true: files('remote.c'))
538
util_ss.add(files('qemu-coroutine-sleep.c'))
182
539
util_ss.add(files('qemu-co-shared-resource.c'))
183
# PPC devices
540
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
184
pci_ss.add(when: 'CONFIG_PREP_PCI', if_true: files('prep.c'))
185
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
186
index XXXXXXX..XXXXXXX 100644
187
--- a/hw/remote/Kconfig
188
+++ b/hw/remote/Kconfig
189
@@ -XXX,XX +XXX,XX @@
190
config MULTIPROCESS
191
bool
192
- depends on PCI && KVM
193
+ depends on PCI && PCI_EXPRESS && KVM
194
+ select REMOTE_PCIHOST
541
--
195
--
542
2.26.2
196
2.29.2
543
197
1
From: Philippe Mathieu-Daudé <philmd@redhat.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Keep statistics of some hardware errors, and number of
3
x-remote-machine object sets up various subsystems of the remote
4
aligned/unaligned I/O accesses.
4
device process. Instantiate PCI host bridge object and initialize RAM, IO &
5
PCI memory regions.
5
6
6
QMP example booting a full RHEL 8.3 aarch64 guest:
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
7
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
{ "execute": "query-blockstats" }
9
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
{
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
"return": [
11
Message-id: c537f38d17f90453ca610c6b70cf3480274e0ba1.1611938319.git.jag.raman@oracle.com
11
{
12
"device": "",
13
"node-name": "drive0",
14
"stats": {
15
"flush_total_time_ns": 6026948,
16
"wr_highest_offset": 3383991230464,
17
"wr_total_time_ns": 807450995,
18
"failed_wr_operations": 0,
19
"failed_rd_operations": 0,
20
"wr_merged": 3,
21
"wr_bytes": 50133504,
22
"failed_unmap_operations": 0,
23
"failed_flush_operations": 0,
24
"account_invalid": false,
25
"rd_total_time_ns": 1846979900,
26
"flush_operations": 130,
27
"wr_operations": 659,
28
"rd_merged": 1192,
29
"rd_bytes": 218244096,
30
"account_failed": false,
31
"idle_time_ns": 2678641497,
32
"rd_operations": 7406,
33
},
34
"driver-specific": {
35
"driver": "nvme",
36
"completion-errors": 0,
37
"unaligned-accesses": 2959,
38
"aligned-accesses": 4477
39
},
40
"qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
41
}
42
]
43
}
44
45
Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
46
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
47
Acked-by: Markus Armbruster <armbru@redhat.com>
48
Message-id: 20201001162939.1567915-1-philmd@redhat.com
49
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
50
---
13
---
51
qapi/block-core.json | 24 +++++++++++++++++++++++-
14
MAINTAINERS | 2 ++
52
block/nvme.c | 27 +++++++++++++++++++++++++++
15
include/hw/pci-host/remote.h | 1 +
53
2 files changed, 50 insertions(+), 1 deletion(-)
16
include/hw/remote/machine.h | 27 ++++++++++++++
17
hw/remote/machine.c | 70 ++++++++++++++++++++++++++++++++++++
18
hw/meson.build | 1 +
19
hw/remote/meson.build | 5 +++
20
6 files changed, 106 insertions(+)
21
create mode 100644 include/hw/remote/machine.h
22
create mode 100644 hw/remote/machine.c
23
create mode 100644 hw/remote/meson.build
54
24
55
diff --git a/qapi/block-core.json b/qapi/block-core.json
25
diff --git a/MAINTAINERS b/MAINTAINERS
56
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
57
--- a/qapi/block-core.json
27
--- a/MAINTAINERS
58
+++ b/qapi/block-core.json
28
+++ b/MAINTAINERS
29
@@ -XXX,XX +XXX,XX @@ F: docs/devel/multi-process.rst
30
F: docs/system/multi-process.rst
31
F: hw/pci-host/remote.c
32
F: include/hw/pci-host/remote.h
33
+F: hw/remote/machine.c
34
+F: include/hw/remote/machine.h
35
36
Build and test automation
37
-------------------------
38
diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
39
index XXXXXXX..XXXXXXX 100644
40
--- a/include/hw/pci-host/remote.h
41
+++ b/include/hw/pci-host/remote.h
42
@@ -XXX,XX +XXX,XX @@ struct RemotePCIHost {
43
44
MemoryRegion *mr_pci_mem;
45
MemoryRegion *mr_sys_io;
46
+ MemoryRegion *mr_sys_mem;
47
};
48
49
#endif
50
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
51
new file mode 100644
52
index XXXXXXX..XXXXXXX
53
--- /dev/null
54
+++ b/include/hw/remote/machine.h
59
@@ -XXX,XX +XXX,XX @@
55
@@ -XXX,XX +XXX,XX @@
60
'discard-nb-failed': 'uint64',
56
+/*
61
'discard-bytes-ok': 'uint64' } }
57
+ * Remote machine configuration
62
58
+ *
63
+##
59
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
64
+# @BlockStatsSpecificNvme:
60
+ *
65
+#
61
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
66
+# NVMe driver statistics
62
+ * See the COPYING file in the top-level directory.
67
+#
63
+ *
68
+# @completion-errors: The number of completion errors.
64
+ */
69
+#
70
+# @aligned-accesses: The number of aligned accesses performed by
71
+# the driver.
72
+#
73
+# @unaligned-accesses: The number of unaligned accesses performed by
74
+# the driver.
75
+#
76
+# Since: 5.2
77
+##
78
+{ 'struct': 'BlockStatsSpecificNvme',
79
+ 'data': {
80
+ 'completion-errors': 'uint64',
81
+ 'aligned-accesses': 'uint64',
82
+ 'unaligned-accesses': 'uint64' } }
83
+
65
+
84
##
66
+#ifndef REMOTE_MACHINE_H
85
# @BlockStatsSpecific:
67
+#define REMOTE_MACHINE_H
86
#
68
+
69
+#include "qom/object.h"
70
+#include "hw/boards.h"
71
+#include "hw/pci-host/remote.h"
72
+
73
+struct RemoteMachineState {
74
+ MachineState parent_obj;
75
+
76
+ RemotePCIHost *host;
77
+};
78
+
79
+#define TYPE_REMOTE_MACHINE "x-remote-machine"
80
+OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
81
+
82
+#endif
83
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
84
new file mode 100644
85
index XXXXXXX..XXXXXXX
86
--- /dev/null
87
+++ b/hw/remote/machine.c
87
@@ -XXX,XX +XXX,XX @@
88
@@ -XXX,XX +XXX,XX @@
88
'discriminator': 'driver',
89
+/*
89
'data': {
90
+ * Machine for remote device
90
'file': 'BlockStatsSpecificFile',
91
+ *
91
- 'host_device': 'BlockStatsSpecificFile' } }
92
+ * This machine type is used by the remote device process in multi-process
92
+ 'host_device': 'BlockStatsSpecificFile',
93
+ * QEMU. QEMU device models depend on parent busses, interrupt controllers,
93
+ 'nvme': 'BlockStatsSpecificNvme' } }
94
+ * memory regions, etc. The remote machine type offers this environment so
94
95
+ * that QEMU device models can be used as remote devices.
95
##
96
+ *
96
# @BlockStats:
97
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
97
diff --git a/block/nvme.c b/block/nvme.c
98
+ *
98
index XXXXXXX..XXXXXXX 100644
99
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
99
--- a/block/nvme.c
100
+ * See the COPYING file in the top-level directory.
100
+++ b/block/nvme.c
101
+ *
101
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {
102
+ */
102
103
/* PCI address (required for nvme_refresh_filename()) */
104
char *device;
105
+
103
+
106
+ struct {
104
+#include "qemu/osdep.h"
107
+ uint64_t completion_errors;
105
+#include "qemu-common.h"
108
+ uint64_t aligned_accesses;
106
+
109
+ uint64_t unaligned_accesses;
107
+#include "hw/remote/machine.h"
110
+ } stats;
108
+#include "exec/address-spaces.h"
111
};
109
+#include "exec/memory.h"
112
110
+#include "qapi/error.h"
113
#define NVME_BLOCK_OPT_DEVICE "device"
111
+
114
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
112
+static void remote_machine_init(MachineState *machine)
115
break;
116
}
117
ret = nvme_translate_error(c);
118
+ if (ret) {
119
+ s->stats.completion_errors++;
120
+ }
121
q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
122
if (!q->cq.head) {
123
q->cq_phase = !q->cq_phase;
124
@@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
125
assert(QEMU_IS_ALIGNED(bytes, s->page_size));
126
assert(bytes <= s->max_transfer);
127
if (nvme_qiov_aligned(bs, qiov)) {
128
+ s->stats.aligned_accesses++;
129
return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
130
}
131
+ s->stats.unaligned_accesses++;
132
trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
133
buf = qemu_try_memalign(s->page_size, bytes);
134
135
@@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host)
136
qemu_vfio_dma_unmap(s->vfio, host);
137
}
138
139
+static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs)
140
+{
113
+{
141
+ BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
114
+ MemoryRegion *system_memory, *system_io, *pci_memory;
142
+ BDRVNVMeState *s = bs->opaque;
115
+ RemoteMachineState *s = REMOTE_MACHINE(machine);
116
+ RemotePCIHost *rem_host;
143
+
117
+
144
+ stats->driver = BLOCKDEV_DRIVER_NVME;
118
+ system_memory = get_system_memory();
145
+ stats->u.nvme = (BlockStatsSpecificNvme) {
119
+ system_io = get_system_io();
146
+ .completion_errors = s->stats.completion_errors,
147
+ .aligned_accesses = s->stats.aligned_accesses,
148
+ .unaligned_accesses = s->stats.unaligned_accesses,
149
+ };
150
+
120
+
151
+ return stats;
121
+ pci_memory = g_new(MemoryRegion, 1);
122
+ memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
123
+
124
+ rem_host = REMOTE_PCIHOST(qdev_new(TYPE_REMOTE_PCIHOST));
125
+
126
+ rem_host->mr_pci_mem = pci_memory;
127
+ rem_host->mr_sys_mem = system_memory;
128
+ rem_host->mr_sys_io = system_io;
129
+
130
+ s->host = rem_host;
131
+
132
+ object_property_add_child(OBJECT(s), "remote-pcihost", OBJECT(rem_host));
133
+ memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
134
+
135
+ qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
152
+}
136
+}
153
+
137
+
154
static const char *const nvme_strong_runtime_opts[] = {
138
+static void remote_machine_class_init(ObjectClass *oc, void *data)
155
NVME_BLOCK_OPT_DEVICE,
139
+{
156
NVME_BLOCK_OPT_NAMESPACE,
140
+ MachineClass *mc = MACHINE_CLASS(oc);
157
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
141
+
158
.bdrv_refresh_filename = nvme_refresh_filename,
142
+ mc->init = remote_machine_init;
159
.bdrv_refresh_limits = nvme_refresh_limits,
143
+ mc->desc = "Experimental remote machine";
160
.strong_runtime_opts = nvme_strong_runtime_opts,
144
+}
161
+ .bdrv_get_specific_stats = nvme_get_specific_stats,
145
+
162
146
+static const TypeInfo remote_machine = {
163
.bdrv_detach_aio_context = nvme_detach_aio_context,
147
+ .name = TYPE_REMOTE_MACHINE,
164
.bdrv_attach_aio_context = nvme_attach_aio_context,
148
+ .parent = TYPE_MACHINE,
149
+ .instance_size = sizeof(RemoteMachineState),
150
+ .class_init = remote_machine_class_init,
151
+};
152
+
153
+static void remote_machine_register_types(void)
154
+{
155
+ type_register_static(&remote_machine);
156
+}
157
+
158
+type_init(remote_machine_register_types);
159
diff --git a/hw/meson.build b/hw/meson.build
160
index XXXXXXX..XXXXXXX 100644
161
--- a/hw/meson.build
162
+++ b/hw/meson.build
163
@@ -XXX,XX +XXX,XX @@ subdir('moxie')
164
subdir('nios2')
165
subdir('openrisc')
166
subdir('ppc')
167
+subdir('remote')
168
subdir('riscv')
169
subdir('rx')
170
subdir('s390x')
171
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
172
new file mode 100644
173
index XXXXXXX..XXXXXXX
174
--- /dev/null
175
+++ b/hw/remote/meson.build
176
@@ -XXX,XX +XXX,XX @@
177
+remote_ss = ss.source_set()
178
+
179
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
180
+
181
+softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
165
--
182
--
166
2.26.2
183
2.29.2
167
184
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
When the client is running in gdb and the quit command is run in gdb,
3
Adds qio_channel_writev_full_all() to transmit both data and FDs.
4
QEMU will still dispatch the event, which will cause a segmentation fault in
4
Refactors existing code to use this helper.
5
the callback function.
6
5
7
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
10
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
10
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
11
Message-id: 480fbf1fe4152495d60596c9b665124549b426a5.1611938319.git.jag.raman@oracle.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
13
---
13
contrib/libvhost-user/libvhost-user.c | 1 +
14
include/io/channel.h | 25 +++++++++++++++++++++++++
14
1 file changed, 1 insertion(+)
15
io/channel.c | 15 ++++++++++++++-
16
2 files changed, 39 insertions(+), 1 deletion(-)
15
17
16
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
18
diff --git a/include/io/channel.h b/include/io/channel.h
17
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
18
--- a/contrib/libvhost-user/libvhost-user.c
20
--- a/include/io/channel.h
19
+++ b/contrib/libvhost-user/libvhost-user.c
21
+++ b/include/io/channel.h
20
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
22
@@ -XXX,XX +XXX,XX @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
23
IOHandler *io_write,
24
void *opaque);
25
26
+/**
27
+ * qio_channel_writev_full_all:
28
+ * @ioc: the channel object
29
+ * @iov: the array of memory regions to write data from
30
+ * @niov: the length of the @iov array
31
+ * @fds: an array of file handles to send
32
+ * @nfds: number of file handles in @fds
33
+ * @errp: pointer to a NULL-initialized error object
34
+ *
35
+ *
36
+ * Behaves like qio_channel_writev_full but will attempt
37
+ * to send all data passed (file handles and memory regions).
38
+ * The function will wait for all requested data
39
+ * to be written, yielding from the current coroutine
40
+ * if required.
41
+ *
42
+ * Returns: 0 if all bytes were written, or -1 on error
43
+ */
44
+
45
+int qio_channel_writev_full_all(QIOChannel *ioc,
46
+ const struct iovec *iov,
47
+ size_t niov,
48
+ int *fds, size_t nfds,
49
+ Error **errp);
50
+
51
#endif /* QIO_CHANNEL_H */
52
diff --git a/io/channel.c b/io/channel.c
53
index XXXXXXX..XXXXXXX 100644
54
--- a/io/channel.c
55
+++ b/io/channel.c
56
@@ -XXX,XX +XXX,XX @@ int qio_channel_writev_all(QIOChannel *ioc,
57
const struct iovec *iov,
58
size_t niov,
59
Error **errp)
60
+{
61
+ return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
62
+}
63
+
64
+int qio_channel_writev_full_all(QIOChannel *ioc,
65
+ const struct iovec *iov,
66
+ size_t niov,
67
+ int *fds, size_t nfds,
68
+ Error **errp)
69
{
70
int ret = -1;
71
struct iovec *local_iov = g_new(struct iovec, niov);
72
@@ -XXX,XX +XXX,XX @@ int qio_channel_writev_all(QIOChannel *ioc,
73
74
while (nlocal_iov > 0) {
75
ssize_t len;
76
- len = qio_channel_writev(ioc, local_iov, nlocal_iov, errp);
77
+ len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
78
+ errp);
79
if (len == QIO_CHANNEL_ERR_BLOCK) {
80
if (qemu_in_coroutine()) {
81
qio_channel_yield(ioc, G_IO_OUT);
82
@@ -XXX,XX +XXX,XX @@ int qio_channel_writev_all(QIOChannel *ioc,
21
}
83
}
22
84
23
if (vq->kick_fd != -1) {
85
iov_discard_front(&local_iov, &nlocal_iov, len);
24
+ dev->remove_watch(dev, vq->kick_fd);
86
+
25
close(vq->kick_fd);
87
+ fds = NULL;
26
vq->kick_fd = -1;
88
+ nfds = 0;
27
}
89
}
90
91
ret = 0;
28
--
92
--
29
2.26.2
93
2.29.2
30
94
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
bdrv_is_allocated_above wrongly handles short backing files: it reports
3
Adds qio_channel_readv_full_all_eof() and qio_channel_readv_full_all()
4
after-EOF space as UNALLOCATED which is wrong, as on read the data is
4
to read both data and FDs. Refactors existing code to use these helpers.
5
generated on the level of short backing file (if all overlays have
5
6
unallocated areas at that place).
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Reusing bdrv_common_block_status_above fixes the issue and unifies code
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
path.
9
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
10
10
Message-id: b059c4cc0fb741e794d644c144cc21372cad877d.1611938319.git.jag.raman@oracle.com
11
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
12
Reviewed-by: Eric Blake <eblake@redhat.com>
13
Reviewed-by: Alberto Garcia <berto@igalia.com>
14
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
15
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
16
--Stefan]
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
12
---
19
block/io.c | 43 +++++--------------------------------------
13
include/io/channel.h | 53 +++++++++++++++++++++++
20
1 file changed, 5 insertions(+), 38 deletions(-)
14
io/channel.c | 101 ++++++++++++++++++++++++++++++++++---------
21
15
2 files changed, 134 insertions(+), 20 deletions(-)
22
diff --git a/block/io.c b/block/io.c
16
17
diff --git a/include/io/channel.h b/include/io/channel.h
23
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
24
--- a/block/io.c
19
--- a/include/io/channel.h
25
+++ b/block/io.c
20
+++ b/include/io/channel.h
26
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
21
@@ -XXX,XX +XXX,XX @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
27
* at 'offset + *pnum' may return the same allocation status (in other
22
IOHandler *io_write,
28
* words, the result is not necessarily the maximum possible range);
23
void *opaque);
29
* but 'pnum' will only be 0 when end of file is reached.
24
30
- *
25
+/**
31
*/
26
+ * qio_channel_readv_full_all_eof:
32
int bdrv_is_allocated_above(BlockDriverState *top,
27
+ * @ioc: the channel object
33
BlockDriverState *base,
28
+ * @iov: the array of memory regions to read data to
34
bool include_base, int64_t offset,
29
+ * @niov: the length of the @iov array
35
int64_t bytes, int64_t *pnum)
30
+ * @fds: an array of file handles to read
31
+ * @nfds: number of file handles in @fds
32
+ * @errp: pointer to a NULL-initialized error object
33
+ *
34
+ *
35
+ * Performs the same function as qio_channel_readv_all_eof.
36
+ * Additionally, attempts to read file descriptors shared
37
+ * over the channel. The function will wait for all
38
+ * requested data to be read, yielding from the current
39
+ * coroutine if required. Data refers to both file
40
+ * descriptors and the iovs.
41
+ *
42
+ * Returns: 1 if all bytes were read, 0 if end-of-file
43
+ * occurs without data, or -1 on error
44
+ */
45
+
46
+int qio_channel_readv_full_all_eof(QIOChannel *ioc,
47
+ const struct iovec *iov,
48
+ size_t niov,
49
+ int **fds, size_t *nfds,
50
+ Error **errp);
51
+
52
+/**
53
+ * qio_channel_readv_full_all:
54
+ * @ioc: the channel object
55
+ * @iov: the array of memory regions to read data to
56
+ * @niov: the length of the @iov array
57
+ * @fds: an array of file handles to read
58
+ * @nfds: number of file handles in @fds
59
+ * @errp: pointer to a NULL-initialized error object
60
+ *
61
+ *
62
+ * Performs the same function as qio_channel_readv_all.
63
+ * Additionally, attempts to read file descriptors shared
64
+ * over the channel. The function will wait for all
65
+ * requested data to be read, yielding from the current
66
+ * coroutine if required. Here, data refers to both file
67
+ * descriptors and the iovs.
68
+ *
69
+ * Returns: 0 if all bytes were read, or -1 on error
70
+ */
71
+
72
+int qio_channel_readv_full_all(QIOChannel *ioc,
73
+ const struct iovec *iov,
74
+ size_t niov,
75
+ int **fds, size_t *nfds,
76
+ Error **errp);
77
+
78
/**
79
* qio_channel_writev_full_all:
80
* @ioc: the channel object
81
diff --git a/io/channel.c b/io/channel.c
82
index XXXXXXX..XXXXXXX 100644
83
--- a/io/channel.c
84
+++ b/io/channel.c
85
@@ -XXX,XX +XXX,XX @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
86
const struct iovec *iov,
87
size_t niov,
88
Error **errp)
89
+{
90
+ return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
91
+}
92
+
93
+int qio_channel_readv_all(QIOChannel *ioc,
94
+ const struct iovec *iov,
95
+ size_t niov,
96
+ Error **errp)
97
+{
98
+ return qio_channel_readv_full_all(ioc, iov, niov, NULL, NULL, errp);
99
+}
100
+
101
+int qio_channel_readv_full_all_eof(QIOChannel *ioc,
102
+ const struct iovec *iov,
103
+ size_t niov,
104
+ int **fds, size_t *nfds,
105
+ Error **errp)
36
{
106
{
37
- BlockDriverState *intermediate;
107
int ret = -1;
38
- int ret;
108
struct iovec *local_iov = g_new(struct iovec, niov);
39
- int64_t n = bytes;
109
struct iovec *local_iov_head = local_iov;
40
-
110
unsigned int nlocal_iov = niov;
41
- assert(base || !include_base);
111
+ int **local_fds = fds;
42
-
112
+ size_t *local_nfds = nfds;
43
- intermediate = top;
113
bool partial = false;
44
- while (include_base || intermediate != base) {
114
45
- int64_t pnum_inter;
115
+ if (nfds) {
46
- int64_t size_inter;
116
+ *nfds = 0;
47
-
117
+ }
48
- assert(intermediate);
118
+
49
- ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter);
119
+ if (fds) {
50
- if (ret < 0) {
120
+ *fds = NULL;
51
- return ret;
121
+ }
52
- }
122
+
53
- if (ret) {
123
nlocal_iov = iov_copy(local_iov, nlocal_iov,
54
- *pnum = pnum_inter;
124
iov, niov,
55
- return 1;
125
0, iov_size(iov, niov));
56
- }
126
57
-
127
- while (nlocal_iov > 0) {
58
- size_inter = bdrv_getlength(intermediate);
128
+ while ((nlocal_iov > 0) || local_fds) {
59
- if (size_inter < 0) {
129
ssize_t len;
60
- return size_inter;
130
- len = qio_channel_readv(ioc, local_iov, nlocal_iov, errp);
61
- }
131
+ len = qio_channel_readv_full(ioc, local_iov, nlocal_iov, local_fds,
62
- if (n > pnum_inter &&
132
+ local_nfds, errp);
63
- (intermediate == top || offset + pnum_inter < size_inter)) {
133
if (len == QIO_CHANNEL_ERR_BLOCK) {
64
- n = pnum_inter;
134
if (qemu_in_coroutine()) {
65
- }
135
qio_channel_yield(ioc, G_IO_IN);
66
-
136
@@ -XXX,XX +XXX,XX @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
67
- if (intermediate == base) {
137
qio_channel_wait(ioc, G_IO_IN);
68
- break;
138
}
69
- }
139
continue;
70
-
140
- } else if (len < 0) {
71
- intermediate = bdrv_filter_or_cow_bs(intermediate);
141
- goto cleanup;
72
+ int ret = bdrv_common_block_status_above(top, base, include_base, false,
142
- } else if (len == 0) {
73
+ offset, bytes, pnum, NULL, NULL);
143
- if (partial) {
74
+ if (ret < 0) {
144
- error_setg(errp,
75
+ return ret;
145
- "Unexpected end-of-file before all bytes were read");
146
- } else {
147
+ }
148
+
149
+ if (len == 0) {
150
+ if (local_nfds && *local_nfds) {
151
+ /*
152
+ * Got some FDs, but no data yet. This isn't an EOF
153
+ * scenario (yet), so carry on to try to read data
154
+ * on next loop iteration
155
+ */
156
+ goto next_iter;
157
+ } else if (!partial) {
158
+ /* No fds and no data - EOF before any data read */
159
ret = 0;
160
+ goto cleanup;
161
+ } else {
162
+ len = -1;
163
+ error_setg(errp,
164
+ "Unexpected end-of-file before all data were read");
165
+ /* Fallthrough into len < 0 handling */
166
+ }
167
+ }
168
+
169
+ if (len < 0) {
170
+ /* Close any FDs we previously received */
171
+ if (nfds && fds) {
172
+ size_t i;
173
+ for (i = 0; i < (*nfds); i++) {
174
+ close((*fds)[i]);
175
+ }
176
+ g_free(*fds);
177
+ *fds = NULL;
178
+ *nfds = 0;
179
}
180
goto cleanup;
181
}
182
183
+ if (nlocal_iov) {
184
+ iov_discard_front(&local_iov, &nlocal_iov, len);
185
+ }
186
+
187
+next_iter:
188
partial = true;
189
- iov_discard_front(&local_iov, &nlocal_iov, len);
190
+ local_fds = NULL;
191
+ local_nfds = NULL;
76
}
192
}
77
193
78
- *pnum = n;
194
ret = 1;
79
- return 0;
195
@@ -XXX,XX +XXX,XX @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
80
+ return !!(ret & BDRV_BLOCK_ALLOCATED);
196
return ret;
81
}
197
}
82
198
83
int coroutine_fn
199
-int qio_channel_readv_all(QIOChannel *ioc,
200
- const struct iovec *iov,
201
- size_t niov,
202
- Error **errp)
203
+int qio_channel_readv_full_all(QIOChannel *ioc,
204
+ const struct iovec *iov,
205
+ size_t niov,
206
+ int **fds, size_t *nfds,
207
+ Error **errp)
208
{
209
- int ret = qio_channel_readv_all_eof(ioc, iov, niov, errp);
210
+ int ret = qio_channel_readv_full_all_eof(ioc, iov, niov, fds, nfds, errp);
211
212
if (ret == 0) {
213
- ret = -1;
214
- error_setg(errp,
215
- "Unexpected end-of-file before all bytes were read");
216
- } else if (ret == 1) {
217
- ret = 0;
218
+ error_setg(errp,
219
+ "Unexpected end-of-file before all data were read");
220
+ return -1;
221
}
222
+ if (ret == 1) {
223
+ return 0;
224
+ }
225
+
226
return ret;
227
}
228
84
--
229
--
85
2.26.2
230
2.29.2
86
231
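The return convention documented in the patch above (1 when everything was read, 0 on end-of-file before any data, -1 on error or on end-of-file part-way through) can be illustrated outside QEMU. The following is a minimal sketch using plain POSIX read() on a file descriptor rather than a QIOChannel; demo_read_all_eof() and demo_read_all() are hypothetical names invented for this illustration, and the sketch does not cover file descriptor passing or coroutine yielding.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU API): read exactly len bytes from fd.
 * Returns 1 if all bytes were read, 0 on end-of-file before any byte
 * arrived, and -1 on error or on end-of-file after a partial read.
 */
int demo_read_all_eof(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted: retry the read */
            }
            return -1;          /* genuine I/O error */
        }
        if (n == 0) {
            /* EOF is only "clean" if nothing was consumed yet. */
            return done == 0 ? 0 : -1;
        }
        done += (size_t)n;
    }
    return 1;
}

/*
 * Stricter wrapper: any end-of-file is treated as an error.
 * Returns 0 if all bytes were read, -1 otherwise.
 */
int demo_read_all(int fd, void *buf, size_t len)
{
    int ret = demo_read_all_eof(fd, buf, len);

    if (ret == 1) {
        return 0;
    }
    return -1;                  /* ret == 0 (EOF) and ret == -1 both fail */
}
```

A caller that can tolerate a peer closing the connection between messages would use the first function and treat 0 as a normal shutdown; a caller in the middle of a message would use the wrapper, which mirrors how the non-_eof variants in the patch map 1 to 0 and 0 to -1.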
1
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
the static library once and then reuse it throughout QEMU.
2
3
3
Defines MPQemuMsg, which is the message that is sent to the remote
4
Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
4
process. This message is sent over QIOChannel and is used to
5
vhost-user tools (vhost-user-gpu, etc) do.
5
command the remote process to perform various tasks.
6
6
Define transmission functions used by proxy and by remote.
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
8
Message-id: 20200924151549.913737-14-stefanha@redhat.com
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
9
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
10
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
11
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 56ca8bcf95195b2b195b08f6b9565b6d7410bce5.1611938319.git.jag.raman@oracle.com
13
14
[Replace struct iovec send[2] = {0} with {} to make clang happy as
15
suggested by Peter Maydell <peter.maydell@linaro.org>.
10
--Stefan]
16
--Stefan]
17
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
19
---
13
block/export/export.c | 8 ++++----
20
MAINTAINERS | 2 +
14
block/export/meson.build | 2 +-
21
meson.build | 1 +
15
contrib/libvhost-user/meson.build | 1 +
22
hw/remote/trace.h | 1 +
16
meson.build | 6 +++++-
23
include/hw/remote/mpqemu-link.h | 63 ++++++++++
17
util/meson.build | 4 +++-
24
include/sysemu/iothread.h | 6 +
18
5 files changed, 14 insertions(+), 7 deletions(-)
25
hw/remote/mpqemu-link.c | 205 ++++++++++++++++++++++++++++++++
19
26
iothread.c | 6 +
20
diff --git a/block/export/export.c b/block/export/export.c
27
hw/remote/meson.build | 1 +
21
index XXXXXXX..XXXXXXX 100644
28
hw/remote/trace-events | 4 +
22
--- a/block/export/export.c
29
9 files changed, 289 insertions(+)
23
+++ b/block/export/export.c
30
create mode 100644 hw/remote/trace.h
24
@@ -XXX,XX +XXX,XX @@
31
create mode 100644 include/hw/remote/mpqemu-link.h
25
#include "sysemu/block-backend.h"
32
create mode 100644 hw/remote/mpqemu-link.c
26
#include "block/export.h"
33
create mode 100644 hw/remote/trace-events
27
#include "block/nbd.h"
34
28
-#if CONFIG_LINUX
35
diff --git a/MAINTAINERS b/MAINTAINERS
29
-#include "block/export/vhost-user-blk-server.h"
36
index XXXXXXX..XXXXXXX 100644
30
-#endif
37
--- a/MAINTAINERS
31
#include "qapi/error.h"
38
+++ b/MAINTAINERS
32
#include "qapi/qapi-commands-block-export.h"
39
@@ -XXX,XX +XXX,XX @@ F: hw/pci-host/remote.c
33
#include "qapi/qapi-events-block-export.h"
40
F: include/hw/pci-host/remote.h
34
#include "qemu/id.h"
41
F: hw/remote/machine.c
35
+#ifdef CONFIG_VHOST_USER
42
F: include/hw/remote/machine.h
36
+#include "vhost-user-blk-server.h"
43
+F: hw/remote/mpqemu-link.c
37
+#endif
44
+F: include/hw/remote/mpqemu-link.h
38
45
39
static const BlockExportDriver *blk_exp_drivers[] = {
46
Build and test automation
40
&blk_exp_nbd,
47
-------------------------
41
-#if CONFIG_LINUX
42
+#ifdef CONFIG_VHOST_USER
43
&blk_exp_vhost_user_blk,
44
#endif
45
};
46
diff --git a/block/export/meson.build b/block/export/meson.build
47
index XXXXXXX..XXXXXXX 100644
48
--- a/block/export/meson.build
49
+++ b/block/export/meson.build
50
@@ -XXX,XX +XXX,XX @@
51
block_ss.add(files('export.c'))
52
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
53
+block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c'))
54
diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build
55
index XXXXXXX..XXXXXXX 100644
56
--- a/contrib/libvhost-user/meson.build
57
+++ b/contrib/libvhost-user/meson.build
58
@@ -XXX,XX +XXX,XX @@
59
libvhost_user = static_library('vhost-user',
60
files('libvhost-user.c', 'libvhost-user-glib.c'),
61
build_by_default: false)
62
+vhost_user = declare_dependency(link_with: libvhost_user)
63
diff --git a/meson.build b/meson.build
48
diff --git a/meson.build b/meson.build
64
index XXXXXXX..XXXXXXX 100644
49
index XXXXXXX..XXXXXXX 100644
65
--- a/meson.build
50
--- a/meson.build
66
+++ b/meson.build
51
+++ b/meson.build
67
@@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [
52
@@ -XXX,XX +XXX,XX @@ if have_system
68
'util',
53
'net',
69
]
54
'softmmu',
70
55
'ui',
71
+vhost_user = not_found
56
+ 'hw/remote',
72
+if 'CONFIG_VHOST_USER' in config_host
57
]
73
+ subdir('contrib/libvhost-user')
58
endif
74
+endif
59
if have_system or have_user
75
+
60
diff --git a/hw/remote/trace.h b/hw/remote/trace.h
76
subdir('qapi')
61
new file mode 100644
77
subdir('qobject')
62
index XXXXXXX..XXXXXXX
78
subdir('stubs')
63
--- /dev/null
79
@@ -XXX,XX +XXX,XX @@ if have_tools
64
+++ b/hw/remote/trace.h
80
install: true)
65
@@ -0,0 +1 @@
81
66
+#include "trace/trace-hw_remote.h"
82
if 'CONFIG_VHOST_USER' in config_host
67
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
83
- subdir('contrib/libvhost-user')
68
new file mode 100644
84
subdir('contrib/vhost-user-blk')
69
index XXXXXXX..XXXXXXX
85
subdir('contrib/vhost-user-gpu')
70
--- /dev/null
86
subdir('contrib/vhost-user-input')
71
+++ b/include/hw/remote/mpqemu-link.h
87
diff --git a/util/meson.build b/util/meson.build
72
@@ -XXX,XX +XXX,XX @@
88
index XXXXXXX..XXXXXXX 100644
73
+/*
89
--- a/util/meson.build
74
+ * Communication channel between QEMU and remote device process
90
+++ b/util/meson.build
75
+ *
91
@@ -XXX,XX +XXX,XX @@ if have_block
76
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
92
util_ss.add(files('main-loop.c'))
77
+ *
93
util_ss.add(files('nvdimm-utils.c'))
78
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
94
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
79
+ * See the COPYING file in the top-level directory.
95
- util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
80
+ *
96
+ util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
81
+ */
97
+ files('vhost-user-server.c'), vhost_user
82
+
98
+ ])
83
+#ifndef MPQEMU_LINK_H
99
util_ss.add(files('block-helpers.c'))
84
+#define MPQEMU_LINK_H
100
util_ss.add(files('qemu-coroutine-sleep.c'))
85
+
101
util_ss.add(files('qemu-co-shared-resource.c'))
86
+#include "qom/object.h"
87
+#include "qemu/thread.h"
88
+#include "io/channel.h"
89
+
90
+#define REMOTE_MAX_FDS 8
91
+
92
+#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
93
+
94
+/**
95
+ * MPQemuCmd:
96
+ *
97
+ * MPQemuCmd enum type to specify the command to be executed on the remote
98
+ * device.
99
+ *
100
+ * This uses a private protocol between QEMU and the remote process. The
101
+ * vfio-user protocol is expected to supersede this in the future.
102
+ *
103
+ */
104
+typedef enum {
105
+ MPQEMU_CMD_MAX,
106
+} MPQemuCmd;
107
+
108
+/**
109
+ * MPQemuMsg:
110
+ * @cmd: The remote command
111
+ * @size: Size of the data to be shared
112
+ * @data: Structured data
113
+ * @fds: File descriptors to be shared with remote device
114
+ *
115
+ * MPQemuMsg is the format of the message QEMU sends to the remote device.
116
+ *
117
+ */
118
+typedef struct {
119
+ int cmd;
120
+ size_t size;
121
+
122
+ union {
123
+ uint64_t u64;
124
+ } data;
125
+
126
+ int fds[REMOTE_MAX_FDS];
127
+ int num_fds;
128
+} MPQemuMsg;
129
+
130
+bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
131
+bool mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
132
+
133
+bool mpqemu_msg_valid(MPQemuMsg *msg);
134
+
135
+#endif
136
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
137
index XXXXXXX..XXXXXXX 100644
138
--- a/include/sysemu/iothread.h
139
+++ b/include/sysemu/iothread.h
140
@@ -XXX,XX +XXX,XX @@ IOThread *iothread_create(const char *id, Error **errp);
141
void iothread_stop(IOThread *iothread);
142
void iothread_destroy(IOThread *iothread);
143
144
+/*
145
+ * Returns true if executing withing IOThread context,
146
+ * false otherwise.
147
+ */
148
+bool qemu_in_iothread(void);
149
+
150
#endif /* IOTHREAD_H */
151
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
152
new file mode 100644
153
index XXXXXXX..XXXXXXX
154
--- /dev/null
155
+++ b/hw/remote/mpqemu-link.c
156
@@ -XXX,XX +XXX,XX @@
157
+/*
158
+ * Communication channel between QEMU and remote device process
159
+ *
160
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
161
+ *
162
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
163
+ * See the COPYING file in the top-level directory.
164
+ *
165
+ */
166
+
167
+#include "qemu/osdep.h"
168
+#include "qemu-common.h"
169
+
170
+#include "qemu/module.h"
171
+#include "hw/remote/mpqemu-link.h"
172
+#include "qapi/error.h"
173
+#include "qemu/iov.h"
174
+#include "qemu/error-report.h"
175
+#include "qemu/main-loop.h"
176
+#include "io/channel.h"
177
+#include "sysemu/iothread.h"
178
+#include "trace.h"
179
+
180
+/*
181
+ * Send message over the ioc QIOChannel.
182
+ * This function is safe to call from:
183
+ * - main loop in co-routine context. Will block the main loop if not in
184
+ * co-routine context;
185
+ * - vCPU thread with no co-routine context and if the channel is not part
186
+ * of the main loop handling;
187
+ * - IOThread within co-routine context, outside of co-routine context
188
+ * will block IOThread;
189
+ * Returns true if no errors were encountered, false otherwise.
190
+ */
191
+bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
192
+{
193
+ ERRP_GUARD();
194
+ bool iolock = qemu_mutex_iothread_locked();
195
+ bool iothread = qemu_in_iothread();
196
+ struct iovec send[2] = {};
197
+ int *fds = NULL;
198
+ size_t nfds = 0;
199
+ bool ret = false;
200
+
201
+ send[0].iov_base = msg;
202
+ send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
203
+
204
+ send[1].iov_base = (void *)&msg->data;
205
+ send[1].iov_len = msg->size;
206
+
207
+ if (msg->num_fds) {
208
+ nfds = msg->num_fds;
209
+ fds = msg->fds;
210
+ }
211
+
212
+ /*
213
+ * Dont use in IOThread out of co-routine context as
214
+ * it will block IOThread.
215
+ */
216
+ assert(qemu_in_coroutine() || !iothread);
217
+
218
+ /*
219
+ * Skip unlocking/locking iothread lock when the IOThread is running
220
+ * in co-routine context. Co-routine context is asserted above
221
+ * for IOThread case.
222
+ * Also skip lock handling while in a co-routine in the main context.
223
+ */
224
+ if (iolock && !iothread && !qemu_in_coroutine()) {
225
+ qemu_mutex_unlock_iothread();
226
+ }
227
+
228
+ if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
229
+ fds, nfds, errp)) {
230
+ ret = true;
231
+ } else {
232
+ trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
233
+ }
234
+
235
+ if (iolock && !iothread && !qemu_in_coroutine()) {
236
+ /* See above comment why skip locking here. */
237
+ qemu_mutex_lock_iothread();
238
+ }
239
+
240
+ return ret;
241
+}
242
+
243
+/*
244
+ * Read message from the ioc QIOChannel.
245
+ * This function is safe to call from:
246
+ * - From main loop in co-routine context. Will block the main loop if not in
247
+ * co-routine context;
248
+ * - From vCPU thread with no co-routine context and if the channel is not part
249
+ * of the main loop handling;
250
+ * - From IOThread within co-routine context, outside of co-routine context
251
+ * will block IOThread;
252
+ */
253
+static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
254
+ size_t *nfds, Error **errp)
255
+{
256
+ ERRP_GUARD();
257
+ struct iovec iov = { .iov_base = buf, .iov_len = len };
258
+ bool iolock = qemu_mutex_iothread_locked();
259
+ bool iothread = qemu_in_iothread();
260
+ int ret = -1;
261
+
262
+ /*
263
+ * Dont use in IOThread out of co-routine context as
264
+ * it will block IOThread.
265
+ */
266
+ assert(qemu_in_coroutine() || !iothread);
267
+
268
+ if (iolock && !iothread && !qemu_in_coroutine()) {
269
+ qemu_mutex_unlock_iothread();
270
+ }
271
+
272
+ ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, errp);
273
+
274
+ if (iolock && !iothread && !qemu_in_coroutine()) {
275
+ qemu_mutex_lock_iothread();
276
+ }
277
+
278
+ return (ret <= 0) ? ret : iov.iov_len;
279
+}
280
+
281
+bool mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
282
+{
283
+ ERRP_GUARD();
284
+ g_autofree int *fds = NULL;
285
+ size_t nfds = 0;
286
+ ssize_t len;
287
+ bool ret = false;
288
+
289
+ len = mpqemu_read(ioc, msg, MPQEMU_MSG_HDR_SIZE, &fds, &nfds, errp);
290
+ if (len <= 0) {
291
+ goto fail;
292
+ } else if (len != MPQEMU_MSG_HDR_SIZE) {
293
+ error_setg(errp, "Message header corrupted");
294
+ goto fail;
295
+ }
296
+
297
+ if (msg->size > sizeof(msg->data)) {
298
+ error_setg(errp, "Invalid size for message");
299
+ goto fail;
300
+ }
301
+
302
+ if (!msg->size) {
303
+ goto copy_fds;
304
+ }
305
+
306
+ len = mpqemu_read(ioc, &msg->data, msg->size, NULL, NULL, errp);
307
+ if (len <= 0) {
308
+ goto fail;
309
+ }
310
+ if (len != msg->size) {
311
+ error_setg(errp, "Unable to read full message");
312
+ goto fail;
313
+ }
314
+
315
+copy_fds:
316
+ msg->num_fds = nfds;
317
+ if (nfds > G_N_ELEMENTS(msg->fds)) {
318
+ error_setg(errp,
319
+ "Overflow error: received %zu fds, more than max of %d fds",
320
+ nfds, REMOTE_MAX_FDS);
321
+ goto fail;
322
+ }
323
+ if (nfds) {
324
+ memcpy(msg->fds, fds, nfds * sizeof(int));
325
+ }
326
+
327
+ ret = true;
328
+
329
+fail:
330
+ if (*errp) {
331
+ trace_mpqemu_recv_io_error(msg->cmd, msg->size, nfds);
332
+ }
333
+ while (*errp && nfds) {
334
+ close(fds[nfds - 1]);
335
+ nfds--;
336
+ }
337
+
338
+ return ret;
339
+}
340
+
341
+bool mpqemu_msg_valid(MPQemuMsg *msg)
342
+{
343
+ if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
344
+ return false;
345
+ }
346
+
347
+ /* Verify FDs. */
348
+ if (msg->num_fds >= REMOTE_MAX_FDS) {
349
+ return false;
350
+ }
351
+
352
+ if (msg->num_fds > 0) {
353
+ for (int i = 0; i < msg->num_fds; i++) {
354
+ if (fcntl(msg->fds[i], F_GETFL) == -1) {
355
+ return false;
356
+ }
357
+ }
358
+ }
359
+
360
+ return true;
361
+}
362
diff --git a/iothread.c b/iothread.c
363
index XXXXXXX..XXXXXXX 100644
364
--- a/iothread.c
365
+++ b/iothread.c
366
@@ -XXX,XX +XXX,XX @@ IOThread *iothread_by_id(const char *id)
367
{
368
return IOTHREAD(object_resolve_path_type(id, TYPE_IOTHREAD, NULL));
369
}
370
+
371
+bool qemu_in_iothread(void)
372
+{
373
+ return qemu_get_current_aio_context() == qemu_get_aio_context() ?
374
+ false : true;
375
+}
376
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
377
index XXXXXXX..XXXXXXX 100644
378
--- a/hw/remote/meson.build
379
+++ b/hw/remote/meson.build
380
@@ -XXX,XX +XXX,XX @@
381
remote_ss = ss.source_set()
382
383
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
384
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
385
386
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
387
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
388
new file mode 100644
389
index XXXXXXX..XXXXXXX
390
--- /dev/null
391
+++ b/hw/remote/trace-events
392
@@ -XXX,XX +XXX,XX @@
393
+# multi-process trace events
394
+
395
+mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
396
+mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
102
--
397
--
103
2.26.2
398
2.29.2
104
399
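The message layout and validity check introduced by the patch above can be sketched in isolation. This is an illustrative stand-in, not the QEMU code: DemoMsg, DemoCmd, DEMO_CMD_PING, DEMO_MAX_FDS and demo_msg_valid() are hypothetical names, and the sketch assumes that a descriptor count equal to the array size is acceptable, which is stricter or looser than the real check depending on the comparison used there. The point it shows is that the header size is the offset of the payload within the struct, and that a range check on the command must reject a value that is below zero or at/above the sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_FDS 8

/* Hypothetical command set; DEMO_CMD_MAX is a sentinel, not a command. */
typedef enum {
    DEMO_CMD_PING,
    DEMO_CMD_MAX,
} DemoCmd;

/* Stand-in for MPQemuMsg with the same field order. */
typedef struct {
    int cmd;
    size_t size;            /* number of valid bytes in data */

    union {
        uint64_t u64;
    } data;

    int fds[DEMO_MAX_FDS];
    int num_fds;
} DemoMsg;

/* Everything before the payload travels as the fixed-size header. */
#define DEMO_MSG_HDR_SIZE offsetof(DemoMsg, data.u64)

static_assert(DEMO_MSG_HDR_SIZE == offsetof(DemoMsg, data),
              "payload starts where the data union starts");

bool demo_msg_valid(const DemoMsg *msg)
{
    /* Out of range if below zero OR at/above the sentinel. */
    if (msg->cmd < 0 || msg->cmd >= DEMO_CMD_MAX) {
        return false;
    }
    /* The payload must fit in the data union. */
    if (msg->size > sizeof(msg->data)) {
        return false;
    }
    /* The descriptor count must stay within fds[]. */
    if (msg->num_fds < 0 || msg->num_fds > DEMO_MAX_FDS) {
        return false;
    }
    return true;
}
```

Sending the header and payload as two iovec entries, as mpqemu_msg_send() does, lets the receiver read DEMO_MSG_HDR_SIZE bytes first, check size against the union, and only then read the payload.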
1
Use the new QAPI block exports API instead of defining our own QOM
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
objects.
3
2
4
This is a large change because the lifecycle of VuBlockDev needs to
3
Initializes the message handler function in the remote process. It is
5
follow BlockExportDriver. QOM properties are replaced by QAPI options
4
called whenever there's an event pending on QIOChannel that registers
6
objects.
5
this function.
7
6
8
VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
7
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
Several fields can be dropped since BlockExport already has equivalents.
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
10
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
11
The file names and meson build integration will be adjusted in a future
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
patch. libvhost-user should probably be built as a static library that
11
Message-id: 99d38d8b93753a6409ac2340e858858cda59ab1b.1611938319.git.jag.raman@oracle.com
13
is linked into QEMU instead of as a .c file that results in duplicate
14
compilation.
15
16
The new command-line syntax is:
17
18
$ qemu-storage-daemon \
19
--blockdev file,node-name=drive0,filename=test.img \
20
--export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock
21
22
Note that unix-socket is optional because we may wish to accept chardevs
23
too in the future.
24
25
Markus noted that supported address families are not explicit in the
26
QAPI schema. It is unlikely that support for more address families will
27
be added since file descriptor passing is required and few address
28
families support it. If a new address family needs to be added, then the
29
QAPI 'features' syntax can be used to advertize them.
30
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
Acked-by: Markus Armbruster <armbru@redhat.com>
33
Message-id: 20200924151549.913737-12-stefanha@redhat.com
34
[Skip test on big-endian host architectures because this device doesn't
35
support them yet (as already mentioned in a code comment).
36
--Stefan]
37
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
38
---
13
---
39
qapi/block-export.json | 21 +-
14
MAINTAINERS | 1 +
40
block/export/vhost-user-blk-server.h | 23 +-
15
include/hw/remote/machine.h | 9 ++++++
41
block/export/export.c | 6 +
16
hw/remote/message.c | 57 +++++++++++++++++++++++++++++++++++++
42
block/export/vhost-user-blk-server.c | 452 +++++++--------------------
17
hw/remote/meson.build | 1 +
43
util/vhost-user-server.c | 10 +-
18
4 files changed, 68 insertions(+)
44
block/export/meson.build | 1 +
19
create mode 100644 hw/remote/message.c
45
block/meson.build | 1 -
46
7 files changed, 156 insertions(+), 358 deletions(-)
47
20
48
diff --git a/qapi/block-export.json b/qapi/block-export.json
21
diff --git a/MAINTAINERS b/MAINTAINERS
49
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
50
--- a/qapi/block-export.json
23
--- a/MAINTAINERS
51
+++ b/qapi/block-export.json
24
+++ b/MAINTAINERS
25
@@ -XXX,XX +XXX,XX @@ F: hw/remote/machine.c
26
F: include/hw/remote/machine.h
27
F: hw/remote/mpqemu-link.c
28
F: include/hw/remote/mpqemu-link.h
29
+F: hw/remote/message.c
30
31
Build and test automation
32
-------------------------
33
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
34
index XXXXXXX..XXXXXXX 100644
35
--- a/include/hw/remote/machine.h
36
+++ b/include/hw/remote/machine.h
52
@@ -XXX,XX +XXX,XX @@
37
@@ -XXX,XX +XXX,XX @@
53
'data': { '*name': 'str', '*description': 'str',
38
#include "qom/object.h"
54
'*bitmap': 'str' } }
39
#include "hw/boards.h"
55
40
#include "hw/pci-host/remote.h"
56
+##
41
+#include "io/channel.h"
57
+# @BlockExportOptionsVhostUserBlk:
42
58
+#
43
struct RemoteMachineState {
59
+# A vhost-user-blk block export.
44
MachineState parent_obj;
60
+#
45
@@ -XXX,XX +XXX,XX @@ struct RemoteMachineState {
61
+# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
46
RemotePCIHost *host;
62
+# SocketAddress types are supported. Passed fds must be UNIX domain
47
};
63
+# sockets.
48
64
+# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes.
49
+/* Used to pass the device and ioc to the co-routine. */
65
+#
50
+typedef struct RemoteCommDev {
66
+# Since: 5.2
51
+ PCIDevice *dev;
67
+##
52
+ QIOChannel *ioc;
68
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
53
+} RemoteCommDev;
69
+ 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } }
70
+
54
+
71
##
55
#define TYPE_REMOTE_MACHINE "x-remote-machine"
72
# @NbdServerAddOptions:
56
OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
73
#
57
58
+void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
59
+
60
#endif
61
diff --git a/hw/remote/message.c b/hw/remote/message.c
62
new file mode 100644
63
index XXXXXXX..XXXXXXX
64
--- /dev/null
65
+++ b/hw/remote/message.c
74
@@ -XXX,XX +XXX,XX @@
66
@@ -XXX,XX +XXX,XX @@
75
# An enumeration of block export types
67
+/*
76
#
68
+ * Copyright © 2020, 2021 Oracle and/or its affiliates.
77
# @nbd: NBD export
69
+ *
78
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
70
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
79
#
71
+ *
80
# Since: 4.2
72
+ * See the COPYING file in the top-level directory.
81
##
73
+ *
82
{ 'enum': 'BlockExportType',
74
+ */
83
- 'data': [ 'nbd' ] }
84
+ 'data': [ 'nbd', 'vhost-user-blk' ] }
85
86
##
87
# @BlockExportOptions:
88
@@ -XXX,XX +XXX,XX @@
89
'*writethrough': 'bool' },
90
'discriminator': 'type',
91
'data': {
92
- 'nbd': 'BlockExportOptionsNbd'
93
+ 'nbd': 'BlockExportOptionsNbd',
94
+ 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
95
} }
96
97
##
98
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
99
index XXXXXXX..XXXXXXX 100644
100
--- a/block/export/vhost-user-blk-server.h
101
+++ b/block/export/vhost-user-blk-server.h
102
@@ -XXX,XX +XXX,XX @@
103
104
#ifndef VHOST_USER_BLK_SERVER_H
105
#define VHOST_USER_BLK_SERVER_H
106
-#include "util/vhost-user-server.h"
107
108
-typedef struct VuBlockDev VuBlockDev;
109
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
110
-#define VHOST_USER_BLK_SERVER(obj) \
111
- OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
112
+#include "block/export.h"
113
114
-/* vhost user block device */
115
-struct VuBlockDev {
116
- Object parent_obj;
117
- char *node_name;
118
- SocketAddress *addr;
119
- AioContext *ctx;
120
- VuServer vu_server;
121
- bool running;
122
- uint32_t blk_size;
123
- BlockBackend *backend;
124
- QIOChannelSocket *sioc;
125
- QTAILQ_ENTRY(VuBlockDev) next;
126
- struct virtio_blk_config blkcfg;
127
- bool writable;
128
-};
129
+/* For block/export/export.c */
130
+extern const BlockExportDriver blk_exp_vhost_user_blk;
131
132
#endif /* VHOST_USER_BLK_SERVER_H */
133
diff --git a/block/export/export.c b/block/export/export.c
134
index XXXXXXX..XXXXXXX 100644
135
--- a/block/export/export.c
136
+++ b/block/export/export.c
137
@@ -XXX,XX +XXX,XX @@
138
#include "sysemu/block-backend.h"
139
#include "block/export.h"
140
#include "block/nbd.h"
141
+#if CONFIG_LINUX
142
+#include "block/export/vhost-user-blk-server.h"
143
+#endif
144
#include "qapi/error.h"
145
#include "qapi/qapi-commands-block-export.h"
146
#include "qapi/qapi-events-block-export.h"
147
@@ -XXX,XX +XXX,XX @@
148
149
static const BlockExportDriver *blk_exp_drivers[] = {
150
&blk_exp_nbd,
151
+#if CONFIG_LINUX
152
+ &blk_exp_vhost_user_blk,
153
+#endif
154
};
155
156
/* Only accessed from the main thread */
157
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
158
index XXXXXXX..XXXXXXX 100644
159
--- a/block/export/vhost-user-blk-server.c
160
+++ b/block/export/vhost-user-blk-server.c
161
@@ -XXX,XX +XXX,XX @@
162
*/
163
#include "qemu/osdep.h"
164
#include "block/block.h"
165
+#include "contrib/libvhost-user/libvhost-user.h"
166
+#include "standard-headers/linux/virtio_blk.h"
167
+#include "util/vhost-user-server.h"
168
#include "vhost-user-blk-server.h"
169
#include "qapi/error.h"
170
#include "qom/object_interfaces.h"
171
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
172
unsigned char status;
173
};
174
175
-typedef struct VuBlockReq {
176
+typedef struct VuBlkReq {
177
VuVirtqElement elem;
178
int64_t sector_num;
179
size_t size;
180
@@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq {
181
struct virtio_blk_outhdr out;
182
VuServer *server;
183
struct VuVirtq *vq;
184
-} VuBlockReq;
185
+} VuBlkReq;
186
187
-static void vu_block_req_complete(VuBlockReq *req)
188
+/* vhost user block device */
189
+typedef struct {
190
+ BlockExport export;
191
+ VuServer vu_server;
192
+ uint32_t blk_size;
193
+ QIOChannelSocket *sioc;
194
+ struct virtio_blk_config blkcfg;
195
+ bool writable;
196
+} VuBlkExport;
197
+
75
+
198
+static void vu_blk_req_complete(VuBlkReq *req)
76
+#include "qemu/osdep.h"
199
{
77
+#include "qemu-common.h"
200
VuDev *vu_dev = &req->server->vu_dev;
78
+
201
79
+#include "hw/remote/machine.h"
202
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
80
+#include "io/channel.h"
203
free(req);
81
+#include "hw/remote/mpqemu-link.h"
204
}
82
+#include "qapi/error.h"
205
83
+#include "sysemu/runstate.h"
206
-static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
84
+
207
-{
85
+void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
208
- return container_of(server, VuBlockDev, vu_server);
86
+{
209
-}
87
+ g_autofree RemoteCommDev *com = (RemoteCommDev *)data;
210
-
88
+ PCIDevice *pci_dev = NULL;
211
static int coroutine_fn
89
+ Error *local_err = NULL;
212
-vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
90
+
213
- uint32_t iovcnt, uint32_t type)
91
+ assert(com->ioc);
214
+vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov,
92
+
215
+ uint32_t iovcnt, uint32_t type)
93
+ pci_dev = com->dev;
216
{
94
+ for (; !local_err;) {
217
struct virtio_blk_discard_write_zeroes desc;
95
+ MPQemuMsg msg = {0};
218
ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
96
+
219
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
97
+ if (!mpqemu_msg_recv(&msg, com->ioc, &local_err)) {
220
return -EINVAL;
221
}
222
223
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
224
uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
225
le32_to_cpu(desc.num_sectors) << 9 };
226
if (type == VIRTIO_BLK_T_DISCARD) {
227
- if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
228
+ if (blk_co_pdiscard(blk, range[0], range[1]) == 0) {
229
return 0;
230
}
231
} else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
232
- if (blk_co_pwrite_zeroes(vdev_blk->backend,
233
- range[0], range[1], 0) == 0) {
234
+ if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) {
235
return 0;
236
}
237
}
238
@@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
239
return -EINVAL;
240
}
241
242
-static int coroutine_fn vu_block_flush(VuBlockReq *req)
243
+static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
244
{
245
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
246
- BlockBackend *backend = vdev_blk->backend;
247
- return blk_co_flush(backend);
248
-}
249
-
250
-static void coroutine_fn vu_block_virtio_process_req(void *opaque)
251
-{
252
- VuBlockReq *req = opaque;
253
+ VuBlkReq *req = opaque;
254
VuServer *server = req->server;
255
VuVirtqElement *elem = &req->elem;
256
uint32_t type;
257
258
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
259
- BlockBackend *backend = vdev_blk->backend;
260
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
261
+ BlockBackend *blk = vexp->export.blk;
262
263
struct iovec *in_iov = elem->in_sg;
264
struct iovec *out_iov = elem->out_sg;
265
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
266
bool is_write = type & VIRTIO_BLK_T_OUT;
267
req->sector_num = le64_to_cpu(req->out.sector);
268
269
- int64_t offset = req->sector_num * vdev_blk->blk_size;
270
+ if (is_write && !vexp->writable) {
271
+ req->in->status = VIRTIO_BLK_S_IOERR;
272
+ break;
98
+ break;
273
+ }
99
+ }
274
+
100
+
275
+ int64_t offset = req->sector_num * vexp->blk_size;
101
+ if (!mpqemu_msg_valid(&msg)) {
276
QEMUIOVector qiov;
102
+ error_setg(&local_err, "Received invalid message from proxy"
277
if (is_write) {
103
+ " in remote process pid="FMT_pid"",
278
qemu_iovec_init_external(&qiov, out_iov, out_num);
104
+ getpid());
279
- ret = blk_co_pwritev(backend, offset, qiov.size,
280
- &qiov, 0);
281
+ ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0);
282
} else {
283
qemu_iovec_init_external(&qiov, in_iov, in_num);
284
- ret = blk_co_preadv(backend, offset, qiov.size,
285
- &qiov, 0);
286
+ ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0);
287
}
288
if (ret >= 0) {
289
req->in->status = VIRTIO_BLK_S_OK;
290
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
291
break;
292
}
293
case VIRTIO_BLK_T_FLUSH:
294
- if (vu_block_flush(req) == 0) {
295
+ if (blk_co_flush(blk) == 0) {
296
req->in->status = VIRTIO_BLK_S_OK;
297
} else {
298
req->in->status = VIRTIO_BLK_S_IOERR;
299
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
300
case VIRTIO_BLK_T_DISCARD:
301
case VIRTIO_BLK_T_WRITE_ZEROES: {
302
int rc;
303
- rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
304
- out_num, type);
305
+
306
+ if (!vexp->writable) {
307
+ req->in->status = VIRTIO_BLK_S_IOERR;
308
+ break;
105
+ break;
309
+ }
106
+ }
310
+
107
+
311
+ rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type);
108
+ switch (msg.cmd) {
312
if (rc == 0) {
109
+ default:
313
req->in->status = VIRTIO_BLK_S_OK;
110
+ error_setg(&local_err,
314
} else {
111
+ "Unknown command (%d) received for device %s"
315
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
112
+ " (pid="FMT_pid")",
316
break;
113
+ msg.cmd, DEVICE(pci_dev)->id, getpid());
317
}
114
+ }
318
319
- vu_block_req_complete(req);
320
+ vu_blk_req_complete(req);
321
return;
322
323
err:
324
- free(elem);
325
+ free(req);
326
}
327
328
-static void vu_block_process_vq(VuDev *vu_dev, int idx)
329
+static void vu_blk_process_vq(VuDev *vu_dev, int idx)
330
{
331
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
332
VuVirtq *vq = vu_get_queue(vu_dev, idx);
333
334
while (1) {
335
- VuBlockReq *req;
336
+ VuBlkReq *req;
337
338
- req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
339
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq));
340
if (!req) {
341
break;
342
}
343
@@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx)
344
req->vq = vq;
345
346
Coroutine *co =
347
- qemu_coroutine_create(vu_block_virtio_process_req, req);
348
+ qemu_coroutine_create(vu_blk_virtio_process_req, req);
349
qemu_coroutine_enter(co);
350
}
351
}
352
353
-static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
354
+static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started)
355
{
356
VuVirtq *vq;
357
358
assert(vu_dev);
359
360
vq = vu_get_queue(vu_dev, idx);
361
- vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
362
+ vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL);
363
}
364
365
-static uint64_t vu_block_get_features(VuDev *dev)
366
+static uint64_t vu_blk_get_features(VuDev *dev)
367
{
368
uint64_t features;
369
VuServer *server = container_of(dev, VuServer, vu_dev);
370
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
371
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
372
features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
373
1ull << VIRTIO_BLK_F_SEG_MAX |
374
1ull << VIRTIO_BLK_F_TOPOLOGY |
375
@@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev)
376
1ull << VIRTIO_RING_F_EVENT_IDX |
377
1ull << VHOST_USER_F_PROTOCOL_FEATURES;
378
379
- if (!vdev_blk->writable) {
380
+ if (!vexp->writable) {
381
features |= 1ull << VIRTIO_BLK_F_RO;
382
}
383
384
return features;
385
}
386
387
-static uint64_t vu_block_get_protocol_features(VuDev *dev)
388
+static uint64_t vu_blk_get_protocol_features(VuDev *dev)
389
{
390
return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
391
1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
392
}
393
394
static int
395
-vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
396
+vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
397
{
398
+ /* TODO blkcfg must be little-endian for VIRTIO 1.0 */
399
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
400
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
401
- memcpy(config, &vdev_blk->blkcfg, len);
402
-
403
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
404
+ memcpy(config, &vexp->blkcfg, len);
405
return 0;
406
}
407
408
static int
409
-vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
410
+vu_blk_set_config(VuDev *vu_dev, const uint8_t *data,
411
uint32_t offset, uint32_t size, uint32_t flags)
412
{
413
VuServer *server = container_of(vu_dev, VuServer, vu_dev);
414
- VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
415
+ VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
416
uint8_t wce;
417
418
/* don't support live migration */
419
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
420
}
421
422
wce = *data;
423
- vdev_blk->blkcfg.wce = wce;
424
- blk_set_enable_write_cache(vdev_blk->backend, wce);
425
+ vexp->blkcfg.wce = wce;
426
+ blk_set_enable_write_cache(vexp->export.blk, wce);
427
return 0;
428
}
429
430
@@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
431
* of vu_process_message.
432
*
433
*/
434
-static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
435
+static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
436
{
437
if (vmsg->request == VHOST_USER_NONE) {
438
dev->panic(dev, "disconnect");
439
@@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
440
return false;
441
}
442
443
-static const VuDevIface vu_block_iface = {
444
- .get_features = vu_block_get_features,
445
- .queue_set_started = vu_block_queue_set_started,
446
- .get_protocol_features = vu_block_get_protocol_features,
447
- .get_config = vu_block_get_config,
448
- .set_config = vu_block_set_config,
449
- .process_msg = vu_block_process_msg,
450
+static const VuDevIface vu_blk_iface = {
451
+ .get_features = vu_blk_get_features,
452
+ .queue_set_started = vu_blk_queue_set_started,
453
+ .get_protocol_features = vu_blk_get_protocol_features,
454
+ .get_config = vu_blk_get_config,
455
+ .set_config = vu_blk_set_config,
456
+ .process_msg = vu_blk_process_msg,
457
};
458
459
static void blk_aio_attached(AioContext *ctx, void *opaque)
460
{
461
- VuBlockDev *vub_dev = opaque;
462
- vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
463
+ VuBlkExport *vexp = opaque;
464
+ vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
465
}
466
467
static void blk_aio_detach(void *opaque)
468
{
469
- VuBlockDev *vub_dev = opaque;
470
- vhost_user_server_detach_aio_context(&vub_dev->vu_server);
471
+ VuBlkExport *vexp = opaque;
472
+ vhost_user_server_detach_aio_context(&vexp->vu_server);
473
}
474
475
static void
476
-vu_block_initialize_config(BlockDriverState *bs,
477
+vu_blk_initialize_config(BlockDriverState *bs,
478
struct virtio_blk_config *config, uint32_t blk_size)
479
{
480
config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
481
@@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs,
482
config->max_write_zeroes_seg = 1;
483
}
484
485
-static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
486
+static void vu_blk_exp_request_shutdown(BlockExport *exp)
487
{
488
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
489
490
- BlockBackend *blk;
491
- Error *local_error = NULL;
492
- const char *node_name = vu_block_device->node_name;
493
- bool writable = vu_block_device->writable;
494
- uint64_t perm = BLK_PERM_CONSISTENT_READ;
495
- int ret;
496
-
497
- AioContext *ctx;
498
-
499
- BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
500
-
501
- if (!bs) {
502
- error_propagate(errp, local_error);
503
- return NULL;
504
- }
505
-
506
- if (bdrv_is_read_only(bs)) {
507
- writable = false;
508
- }
509
-
510
- if (writable) {
511
- perm |= BLK_PERM_WRITE;
512
- }
513
-
514
- ctx = bdrv_get_aio_context(bs);
515
- aio_context_acquire(ctx);
516
- bdrv_invalidate_cache(bs, NULL);
517
- aio_context_release(ctx);
518
-
519
- /*
520
- * Don't allow resize while the vhost user server is running,
521
- * otherwise we don't care what happens with the node.
522
- */
523
- blk = blk_new(bdrv_get_aio_context(bs), perm,
524
- BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
525
- BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
526
- ret = blk_insert_bs(blk, bs, errp);
527
-
528
- if (ret < 0) {
529
- goto fail;
530
- }
531
-
532
- blk_set_enable_write_cache(blk, false);
533
-
534
- blk_set_allow_aio_context_change(blk, true);
535
-
536
- vu_block_device->blkcfg.wce = 0;
537
- vu_block_device->backend = blk;
538
- if (!vu_block_device->blk_size) {
539
- vu_block_device->blk_size = BDRV_SECTOR_SIZE;
540
- }
541
- vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
542
- blk_set_guest_block_size(blk, vu_block_device->blk_size);
543
- vu_block_initialize_config(bs, &vu_block_device->blkcfg,
544
- vu_block_device->blk_size);
545
- return vu_block_device;
546
-
547
-fail:
548
- blk_unref(blk);
549
- return NULL;
550
-}
551
-
552
-static void vu_block_deinit(VuBlockDev *vu_block_device)
553
-{
554
- if (vu_block_device->backend) {
555
- blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
556
- blk_aio_detach, vu_block_device);
557
- }
558
-
559
- blk_unref(vu_block_device->backend);
560
-}
561
-
562
-static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
563
-{
564
- vhost_user_server_stop(&vu_block_device->vu_server);
565
- vu_block_deinit(vu_block_device);
566
-}
567
-
568
-static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
569
- Error **errp)
570
-{
571
- AioContext *ctx;
572
- SocketAddress *addr = vu_block_device->addr;
573
-
574
- if (!vu_block_init(vu_block_device, errp)) {
575
- return;
576
- }
577
-
578
- ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
579
-
580
- if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
581
- VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
582
- errp)) {
583
- goto error;
584
- }
585
-
586
- blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
587
- blk_aio_detach, vu_block_device);
588
- vu_block_device->running = true;
589
- return;
590
-
591
- error:
592
- vu_block_deinit(vu_block_device);
593
-}
594
-
595
-static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
596
-{
597
- if (vus->running) {
598
- error_setg(errp, "The property can't be modified "
599
- "while the server is running");
600
- return false;
601
- }
602
- return true;
603
-}
604
-
605
-static void vu_set_node_name(Object *obj, const char *value, Error **errp)
606
-{
607
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
608
-
609
- if (!vu_prop_modifiable(vus, errp)) {
610
- return;
611
- }
612
-
613
- if (vus->node_name) {
614
- g_free(vus->node_name);
615
- }
616
-
617
- vus->node_name = g_strdup(value);
618
-}
619
-
620
-static char *vu_get_node_name(Object *obj, Error **errp)
621
-{
622
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
623
- return g_strdup(vus->node_name);
624
-}
625
-
626
-static void free_socket_addr(SocketAddress *addr)
627
-{
628
- g_free(addr->u.q_unix.path);
629
- g_free(addr);
630
-}
631
-
632
-static void vu_set_unix_socket(Object *obj, const char *value,
633
- Error **errp)
634
-{
635
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
636
-
637
- if (!vu_prop_modifiable(vus, errp)) {
638
- return;
639
- }
640
-
641
- if (vus->addr) {
642
- free_socket_addr(vus->addr);
643
- }
644
-
645
- SocketAddress *addr = g_new0(SocketAddress, 1);
646
- addr->type = SOCKET_ADDRESS_TYPE_UNIX;
647
- addr->u.q_unix.path = g_strdup(value);
648
- vus->addr = addr;
649
+ vhost_user_server_stop(&vexp->vu_server);
650
}
651
652
-static char *vu_get_unix_socket(Object *obj, Error **errp)
653
+static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
654
+ Error **errp)
655
{
656
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
657
- return g_strdup(vus->addr->u.q_unix.path);
658
-}
659
-
660
-static bool vu_get_block_writable(Object *obj, Error **errp)
661
-{
662
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
663
- return vus->writable;
664
-}
665
-
666
-static void vu_set_block_writable(Object *obj, bool value, Error **errp)
667
-{
668
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
669
-
670
- if (!vu_prop_modifiable(vus, errp)) {
671
- return;
672
- }
673
-
674
- vus->writable = value;
675
-}
676
-
677
-static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
678
- void *opaque, Error **errp)
679
-{
680
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
681
- uint32_t value = vus->blk_size;
682
-
683
- visit_type_uint32(v, name, &value, errp);
684
-}
685
-
686
-static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
687
- void *opaque, Error **errp)
688
-{
689
- VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
690
-
691
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
692
+ BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk;
693
Error *local_err = NULL;
694
- uint32_t value;
695
+ uint64_t logical_block_size;
696
697
- if (!vu_prop_modifiable(vus, errp)) {
698
- return;
699
- }
700
+ vexp->writable = opts->writable;
701
+ vexp->blkcfg.wce = 0;
702
703
- visit_type_uint32(v, name, &value, &local_err);
704
- if (local_err) {
705
- goto out;
706
+ if (vu_opts->has_logical_block_size) {
707
+ logical_block_size = vu_opts->logical_block_size;
708
+ } else {
709
+ logical_block_size = BDRV_SECTOR_SIZE;
710
}
711
-
712
- check_block_size(object_get_typename(obj), name, value, &local_err);
713
+ check_block_size(exp->id, "logical-block-size", logical_block_size,
714
+ &local_err);
715
if (local_err) {
716
- goto out;
717
+ error_propagate(errp, local_err);
718
+ return -EINVAL;
719
+ }
720
+ vexp->blk_size = logical_block_size;
721
+ blk_set_guest_block_size(exp->blk, logical_block_size);
722
+ vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
723
+ logical_block_size);
724
+
725
+ blk_set_allow_aio_context_change(exp->blk, true);
726
+ blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
727
+ vexp);
728
+
729
+ if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
730
+ VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface,
731
+ errp)) {
732
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
733
+ blk_aio_detach, vexp);
734
+ return -EADDRNOTAVAIL;
735
}
736
737
- vus->blk_size = value;
738
-
739
-out:
740
- error_propagate(errp, local_err);
741
-}
742
-
743
-static void vhost_user_blk_server_instance_finalize(Object *obj)
744
-{
745
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
746
-
747
- vhost_user_blk_server_stop(vub);
748
-
749
- /*
750
- * Unlike object_property_add_str, object_class_property_add_str
751
- * doesn't have a release method. Thus manual memory freeing is
752
- * needed.
753
- */
754
- free_socket_addr(vub->addr);
755
- g_free(vub->node_name);
756
-}
757
-
758
-static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
759
-{
760
- VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
761
-
762
- vhost_user_blk_server_start(vub, errp);
763
+ return 0;
764
}
765
766
-static void vhost_user_blk_server_class_init(ObjectClass *klass,
767
- void *class_data)
768
+static void vu_blk_exp_delete(BlockExport *exp)
769
{
770
- UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
771
- ucc->complete = vhost_user_blk_server_complete;
772
-
773
- object_class_property_add_bool(klass, "writable",
774
- vu_get_block_writable,
775
- vu_set_block_writable);
776
-
777
- object_class_property_add_str(klass, "node-name",
778
- vu_get_node_name,
779
- vu_set_node_name);
780
-
781
- object_class_property_add_str(klass, "unix-socket",
782
- vu_get_unix_socket,
783
- vu_set_unix_socket);
784
+ VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
785
786
- object_class_property_add(klass, "logical-block-size", "uint32",
787
- vu_get_blk_size, vu_set_blk_size,
788
- NULL, NULL);
789
+ blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
790
+ vexp);
791
}
792
793
-static const TypeInfo vhost_user_blk_server_info = {
794
- .name = TYPE_VHOST_USER_BLK_SERVER,
795
- .parent = TYPE_OBJECT,
796
- .instance_size = sizeof(VuBlockDev),
797
- .instance_finalize = vhost_user_blk_server_instance_finalize,
798
- .class_init = vhost_user_blk_server_class_init,
799
- .interfaces = (InterfaceInfo[]) {
800
- {TYPE_USER_CREATABLE},
801
- {}
802
- },
803
+const BlockExportDriver blk_exp_vhost_user_blk = {
804
+ .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK,
805
+ .instance_size = sizeof(VuBlkExport),
806
+ .create = vu_blk_exp_create,
807
+ .delete = vu_blk_exp_delete,
808
+ .request_shutdown = vu_blk_exp_request_shutdown,
809
};
810
-
811
-static void vhost_user_blk_server_register_types(void)
812
-{
813
- type_register_static(&vhost_user_blk_server_info);
814
-}
815
-
816
-type_init(vhost_user_blk_server_register_types)
817
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
818
index XXXXXXX..XXXXXXX 100644
819
--- a/util/vhost-user-server.c
820
+++ b/util/vhost-user-server.c
821
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
822
Error **errp)
823
{
824
QEMUBH *bh;
825
- QIONetListener *listener = qio_net_listener_new();
826
+ QIONetListener *listener;
827
+
828
+ if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX &&
829
+ socket_addr->type != SOCKET_ADDRESS_TYPE_FD) {
830
+ error_setg(errp, "Only socket address types 'unix' and 'fd' are supported");
831
+ return false;
832
+ }
115
+ }
833
+
116
+
834
+ listener = qio_net_listener_new();
117
+ if (local_err) {
835
if (qio_net_listener_open_sync(listener, socket_addr, 1,
118
+ error_report_err(local_err);
836
errp) < 0) {
119
+ qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_ERROR);
837
object_unref(OBJECT(listener));
120
+ } else {
838
diff --git a/block/export/meson.build b/block/export/meson.build
121
+ qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
122
+ }
123
+}
124
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
839
index XXXXXXX..XXXXXXX 100644
125
index XXXXXXX..XXXXXXX 100644
840
--- a/block/export/meson.build
126
--- a/hw/remote/meson.build
841
+++ b/block/export/meson.build
127
+++ b/hw/remote/meson.build
842
@@ -1 +1,2 @@
128
@@ -XXX,XX +XXX,XX @@ remote_ss = ss.source_set()
843
block_ss.add(files('export.c'))
129
844
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c'))
130
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
845
diff --git a/block/meson.build b/block/meson.build
131
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
846
index XXXXXXX..XXXXXXX 100644
132
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
847
--- a/block/meson.build
133
848
+++ b/block/meson.build
134
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
849
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
850
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
851
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
852
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
853
-block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
854
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
855
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
856
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
857
--
135
--
858
2.26.2
136
2.29.2
859
137
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
3
Associate the file descriptor for a PCIDevice in remote process with
4
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
4
DeviceState object.
5
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
5
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
10
Message-id: f405a2ed5d7518b87bea7c59cfdf334d67e5ee51.1611938319.git.jag.raman@oracle.com
7
Message-id: 20200918080912.321299-8-coiby.xu@gmail.com
8
[Removed reference to vhost-user-blk-test.c, it will be sent in a
9
separate pull request.
10
--Stefan]
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
12
---
13
MAINTAINERS | 7 +++++++
13
MAINTAINERS | 1 +
14
1 file changed, 7 insertions(+)
14
hw/remote/remote-obj.c | 203 +++++++++++++++++++++++++++++++++++++++++
15
hw/remote/meson.build | 1 +
16
3 files changed, 205 insertions(+)
17
create mode 100644 hw/remote/remote-obj.c
15
18
16
diff --git a/MAINTAINERS b/MAINTAINERS
19
diff --git a/MAINTAINERS b/MAINTAINERS
17
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
18
--- a/MAINTAINERS
21
--- a/MAINTAINERS
19
+++ b/MAINTAINERS
22
+++ b/MAINTAINERS
20
@@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org
23
@@ -XXX,XX +XXX,XX @@ F: include/hw/remote/machine.h
21
S: Supported
24
F: hw/remote/mpqemu-link.c
22
F: tests/image-fuzzer/
25
F: include/hw/remote/mpqemu-link.h
23
26
F: hw/remote/message.c
24
+Vhost-user block device backend server
27
+F: hw/remote/remote-obj.c
25
+M: Coiby Xu <Coiby.Xu@gmail.com>
28
26
+S: Maintained
29
Build and test automation
27
+F: block/export/vhost-user-blk-server.c
30
-------------------------
28
+F: util/vhost-user-server.c
31
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
29
+F: tests/qtest/libqos/vhost-user-blk.c
32
new file mode 100644
30
+
33
index XXXXXXX..XXXXXXX
31
Replication
34
--- /dev/null
32
M: Wen Congyang <wencongyang2@huawei.com>
35
+++ b/hw/remote/remote-obj.c
33
M: Xie Changlong <xiechanglong.d@gmail.com>
36
@@ -XXX,XX +XXX,XX @@
37
+/*
38
+ * Copyright © 2020, 2021 Oracle and/or its affiliates.
39
+ *
40
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
41
+ *
42
+ * See the COPYING file in the top-level directory.
43
+ *
44
+ */
45
+
46
+#include "qemu/osdep.h"
47
+#include "qemu-common.h"
48
+
49
+#include "qemu/error-report.h"
50
+#include "qemu/notify.h"
51
+#include "qom/object_interfaces.h"
52
+#include "hw/qdev-core.h"
53
+#include "io/channel.h"
54
+#include "hw/qdev-core.h"
55
+#include "hw/remote/machine.h"
56
+#include "io/channel-util.h"
57
+#include "qapi/error.h"
58
+#include "sysemu/sysemu.h"
59
+#include "hw/pci/pci.h"
60
+#include "qemu/sockets.h"
61
+#include "monitor/monitor.h"
62
+
63
+#define TYPE_REMOTE_OBJECT "x-remote-object"
64
+OBJECT_DECLARE_TYPE(RemoteObject, RemoteObjectClass, REMOTE_OBJECT)
65
+
66
+struct RemoteObjectClass {
67
+ ObjectClass parent_class;
68
+
69
+ unsigned int nr_devs;
70
+ unsigned int max_devs;
71
+};
72
+
73
+struct RemoteObject {
74
+ /* private */
75
+ Object parent;
76
+
77
+ Notifier machine_done;
78
+
79
+ int32_t fd;
80
+ char *devid;
81
+
82
+ QIOChannel *ioc;
83
+
84
+ DeviceState *dev;
85
+ DeviceListener listener;
86
+};
87
+
88
+static void remote_object_set_fd(Object *obj, const char *str, Error **errp)
89
+{
90
+ RemoteObject *o = REMOTE_OBJECT(obj);
91
+ int fd = -1;
92
+
93
+ fd = monitor_fd_param(monitor_cur(), str, errp);
94
+ if (fd == -1) {
95
+ error_prepend(errp, "Could not parse remote object fd %s: ", str);
96
+ return;
97
+ }
98
+
99
+ if (!fd_is_socket(fd)) {
100
+ error_setg(errp, "File descriptor '%s' is not a socket", str);
101
+ close(fd);
102
+ return;
103
+ }
104
+
105
+ o->fd = fd;
106
+}
107
+
108
+static void remote_object_set_devid(Object *obj, const char *str, Error **errp)
109
+{
110
+ RemoteObject *o = REMOTE_OBJECT(obj);
111
+
112
+ g_free(o->devid);
113
+
114
+ o->devid = g_strdup(str);
115
+}
116
+
117
+static void remote_object_unrealize_listener(DeviceListener *listener,
118
+ DeviceState *dev)
119
+{
120
+ RemoteObject *o = container_of(listener, RemoteObject, listener);
121
+
122
+ if (o->dev == dev) {
123
+ object_unref(OBJECT(o));
124
+ }
125
+}
126
+
127
+static void remote_object_machine_done(Notifier *notifier, void *data)
128
+{
129
+ RemoteObject *o = container_of(notifier, RemoteObject, machine_done);
130
+ DeviceState *dev = NULL;
131
+ QIOChannel *ioc = NULL;
132
+ Coroutine *co = NULL;
133
+ RemoteCommDev *comdev = NULL;
134
+ Error *err = NULL;
135
+
136
+ dev = qdev_find_recursive(sysbus_get_default(), o->devid);
137
+ if (!dev || !object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
138
+ error_report("%s is not a PCI device", o->devid);
139
+ return;
140
+ }
141
+
142
+ ioc = qio_channel_new_fd(o->fd, &err);
143
+ if (!ioc) {
144
+ error_report_err(err);
145
+ return;
146
+ }
147
+ qio_channel_set_blocking(ioc, false, NULL);
148
+
149
+ o->dev = dev;
150
+
151
+ o->listener.unrealize = remote_object_unrealize_listener;
152
+ device_listener_register(&o->listener);
153
+
154
+ /* co-routine should free this. */
155
+ comdev = g_new0(RemoteCommDev, 1);
156
+ *comdev = (RemoteCommDev) {
157
+ .ioc = ioc,
158
+ .dev = PCI_DEVICE(dev),
159
+ };
160
+
161
+ co = qemu_coroutine_create(mpqemu_remote_msg_loop_co, comdev);
162
+ qemu_coroutine_enter(co);
163
+}
164
+
165
+static void remote_object_init(Object *obj)
166
+{
167
+ RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
168
+ RemoteObject *o = REMOTE_OBJECT(obj);
169
+
170
+ if (k->nr_devs >= k->max_devs) {
171
+ error_report("Reached maximum number of devices: %u", k->max_devs);
172
+ return;
173
+ }
174
+
175
+ o->ioc = NULL;
176
+ o->fd = -1;
177
+ o->devid = NULL;
178
+
179
+ k->nr_devs++;
180
+
181
+ o->machine_done.notify = remote_object_machine_done;
182
+ qemu_add_machine_init_done_notifier(&o->machine_done);
183
+}
184
+
185
+static void remote_object_finalize(Object *obj)
186
+{
187
+ RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
188
+ RemoteObject *o = REMOTE_OBJECT(obj);
189
+
190
+ device_listener_unregister(&o->listener);
191
+
192
+ if (o->ioc) {
193
+ qio_channel_shutdown(o->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
194
+ qio_channel_close(o->ioc, NULL);
195
+ }
196
+
197
+ object_unref(OBJECT(o->ioc));
198
+
199
+ k->nr_devs--;
200
+ g_free(o->devid);
201
+}
202
+
203
+static void remote_object_class_init(ObjectClass *klass, void *data)
204
+{
205
+ RemoteObjectClass *k = REMOTE_OBJECT_CLASS(klass);
206
+
207
+ /*
208
+ * Limit number of supported devices to 1. This is done to avoid devices
209
+ * from one VM accessing the RAM of another VM. This is done until we
210
+ * start using separate address spaces for individual devices.
211
+ */
212
+ k->max_devs = 1;
213
+ k->nr_devs = 0;
214
+
215
+ object_class_property_add_str(klass, "fd", NULL, remote_object_set_fd);
216
+ object_class_property_add_str(klass, "devid", NULL,
217
+ remote_object_set_devid);
218
+}
219
+
220
+static const TypeInfo remote_object_info = {
221
+ .name = TYPE_REMOTE_OBJECT,
222
+ .parent = TYPE_OBJECT,
223
+ .instance_size = sizeof(RemoteObject),
224
+ .instance_init = remote_object_init,
225
+ .instance_finalize = remote_object_finalize,
226
+ .class_size = sizeof(RemoteObjectClass),
227
+ .class_init = remote_object_class_init,
228
+ .interfaces = (InterfaceInfo[]) {
229
+ { TYPE_USER_CREATABLE },
230
+ { }
231
+ }
232
+};
233
+
234
+static void register_types(void)
235
+{
236
+ type_register_static(&remote_object_info);
237
+}
238
+
239
+type_init(register_types);
240
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
241
index XXXXXXX..XXXXXXX 100644
242
--- a/hw/remote/meson.build
243
+++ b/hw/remote/meson.build
244
@@ -XXX,XX +XXX,XX @@ remote_ss = ss.source_set()
245
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
246
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
247
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
248
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
249
250
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
34
--
251
--
35
2.26.2
252
2.29.2
36
253
1
Make it possible to specify the iothread where the export will run. By
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
default the block node can be moved to other AioContexts later and the
2
3
export will follow. The fixed-iothread option forces strict behavior
3
SyncSysMemMsg message format is defined. It is used to send
4
that prevents changing AioContext while the export is active. See the
4
file descriptors of the RAM regions to remote device.
5
QAPI docs for details.
5
RAM on the remote device is configured with a set of file descriptors.
6
6
Old RAM regions are deleted and new regions, each with an fd, are
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
added to the RAM.
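The replace-all behavior described in this message can be sketched as
plain bookkeeping, without any mmap() calls. Everything below is an
assumption made for illustration: the Toy* types, TOY_MAX_FDS and
toy_ram_reconfigure() are hypothetical and only loosely follow the
shape of the real message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_FDS 8   /* hypothetical limit, not QEMU's value */

/* Simplified message payload: one entry per RAM region. */
typedef struct {
    uint64_t gpas[TOY_MAX_FDS];     /* guest physical addresses */
    uint64_t sizes[TOY_MAX_FDS];    /* region sizes in bytes */
    uint64_t offsets[TOY_MAX_FDS];  /* offsets into each fd */
} ToySyncMsg;

typedef struct {
    int fd;
    uint64_t gpa, size, offset;
} ToyRegion;

typedef struct {
    ToyRegion regions[TOY_MAX_FDS];
    size_t nr;
} ToyRam;

/*
 * Drop every existing region, then add one region per received fd.
 * Returns 0 on success, -1 if more fds arrive than the table can hold.
 */
static int toy_ram_reconfigure(ToyRam *ram, const ToySyncMsg *msg,
                               const int *fds, size_t nfds)
{
    assert(ram != NULL && msg != NULL);

    if (nfds > TOY_MAX_FDS) {
        return -1;
    }

    ram->nr = 0;  /* old regions are deleted first */
    for (size_t i = 0; i < nfds; i++) {
        ram->regions[i] = (ToyRegion) {
            .fd = fds[i],
            .gpa = msg->gpas[i],
            .size = msg->sizes[i],
            .offset = msg->offsets[i],
        };
        ram->nr++;
    }
    return 0;
}
```

A second call with a shorter fd list leaves only the new regions in the
table, which is the "old regions are deleted" part of the description.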
8
Message-id: 20200929125516.186715-5-stefanha@redhat.com
8
9
[Fix stray '#' character in block-export.json and add missing "(since:
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
5.2)" as suggested by Eric Blake.
10
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
11
--Stefan]
11
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 7d2d1831d812e85f681e7a8ab99e032cf4704689.1611938319.git.jag.raman@oracle.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
15
---
14
qapi/block-export.json | 11 ++++++++++
16
MAINTAINERS | 2 +
15
block/export/export.c | 31 +++++++++++++++++++++++++++-
17
include/hw/remote/memory.h | 19 ++++++++++
16
block/export/vhost-user-blk-server.c | 5 ++++-
18
include/hw/remote/mpqemu-link.h | 10 +++++
17
nbd/server.c | 2 --
19
hw/remote/memory.c | 65 +++++++++++++++++++++++++++++++++
18
4 files changed, 45 insertions(+), 4 deletions(-)
20
hw/remote/mpqemu-link.c | 11 ++++++
19
21
hw/remote/meson.build | 2 +
20
diff --git a/qapi/block-export.json b/qapi/block-export.json
22
6 files changed, 109 insertions(+)
21
index XXXXXXX..XXXXXXX 100644
23
create mode 100644 include/hw/remote/memory.h
22
--- a/qapi/block-export.json
24
create mode 100644 hw/remote/memory.c
23
+++ b/qapi/block-export.json
25
24
@@ -XXX,XX +XXX,XX @@
26
diff --git a/MAINTAINERS b/MAINTAINERS
25
# export before completion is signalled. (since: 5.2;
27
index XXXXXXX..XXXXXXX 100644
26
# default: false)
28
--- a/MAINTAINERS
27
#
29
+++ b/MAINTAINERS
28
+# @iothread: The name of the iothread object where the export will run. The
30
@@ -XXX,XX +XXX,XX @@ F: hw/remote/mpqemu-link.c
29
+# default is to use the thread currently associated with the
31
F: include/hw/remote/mpqemu-link.h
30
+# block node. (since: 5.2)
32
F: hw/remote/message.c
31
+#
33
F: hw/remote/remote-obj.c
32
+# @fixed-iothread: True prevents the block node from being moved to another
34
+F: include/hw/remote/memory.h
33
+# thread while the export is active. If true and @iothread is
35
+F: hw/remote/memory.c
34
+# given, export creation fails if the block node cannot be
36
35
+# moved to the iothread. The default is false. (since: 5.2)
37
Build and test automation
36
+#
38
-------------------------
37
# Since: 4.2
39
diff --git a/include/hw/remote/memory.h b/include/hw/remote/memory.h
38
##
40
new file mode 100644
39
{ 'union': 'BlockExportOptions',
41
index XXXXXXX..XXXXXXX
40
'base': { 'type': 'BlockExportType',
42
--- /dev/null
41
'id': 'str',
43
+++ b/include/hw/remote/memory.h
42
+     '*fixed-iothread': 'bool',
44
@@ -XXX,XX +XXX,XX @@
43
+     '*iothread': 'str',
45
+/*
44
'node-name': 'str',
46
+ * Memory manager for remote device
45
'*writable': 'bool',
47
+ *
46
'*writethrough': 'bool' },
48
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
47
diff --git a/block/export/export.c b/block/export/export.c
49
+ *
48
index XXXXXXX..XXXXXXX 100644
50
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
49
--- a/block/export/export.c
51
+ * See the COPYING file in the top-level directory.
50
+++ b/block/export/export.c
52
+ *
51
@@ -XXX,XX +XXX,XX @@
53
+ */
52
54
+
53
#include "block/block.h"
55
+#ifndef REMOTE_MEMORY_H
54
#include "sysemu/block-backend.h"
56
+#define REMOTE_MEMORY_H
55
+#include "sysemu/iothread.h"
57
+
56
#include "block/export.h"
58
+#include "exec/hwaddr.h"
57
#include "block/nbd.h"
59
+#include "hw/remote/mpqemu-link.h"
58
#include "qapi/error.h"
60
+
59
@@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
61
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp);
60
62
+
61
BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
63
+#endif
62
{
64
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
63
+ bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
65
index XXXXXXX..XXXXXXX 100644
64
const BlockExportDriver *drv;
66
--- a/include/hw/remote/mpqemu-link.h
65
BlockExport *exp = NULL;
67
+++ b/include/hw/remote/mpqemu-link.h
66
BlockDriverState *bs;
68
@@ -XXX,XX +XXX,XX @@
67
- BlockBackend *blk;
69
#include "qom/object.h"
68
+ BlockBackend *blk = NULL;
70
#include "qemu/thread.h"
69
AioContext *ctx;
71
#include "io/channel.h"
70
uint64_t perm;
72
+#include "exec/hwaddr.h"
71
int ret;
73
72
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
74
#define REMOTE_MAX_FDS 8
73
ctx = bdrv_get_aio_context(bs);
75
74
aio_context_acquire(ctx);
76
@@ -XXX,XX +XXX,XX @@
75
77
*
76
+ if (export->has_iothread) {
78
*/
77
+ IOThread *iothread;
79
typedef enum {
78
+ AioContext *new_ctx;
80
+ MPQEMU_CMD_SYNC_SYSMEM,
79
+
81
MPQEMU_CMD_MAX,
80
+ iothread = iothread_by_id(export->iothread);
82
} MPQemuCmd;
81
+ if (!iothread) {
83
82
+ error_setg(errp, "iothread \"%s\" not found", export->iothread);
84
+typedef struct {
83
+ goto fail;
85
+ hwaddr gpas[REMOTE_MAX_FDS];
84
+ }
86
+ uint64_t sizes[REMOTE_MAX_FDS];
85
+
87
+ off_t offsets[REMOTE_MAX_FDS];
86
+ new_ctx = iothread_get_aio_context(iothread);
88
+} SyncSysmemMsg;
87
+
89
+
88
+ ret = bdrv_try_set_aio_context(bs, new_ctx, errp);
90
/**
89
+ if (ret == 0) {
91
* MPQemuMsg:
90
+ aio_context_release(ctx);
92
* @cmd: The remote command
91
+ aio_context_acquire(new_ctx);
93
@@ -XXX,XX +XXX,XX @@ typedef enum {
92
+ ctx = new_ctx;
94
* MPQemuMsg Format of the message sent to the remote device from QEMU.
93
+ } else if (fixed_iothread) {
95
*
94
+ goto fail;
96
*/
97
+
98
typedef struct {
99
int cmd;
100
size_t size;
101
102
union {
103
uint64_t u64;
104
+ SyncSysmemMsg sync_sysmem;
105
} data;
106
107
int fds[REMOTE_MAX_FDS];
108
diff --git a/hw/remote/memory.c b/hw/remote/memory.c
109
new file mode 100644
110
index XXXXXXX..XXXXXXX
111
--- /dev/null
112
+++ b/hw/remote/memory.c
113
@@ -XXX,XX +XXX,XX @@
114
+/*
115
+ * Memory manager for remote device
116
+ *
117
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
118
+ *
119
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
120
+ * See the COPYING file in the top-level directory.
121
+ *
122
+ */
123
+
124
+#include "qemu/osdep.h"
125
+#include "qemu-common.h"
126
+
127
+#include "hw/remote/memory.h"
128
+#include "exec/address-spaces.h"
129
+#include "exec/ram_addr.h"
130
+#include "qapi/error.h"
131
+
132
+static void remote_sysmem_reset(void)
133
+{
134
+ MemoryRegion *sysmem, *subregion, *next;
135
+
136
+ sysmem = get_system_memory();
137
+
138
+ QTAILQ_FOREACH_SAFE(subregion, &sysmem->subregions, subregions_link, next) {
139
+ if (subregion->ram) {
140
+ memory_region_del_subregion(sysmem, subregion);
141
+ object_unparent(OBJECT(subregion));
95
+ }
142
+ }
96
+ }
143
+ }
97
+
144
+}
98
/*
145
+
99
* Block exports are used for non-shared storage migration. Make sure
146
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
100
* that BDRV_O_INACTIVE is cleared and the image is ready for write
147
+{
101
@@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
148
+ ERRP_GUARD();
149
+ SyncSysmemMsg *sysmem_info = &msg->data.sync_sysmem;
150
+ MemoryRegion *sysmem, *subregion;
151
+ static unsigned int suffix;
152
+ int region;
153
+
154
+ sysmem = get_system_memory();
155
+
156
+ remote_sysmem_reset();
157
+
158
+ for (region = 0; region < msg->num_fds; region++) {
159
+ g_autofree char *name;
160
+ subregion = g_new(MemoryRegion, 1);
161
+ name = g_strdup_printf("remote-mem-%u", suffix++);
162
+ memory_region_init_ram_from_fd(subregion, NULL,
163
+ name, sysmem_info->sizes[region],
164
+ true, msg->fds[region],
165
+ sysmem_info->offsets[region],
166
+ errp);
167
+
168
+ if (*errp) {
169
+ g_free(subregion);
170
+ remote_sysmem_reset();
171
+ return;
172
+ }
173
+
174
+ memory_region_add_subregion(sysmem, sysmem_info->gpas[region],
175
+ subregion);
176
+
177
+ }
178
+}
179
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
180
index XXXXXXX..XXXXXXX 100644
181
--- a/hw/remote/mpqemu-link.c
182
+++ b/hw/remote/mpqemu-link.c
183
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
184
}
102
}
185
}
103
186
104
blk = blk_new(ctx, perm, BLK_PERM_ALL);
187
+ /* Verify message specific fields. */
105
+
188
+ switch (msg->cmd) {
106
+ if (!fixed_iothread) {
189
+ case MPQEMU_CMD_SYNC_SYSMEM:
107
+ blk_set_allow_aio_context_change(blk, true);
190
+ if (msg->num_fds == 0 || msg->size != sizeof(SyncSysmemMsg)) {
191
+ return false;
192
+ }
193
+ break;
194
+ default:
195
+ break;
108
+ }
196
+ }
109
+
197
+
110
ret = blk_insert_bs(blk, bs, errp);
198
return true;
111
if (ret < 0) {
112
goto fail;
113
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/block/export/vhost-user-blk-server.c
116
+++ b/block/export/vhost-user-blk-server.c
117
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = {
118
static void blk_aio_attached(AioContext *ctx, void *opaque)
119
{
120
VuBlkExport *vexp = opaque;
121
+
122
+ vexp->export.ctx = ctx;
123
vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
124
}
199
}
125
200
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
126
static void blk_aio_detach(void *opaque)
201
index XXXXXXX..XXXXXXX 100644
127
{
202
--- a/hw/remote/meson.build
128
VuBlkExport *vexp = opaque;
203
+++ b/hw/remote/meson.build
129
+
204
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
130
vhost_user_server_detach_aio_context(&vexp->vu_server);
205
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
131
+ vexp->export.ctx = NULL;
206
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
132
}
207
133
208
+specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
134
static void
209
+
135
@@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
210
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
136
vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
137
logical_block_size);
138
139
- blk_set_allow_aio_context_change(exp->blk, true);
140
blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
141
vexp);
142
143
diff --git a/nbd/server.c b/nbd/server.c
144
index XXXXXXX..XXXXXXX 100644
145
--- a/nbd/server.c
146
+++ b/nbd/server.c
147
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
148
return ret;
149
}
150
151
- blk_set_allow_aio_context_change(blk, true);
152
-
153
QTAILQ_INIT(&exp->clients);
154
exp->name = g_strdup(arg->name);
155
exp->description = g_strdup(arg->description);
156
--
211
--
157
2.26.2
212
2.29.2
158
213
diff view generated by jsdifflib
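Note on the SyncSysmemMsg patch above: the per-command check added to
mpqemu_msg_valid() can be sketched in isolation as follows. The Sketch* types
and sketch_sync_sysmem_valid() are hypothetical, simplified stand-ins for
MPQemuMsg and SyncSysmemMsg. The upper bound on num_fds is an extra assumption
of this sketch and is not part of the hunk shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REMOTE_MAX_FDS 8

/* Simplified payload: one guest address, size and offset per RAM region. */
typedef struct {
    uint64_t gpas[REMOTE_MAX_FDS];
    uint64_t sizes[REMOTE_MAX_FDS];
    int64_t offsets[REMOTE_MAX_FDS];
} SketchSyncSysmemMsg;

/* Only the header fields that the check looks at. */
typedef struct {
    size_t size;  /* payload size announced by the sender */
    int num_fds;  /* number of file descriptors attached to the message */
} SketchMsgHeader;

/*
 * A SYNC_SYSMEM message needs at least one fd and a payload of exactly
 * sizeof(SketchSyncSysmemMsg) bytes.
 */
static bool sketch_sync_sysmem_valid(const SketchMsgHeader *hdr)
{
    assert(hdr != NULL);

    /* Assumption of this sketch: also reject more fds than the arrays hold. */
    if (hdr->num_fds <= 0 || hdr->num_fds > REMOTE_MAX_FDS) {
        return false;
    }
    return hdr->size == sizeof(SketchSyncSysmemMsg);
}
```

A receiver would typically run such a check before handing the message to
remote_sysmem_reconfig(), so that the loop over msg->num_fds never indexes
past the fixed-size arrays.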
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
By making use of libvhost-user, a block device drive can be shared with
3
Defines a PCI Device proxy object as a child of TYPE_PCI_DEVICE.
4
the connected vhost-user client. Only one client can connect to the
5
server at a time.
6
4
7
Since vhost-user-server needs a block drive to be created first, delay
5
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
the creation of this object.
6
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
10
Suggested-by: Kevin Wolf <kwolf@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
13
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
9
Message-id: b5186ebfedf8e557044d09a768846c59230ad3a7.1611938319.git.jag.raman@oracle.com
15
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
16
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
17
following compiler warning:
18
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
19
and fix "Invalid size %ld ..." ssize_t format string arguments for
20
32-bit hosts.
21
--Stefan]
22
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
23
---
11
---
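Note on the vhost-user-blk server patch: the discard and write-zeroes handler
turns a (sector, num_sectors) descriptor into a byte offset and byte count by
shifting left by 9, i.e. multiplying by 512. A minimal sketch of that
conversion follows; SketchByteRange and sketch_sectors_to_bytes() are
hypothetical names, and the inputs are assumed to be already converted from
little-endian to host byte order.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9 /* 512-byte sectors */

/* Byte range handed to the block layer. */
typedef struct {
    uint64_t offset;
    uint64_t bytes;
} SketchByteRange;

/*
 * Convert a sector-based request into bytes. num_sectors is widened to
 * 64 bits before shifting so that large counts are not truncated.
 */
static SketchByteRange sketch_sectors_to_bytes(uint64_t sector,
                                               uint32_t num_sectors)
{
    SketchByteRange r;

    /* Reject start sectors whose byte offset would not fit in 64 bits. */
    assert(sector <= (UINT64_MAX >> SKETCH_SECTOR_SHIFT));

    r.offset = sector << SKETCH_SECTOR_SHIFT;
    r.bytes = (uint64_t)num_sectors << SKETCH_SECTOR_SHIFT;
    return r;
}
```

With the limit of 32768 sectors that the patch advertises for discard and
write-zeroes, a 32-bit shift does not overflow either; the widening only
matters for counts of 2^23 sectors or more.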
24
block/export/vhost-user-blk-server.h | 36 ++
12
MAINTAINERS | 2 +
25
block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++
13
include/hw/remote/proxy.h | 33 +++++++++++++
26
softmmu/vl.c | 4 +
14
hw/remote/proxy.c | 99 +++++++++++++++++++++++++++++++++++++++
27
block/meson.build | 1 +
15
hw/remote/meson.build | 1 +
28
4 files changed, 702 insertions(+)
16
4 files changed, 135 insertions(+)
29
create mode 100644 block/export/vhost-user-blk-server.h
17
create mode 100644 include/hw/remote/proxy.h
30
create mode 100644 block/export/vhost-user-blk-server.c
18
create mode 100644 hw/remote/proxy.c
31
19
32
diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h
20
diff --git a/MAINTAINERS b/MAINTAINERS
21
index XXXXXXX..XXXXXXX 100644
22
--- a/MAINTAINERS
23
+++ b/MAINTAINERS
24
@@ -XXX,XX +XXX,XX @@ F: hw/remote/message.c
25
F: hw/remote/remote-obj.c
26
F: include/hw/remote/memory.h
27
F: hw/remote/memory.c
28
+F: hw/remote/proxy.c
29
+F: include/hw/remote/proxy.h
30
31
Build and test automation
32
-------------------------
33
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
33
new file mode 100644
34
new file mode 100644
34
index XXXXXXX..XXXXXXX
35
index XXXXXXX..XXXXXXX
35
--- /dev/null
36
--- /dev/null
36
+++ b/block/export/vhost-user-blk-server.h
37
+++ b/include/hw/remote/proxy.h
37
@@ -XXX,XX +XXX,XX @@
38
@@ -XXX,XX +XXX,XX @@
38
+/*
39
+/*
39
+ * Sharing QEMU block devices via vhost-user protocol
40
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
40
+ *
41
+ *
41
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
42
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
42
+ * Copyright (c) 2020 Red Hat, Inc.
43
+ * See the COPYING file in the top-level directory.
43
+ *
44
+ *
44
+ * This work is licensed under the terms of the GNU GPL, version 2 or
45
+ * later. See the COPYING file in the top-level directory.
46
+ */
45
+ */
47
+
46
+
48
+#ifndef VHOST_USER_BLK_SERVER_H
47
+#ifndef PROXY_H
49
+#define VHOST_USER_BLK_SERVER_H
48
+#define PROXY_H
50
+#include "util/vhost-user-server.h"
51
+
49
+
52
+typedef struct VuBlockDev VuBlockDev;
50
+#include "hw/pci/pci.h"
53
+#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
51
+#include "io/channel.h"
54
+#define VHOST_USER_BLK_SERVER(obj) \
55
+ OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
56
+
52
+
57
+/* vhost user block device */
53
+#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
58
+struct VuBlockDev {
54
+OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
59
+ Object parent_obj;
55
+
60
+ char *node_name;
56
+struct PCIProxyDev {
61
+ SocketAddress *addr;
57
+ PCIDevice parent_dev;
62
+ AioContext *ctx;
58
+ char *fd;
63
+ VuServer vu_server;
59
+
64
+ bool running;
60
+ /*
65
+ uint32_t blk_size;
61
+ * Mutex used to protect the QIOChannel fd from
66
+ BlockBackend *backend;
62
+ * the concurrent access by the VCPUs since proxy
67
+ QIOChannelSocket *sioc;
63
+ * blocks while waiting for the replies from the
68
+ QTAILQ_ENTRY(VuBlockDev) next;
64
+ * remote process.
69
+ struct virtio_blk_config blkcfg;
65
+ */
70
+ bool writable;
66
+ QemuMutex io_mutex;
67
+ QIOChannel *ioc;
68
+ Error *migration_blocker;
71
+};
69
+};
72
+
70
+
73
+#endif /* VHOST_USER_BLK_SERVER_H */
71
+#endif /* PROXY_H */
74
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
72
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
75
new file mode 100644
73
new file mode 100644
76
index XXXXXXX..XXXXXXX
74
index XXXXXXX..XXXXXXX
77
--- /dev/null
75
--- /dev/null
78
+++ b/block/export/vhost-user-blk-server.c
76
+++ b/hw/remote/proxy.c
79
@@ -XXX,XX +XXX,XX @@
77
@@ -XXX,XX +XXX,XX @@
80
+/*
78
+/*
81
+ * Sharing QEMU block devices via vhost-user protocol
79
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
82
+ *
80
+ *
83
+ * Parts of the code based on nbd/server.c.
81
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
82
+ * See the COPYING file in the top-level directory.
84
+ *
83
+ *
85
+ * Copyright (c) Coiby Xu <coiby.xu@gmail.com>.
86
+ * Copyright (c) 2020 Red Hat, Inc.
87
+ *
88
+ * This work is licensed under the terms of the GNU GPL, version 2 or
89
+ * later. See the COPYING file in the top-level directory.
90
+ */
84
+ */
85
+
91
+#include "qemu/osdep.h"
86
+#include "qemu/osdep.h"
92
+#include "block/block.h"
87
+#include "qemu-common.h"
93
+#include "vhost-user-blk-server.h"
88
+
89
+#include "hw/remote/proxy.h"
90
+#include "hw/pci/pci.h"
94
+#include "qapi/error.h"
91
+#include "qapi/error.h"
95
+#include "qom/object_interfaces.h"
92
+#include "io/channel-util.h"
96
+#include "sysemu/block-backend.h"
93
+#include "hw/qdev-properties.h"
97
+#include "util/block-helpers.h"
94
+#include "monitor/monitor.h"
95
+#include "migration/blocker.h"
96
+#include "qemu/sockets.h"
98
+
97
+
99
+enum {
98
+static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
100
+ VHOST_USER_BLK_MAX_QUEUES = 1,
99
+{
101
+};
100
+ ERRP_GUARD();
102
+struct virtio_blk_inhdr {
101
+ PCIProxyDev *dev = PCI_PROXY_DEV(device);
103
+ unsigned char status;
102
+ int fd;
104
+};
105
+
103
+
106
+typedef struct VuBlockReq {
104
+ if (!dev->fd) {
107
+ VuVirtqElement *elem;
105
+ error_setg(errp, "fd parameter not specified for %s",
108
+ int64_t sector_num;
106
+ DEVICE(device)->id);
109
+ size_t size;
110
+ struct virtio_blk_inhdr *in;
111
+ struct virtio_blk_outhdr out;
112
+ VuServer *server;
113
+ struct VuVirtq *vq;
114
+} VuBlockReq;
115
+
116
+static void vu_block_req_complete(VuBlockReq *req)
117
+{
118
+ VuDev *vu_dev = &req->server->vu_dev;
119
+
120
+ /* IO size with 1 extra status byte */
121
+ vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
122
+ vu_queue_notify(vu_dev, req->vq);
123
+
124
+ if (req->elem) {
125
+ free(req->elem);
126
+ }
127
+
128
+ g_free(req);
129
+}
130
+
131
+static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
132
+{
133
+ return container_of(server, VuBlockDev, vu_server);
134
+}
135
+
136
+static int coroutine_fn
137
+vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov,
138
+ uint32_t iovcnt, uint32_t type)
139
+{
140
+ struct virtio_blk_discard_write_zeroes desc;
141
+ ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc));
142
+ if (unlikely(size != sizeof(desc))) {
143
+ error_report("Invalid size %zd, expect %zu", size, sizeof(desc));
144
+ return -EINVAL;
145
+ }
146
+
147
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
148
+ uint64_t range[2] = { le64_to_cpu(desc.sector) << 9,
149
+ le32_to_cpu(desc.num_sectors) << 9 };
150
+ if (type == VIRTIO_BLK_T_DISCARD) {
151
+ if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) {
152
+ return 0;
153
+ }
154
+ } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) {
155
+ if (blk_co_pwrite_zeroes(vdev_blk->backend,
156
+ range[0], range[1], 0) == 0) {
157
+ return 0;
158
+ }
159
+ }
160
+
161
+ return -EINVAL;
162
+}
163
+
164
+static void coroutine_fn vu_block_flush(VuBlockReq *req)
165
+{
166
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
167
+ BlockBackend *backend = vdev_blk->backend;
168
+ blk_co_flush(backend);
169
+}
170
+
171
+struct req_data {
172
+ VuServer *server;
173
+ VuVirtq *vq;
174
+ VuVirtqElement *elem;
175
+};
176
+
177
+static void coroutine_fn vu_block_virtio_process_req(void *opaque)
178
+{
179
+ struct req_data *data = opaque;
180
+ VuServer *server = data->server;
181
+ VuVirtq *vq = data->vq;
182
+ VuVirtqElement *elem = data->elem;
183
+ uint32_t type;
184
+ VuBlockReq *req;
185
+
186
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
187
+ BlockBackend *backend = vdev_blk->backend;
188
+
189
+ struct iovec *in_iov = elem->in_sg;
190
+ struct iovec *out_iov = elem->out_sg;
191
+ unsigned in_num = elem->in_num;
192
+ unsigned out_num = elem->out_num;
193
+ /* refer to hw/block/virtio-blk.c */
194
+ if (elem->out_num < 1 || elem->in_num < 1) {
195
+ error_report("virtio-blk request missing headers");
196
+ free(elem);
197
+ return;
107
+ return;
198
+ }
108
+ }
199
+
109
+
200
+ req = g_new0(VuBlockReq, 1);
110
+ fd = monitor_fd_param(monitor_cur(), dev->fd, errp);
201
+ req->server = server;
111
+ if (fd == -1) {
202
+ req->vq = vq;
112
+ error_prepend(errp, "proxy: unable to parse fd %s: ", dev->fd);
203
+ req->elem = elem;
204
+
205
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
206
+ sizeof(req->out)) != sizeof(req->out))) {
207
+ error_report("virtio-blk request outhdr too short");
208
+ goto err;
209
+ }
210
+
211
+ iov_discard_front(&out_iov, &out_num, sizeof(req->out));
212
+
213
+ if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) {
214
+ error_report("virtio-blk request inhdr too short");
215
+ goto err;
216
+ }
217
+
218
+ /* We always touch the last byte, so just see how big in_iov is. */
219
+ req->in = (void *)in_iov[in_num - 1].iov_base
220
+ + in_iov[in_num - 1].iov_len
221
+ - sizeof(struct virtio_blk_inhdr);
222
+ iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
223
+
224
+ type = le32_to_cpu(req->out.type);
225
+ switch (type & ~VIRTIO_BLK_T_BARRIER) {
226
+ case VIRTIO_BLK_T_IN:
227
+ case VIRTIO_BLK_T_OUT: {
228
+ ssize_t ret = 0;
229
+ bool is_write = type & VIRTIO_BLK_T_OUT;
230
+ req->sector_num = le64_to_cpu(req->out.sector);
231
+
232
+ int64_t offset = req->sector_num * vdev_blk->blk_size;
233
+ QEMUIOVector qiov;
234
+ if (is_write) {
235
+ qemu_iovec_init_external(&qiov, out_iov, out_num);
236
+ ret = blk_co_pwritev(backend, offset, qiov.size,
237
+ &qiov, 0);
238
+ } else {
239
+ qemu_iovec_init_external(&qiov, in_iov, in_num);
240
+ ret = blk_co_preadv(backend, offset, qiov.size,
241
+ &qiov, 0);
242
+ }
243
+ if (ret >= 0) {
244
+ req->in->status = VIRTIO_BLK_S_OK;
245
+ } else {
246
+ req->in->status = VIRTIO_BLK_S_IOERR;
247
+ }
248
+ break;
249
+ }
250
+ case VIRTIO_BLK_T_FLUSH:
251
+ vu_block_flush(req);
252
+ req->in->status = VIRTIO_BLK_S_OK;
253
+ break;
254
+ case VIRTIO_BLK_T_GET_ID: {
255
+ size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
256
+ VIRTIO_BLK_ID_BYTES);
257
+ snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
258
+ req->in->status = VIRTIO_BLK_S_OK;
259
+ req->size = elem->in_sg[0].iov_len;
260
+ break;
261
+ }
262
+ case VIRTIO_BLK_T_DISCARD:
263
+ case VIRTIO_BLK_T_WRITE_ZEROES: {
264
+ int rc;
265
+ rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1],
266
+ out_num, type);
267
+ if (rc == 0) {
268
+ req->in->status = VIRTIO_BLK_S_OK;
269
+ } else {
270
+ req->in->status = VIRTIO_BLK_S_IOERR;
271
+ }
272
+ break;
273
+ }
274
+ default:
275
+ req->in->status = VIRTIO_BLK_S_UNSUPP;
276
+ break;
277
+ }
278
+
279
+ vu_block_req_complete(req);
280
+ return;
281
+
282
+err:
283
+ free(elem);
284
+ g_free(req);
285
+ return;
286
+}
287
+
288
+static void vu_block_process_vq(VuDev *vu_dev, int idx)
289
+{
290
+ VuServer *server;
291
+ VuVirtq *vq;
292
+ struct req_data *req_data;
293
+
294
+ server = container_of(vu_dev, VuServer, vu_dev);
295
+ assert(server);
296
+
297
+ vq = vu_get_queue(vu_dev, idx);
298
+ assert(vq);
299
+ VuVirtqElement *elem;
300
+ while (1) {
301
+ elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
302
+ sizeof(VuBlockReq));
303
+ if (elem) {
304
+ req_data = g_new0(struct req_data, 1);
305
+ req_data->server = server;
306
+ req_data->vq = vq;
307
+ req_data->elem = elem;
308
+ Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
309
+ req_data);
310
+ aio_co_enter(server->ioc->ctx, co);
311
+ } else {
312
+ break;
313
+ }
314
+ }
315
+}
316
+
317
+static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started)
318
+{
319
+ VuVirtq *vq;
320
+
321
+ assert(vu_dev);
322
+
323
+ vq = vu_get_queue(vu_dev, idx);
324
+ vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL);
325
+}
326
+
327
+static uint64_t vu_block_get_features(VuDev *dev)
328
+{
329
+ uint64_t features;
330
+ VuServer *server = container_of(dev, VuServer, vu_dev);
331
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
332
+ features = 1ull << VIRTIO_BLK_F_SIZE_MAX |
333
+ 1ull << VIRTIO_BLK_F_SEG_MAX |
334
+ 1ull << VIRTIO_BLK_F_TOPOLOGY |
335
+ 1ull << VIRTIO_BLK_F_BLK_SIZE |
336
+ 1ull << VIRTIO_BLK_F_FLUSH |
337
+ 1ull << VIRTIO_BLK_F_DISCARD |
338
+ 1ull << VIRTIO_BLK_F_WRITE_ZEROES |
339
+ 1ull << VIRTIO_BLK_F_CONFIG_WCE |
340
+ 1ull << VIRTIO_F_VERSION_1 |
341
+ 1ull << VIRTIO_RING_F_INDIRECT_DESC |
342
+ 1ull << VIRTIO_RING_F_EVENT_IDX |
343
+ 1ull << VHOST_USER_F_PROTOCOL_FEATURES;
344
+
345
+ if (!vdev_blk->writable) {
346
+ features |= 1ull << VIRTIO_BLK_F_RO;
347
+ }
348
+
349
+ return features;
350
+}
351
+
352
+static uint64_t vu_block_get_protocol_features(VuDev *dev)
353
+{
354
+ return 1ull << VHOST_USER_PROTOCOL_F_CONFIG |
355
+ 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
356
+}
357
+
358
+static int
359
+vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
360
+{
361
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
362
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
363
+ memcpy(config, &vdev_blk->blkcfg, len);
364
+
365
+ return 0;
366
+}
367
+
368
+static int
369
+vu_block_set_config(VuDev *vu_dev, const uint8_t *data,
370
+ uint32_t offset, uint32_t size, uint32_t flags)
371
+{
372
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
373
+ VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
374
+ uint8_t wce;
375
+
376
+ /* don't support live migration */
377
+ if (flags != VHOST_SET_CONFIG_TYPE_MASTER) {
378
+ return -EINVAL;
379
+ }
380
+
381
+ if (offset != offsetof(struct virtio_blk_config, wce) ||
382
+ size != 1) {
383
+ return -EINVAL;
384
+ }
385
+
386
+ wce = *data;
387
+ vdev_blk->blkcfg.wce = wce;
388
+ blk_set_enable_write_cache(vdev_blk->backend, wce);
389
+ return 0;
390
+}
391
+
392
+/*
393
+ * When the client disconnects, it sends a VHOST_USER_NONE request
394
+ * and vu_process_message will simply call exit, which causes the VM
395
+ * to exit abruptly.
396
+ * To avoid this issue, process VHOST_USER_NONE request ahead
397
+ * of vu_process_message.
398
+ *
399
+ */
400
+static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply)
401
+{
402
+ if (vmsg->request == VHOST_USER_NONE) {
403
+ dev->panic(dev, "disconnect");
404
+ return true;
405
+ }
406
+ return false;
407
+}
408
+
409
+static const VuDevIface vu_block_iface = {
410
+ .get_features = vu_block_get_features,
411
+ .queue_set_started = vu_block_queue_set_started,
412
+ .get_protocol_features = vu_block_get_protocol_features,
413
+ .get_config = vu_block_get_config,
414
+ .set_config = vu_block_set_config,
415
+ .process_msg = vu_block_process_msg,
416
+};
417
+
418
+static void blk_aio_attached(AioContext *ctx, void *opaque)
419
+{
420
+ VuBlockDev *vub_dev = opaque;
421
+ aio_context_acquire(ctx);
422
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
423
+ aio_context_release(ctx);
424
+}
425
+
426
+static void blk_aio_detach(void *opaque)
427
+{
428
+ VuBlockDev *vub_dev = opaque;
429
+ AioContext *ctx = vub_dev->vu_server.ctx;
430
+ aio_context_acquire(ctx);
431
+ vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
432
+ aio_context_release(ctx);
433
+}
434
+
435
+static void
436
+vu_block_initialize_config(BlockDriverState *bs,
437
+ struct virtio_blk_config *config, uint32_t blk_size)
438
+{
439
+ config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
440
+ config->blk_size = blk_size;
441
+ config->size_max = 0;
442
+ config->seg_max = 128 - 2;
443
+ config->min_io_size = 1;
444
+ config->opt_io_size = 1;
445
+ config->num_queues = VHOST_USER_BLK_MAX_QUEUES;
446
+ config->max_discard_sectors = 32768;
447
+ config->max_discard_seg = 1;
448
+ config->discard_sector_alignment = config->blk_size >> 9;
449
+ config->max_write_zeroes_sectors = 32768;
450
+ config->max_write_zeroes_seg = 1;
451
+}
452
+
453
+static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp)
454
+{
455
+
456
+ BlockBackend *blk;
457
+ Error *local_error = NULL;
458
+ const char *node_name = vu_block_device->node_name;
459
+ bool writable = vu_block_device->writable;
460
+ uint64_t perm = BLK_PERM_CONSISTENT_READ;
461
+ int ret;
462
+
463
+ AioContext *ctx;
464
+
465
+ BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error);
466
+
467
+ if (!bs) {
468
+ error_propagate(errp, local_error);
469
+ return NULL;
470
+ }
471
+
472
+ if (bdrv_is_read_only(bs)) {
473
+ writable = false;
474
+ }
475
+
476
+ if (writable) {
477
+ perm |= BLK_PERM_WRITE;
478
+ }
479
+
480
+ ctx = bdrv_get_aio_context(bs);
481
+ aio_context_acquire(ctx);
482
+ bdrv_invalidate_cache(bs, NULL);
483
+ aio_context_release(ctx);
484
+
485
+ /*
486
+ * Don't allow resize while the vhost user server is running,
487
+ * otherwise we don't care what happens with the node.
488
+ */
489
+ blk = blk_new(bdrv_get_aio_context(bs), perm,
490
+ BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
491
+ BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD);
492
+ ret = blk_insert_bs(blk, bs, errp);
493
+
494
+ if (ret < 0) {
495
+ goto fail;
496
+ }
497
+
498
+ blk_set_enable_write_cache(blk, false);
499
+
500
+ blk_set_allow_aio_context_change(blk, true);
501
+
502
+ vu_block_device->blkcfg.wce = 0;
503
+ vu_block_device->backend = blk;
504
+ if (!vu_block_device->blk_size) {
505
+ vu_block_device->blk_size = BDRV_SECTOR_SIZE;
506
+ }
507
+ vu_block_device->blkcfg.blk_size = vu_block_device->blk_size;
508
+ blk_set_guest_block_size(blk, vu_block_device->blk_size);
509
+ vu_block_initialize_config(bs, &vu_block_device->blkcfg,
510
+ vu_block_device->blk_size);
511
+ return vu_block_device;
512
+
513
+fail:
514
+ blk_unref(blk);
515
+ return NULL;
516
+}
517
+
518
+static void vu_block_deinit(VuBlockDev *vu_block_device)
519
+{
520
+ if (vu_block_device->backend) {
521
+ blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
522
+ blk_aio_detach, vu_block_device);
523
+ }
524
+
525
+ blk_unref(vu_block_device->backend);
526
+}
527
+
528
+static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device)
529
+{
530
+ vhost_user_server_stop(&vu_block_device->vu_server);
531
+ vu_block_deinit(vu_block_device);
532
+}
533
+
534
+static void vhost_user_blk_server_start(VuBlockDev *vu_block_device,
535
+ Error **errp)
536
+{
537
+ AioContext *ctx;
538
+ SocketAddress *addr = vu_block_device->addr;
539
+
540
+ if (!vu_block_init(vu_block_device, errp)) {
541
+ return;
113
+ return;
542
+ }
114
+ }
543
+
115
+
544
+ ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
116
+ if (!fd_is_socket(fd)) {
545
+
117
+ error_setg(errp, "proxy: fd %d is not a socket", fd);
546
+ if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
118
+ close(fd);
547
+ VHOST_USER_BLK_MAX_QUEUES,
548
+ NULL, &vu_block_iface,
549
+ errp)) {
550
+ goto error;
551
+ }
552
+
553
+ blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached,
554
+ blk_aio_detach, vu_block_device);
555
+ vu_block_device->running = true;
556
+ return;
557
+
558
+ error:
559
+ vu_block_deinit(vu_block_device);
560
+}
561
+
562
+static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp)
563
+{
564
+ if (vus->running) {
565
+ error_setg(errp, "The property can't be modified "
566
+ "while the server is running");
567
+ return false;
568
+ }
569
+ return true;
570
+}
571
+
572
+static void vu_set_node_name(Object *obj, const char *value, Error **errp)
573
+{
574
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
575
+
576
+ if (!vu_prop_modifiable(vus, errp)) {
577
+ return;
119
+ return;
578
+ }
120
+ }
579
+
121
+
580
+ if (vus->node_name) {
122
+ dev->ioc = qio_channel_new_fd(fd, errp);
581
+ g_free(vus->node_name);
123
+
124
+ error_setg(&dev->migration_blocker, "%s does not support migration",
125
+ TYPE_PCI_PROXY_DEV);
126
+ migrate_add_blocker(dev->migration_blocker, errp);
127
+
128
+ qemu_mutex_init(&dev->io_mutex);
129
+ qio_channel_set_blocking(dev->ioc, true, NULL);
130
+}
131
+
132
+static void pci_proxy_dev_exit(PCIDevice *pdev)
133
+{
134
+ PCIProxyDev *dev = PCI_PROXY_DEV(pdev);
135
+
136
+ if (dev->ioc) {
137
+ qio_channel_close(dev->ioc, NULL);
582
+ }
138
+ }
583
+
139
+
584
+ vus->node_name = g_strdup(value);
140
+ migrate_del_blocker(dev->migration_blocker);
141
+
142
+ error_free(dev->migration_blocker);
585
+}
143
+}
586
+
144
+
587
+static char *vu_get_node_name(Object *obj, Error **errp)
145
+static Property proxy_properties[] = {
146
+ DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
147
+ DEFINE_PROP_END_OF_LIST(),
148
+};
149
+
150
+static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
588
+{
151
+{
589
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
152
+ DeviceClass *dc = DEVICE_CLASS(klass);
590
+ return g_strdup(vus->node_name);
153
+ PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
154
+
155
+ k->realize = pci_proxy_dev_realize;
156
+ k->exit = pci_proxy_dev_exit;
157
+ device_class_set_props(dc, proxy_properties);
591
+}
158
+}
592
+
159
+
593
+static void free_socket_addr(SocketAddress *addr)
160
+static const TypeInfo pci_proxy_dev_type_info = {
594
+{
161
+ .name = TYPE_PCI_PROXY_DEV,
595
+ g_free(addr->u.q_unix.path);
162
+ .parent = TYPE_PCI_DEVICE,
596
+ g_free(addr);
163
+ .instance_size = sizeof(PCIProxyDev),
597
+}
164
+ .class_init = pci_proxy_dev_class_init,
598
+
599
+static void vu_set_unix_socket(Object *obj, const char *value,
600
+ Error **errp)
601
+{
602
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
603
+
604
+ if (!vu_prop_modifiable(vus, errp)) {
605
+ return;
606
+ }
607
+
608
+ if (vus->addr) {
609
+ free_socket_addr(vus->addr);
610
+ }
611
+
612
+ SocketAddress *addr = g_new0(SocketAddress, 1);
613
+ addr->type = SOCKET_ADDRESS_TYPE_UNIX;
614
+ addr->u.q_unix.path = g_strdup(value);
615
+ vus->addr = addr;
616
+}
617
+
618
+static char *vu_get_unix_socket(Object *obj, Error **errp)
619
+{
620
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
621
+ return g_strdup(vus->addr->u.q_unix.path);
622
+}
623
+
624
+static bool vu_get_block_writable(Object *obj, Error **errp)
625
+{
626
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
627
+ return vus->writable;
628
+}
629
+
630
+static void vu_set_block_writable(Object *obj, bool value, Error **errp)
631
+{
632
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
633
+
634
+ if (!vu_prop_modifiable(vus, errp)) {
635
+ return;
636
+ }
637
+
638
+ vus->writable = value;
639
+}
640
+
641
+static void vu_get_blk_size(Object *obj, Visitor *v, const char *name,
642
+ void *opaque, Error **errp)
643
+{
644
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
645
+ uint32_t value = vus->blk_size;
646
+
647
+ visit_type_uint32(v, name, &value, errp);
648
+}
649
+
650
+static void vu_set_blk_size(Object *obj, Visitor *v, const char *name,
651
+ void *opaque, Error **errp)
652
+{
653
+ VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj);
654
+
655
+ Error *local_err = NULL;
656
+ uint32_t value;
657
+
658
+ if (!vu_prop_modifiable(vus, errp)) {
659
+ return;
660
+ }
661
+
662
+ visit_type_uint32(v, name, &value, &local_err);
663
+ if (local_err) {
664
+ goto out;
665
+ }
666
+
667
+ check_block_size(object_get_typename(obj), name, value, &local_err);
668
+ if (local_err) {
669
+ goto out;
670
+ }
671
+
672
+ vus->blk_size = value;
673
+
674
+out:
675
+ error_propagate(errp, local_err);
676
+}
677
+
678
+static void vhost_user_blk_server_instance_finalize(Object *obj)
679
+{
680
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
681
+
682
+ vhost_user_blk_server_stop(vub);
683
+
684
+ /*
685
+ * Unlike object_property_add_str, object_class_property_add_str
686
+ * doesn't have a release method. Thus manual memory freeing is
687
+ * needed.
688
+ */
689
+ free_socket_addr(vub->addr);
690
+ g_free(vub->node_name);
691
+}
692
+
693
+static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp)
694
+{
695
+ VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj);
696
+
697
+ vhost_user_blk_server_start(vub, errp);
698
+}
699
+
700
+static void vhost_user_blk_server_class_init(ObjectClass *klass,
701
+ void *class_data)
702
+{
703
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
704
+ ucc->complete = vhost_user_blk_server_complete;
705
+
706
+ object_class_property_add_bool(klass, "writable",
707
+ vu_get_block_writable,
708
+ vu_set_block_writable);
709
+
710
+ object_class_property_add_str(klass, "node-name",
711
+ vu_get_node_name,
712
+ vu_set_node_name);
713
+
714
+ object_class_property_add_str(klass, "unix-socket",
715
+ vu_get_unix_socket,
716
+ vu_set_unix_socket);
717
+
718
+ object_class_property_add(klass, "logical-block-size", "uint32",
719
+ vu_get_blk_size, vu_set_blk_size,
720
+ NULL, NULL);
721
+}
722
+
723
+static const TypeInfo vhost_user_blk_server_info = {
724
+ .name = TYPE_VHOST_USER_BLK_SERVER,
725
+ .parent = TYPE_OBJECT,
726
+ .instance_size = sizeof(VuBlockDev),
727
+ .instance_finalize = vhost_user_blk_server_instance_finalize,
728
+ .class_init = vhost_user_blk_server_class_init,
729
+ .interfaces = (InterfaceInfo[]) {
165
+ .interfaces = (InterfaceInfo[]) {
730
+ {TYPE_USER_CREATABLE},
166
+ { INTERFACE_CONVENTIONAL_PCI_DEVICE },
731
+ {}
167
+ { },
732
+ },
168
+ },
733
+};
169
+};
734
+
170
+
735
+static void vhost_user_blk_server_register_types(void)
171
+static void pci_proxy_dev_register_types(void)
736
+{
172
+{
737
+ type_register_static(&vhost_user_blk_server_info);
173
+ type_register_static(&pci_proxy_dev_type_info);
738
+}
174
+}
739
+
175
+
740
+type_init(vhost_user_blk_server_register_types)
176
+type_init(pci_proxy_dev_register_types)
741
diff --git a/softmmu/vl.c b/softmmu/vl.c
177
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
742
index XXXXXXX..XXXXXXX 100644
178
index XXXXXXX..XXXXXXX 100644
743
--- a/softmmu/vl.c
179
--- a/hw/remote/meson.build
744
+++ b/softmmu/vl.c
180
+++ b/hw/remote/meson.build
745
@@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts)
181
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
746
}
182
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
747
#endif
183
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
748
184
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
749
+ /* Reason: vhost-user-blk-server property "node-name" */
185
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
750
+ if (g_str_equal(type, "vhost-user-blk-server")) {
186
751
+ return false;
187
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
752
+ }
188
753
/*
754
* Reason: filter-* property "netdev" etc.
755
*/
756
diff --git a/block/meson.build b/block/meson.build
757
index XXXXXXX..XXXXXXX 100644
758
--- a/block/meson.build
759
+++ b/block/meson.build
760
@@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c')
761
block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit])
762
block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c'))
763
block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
764
+block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c'))
765
block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c'))
766
block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c'))
767
block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c'))
768
--
189
--
769
2.26.2
190
2.29.2
770
191
diff view generated by jsdifflib
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
bdrv_co_block_status_above has several design problems with handling
3
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
4
short backing files:
4
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
5
5
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
6
1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
without the BDRV_BLOCK_ALLOCATED flag, even though the short backing file
7
Message-id: d54edb4176361eed86b903e8f27058363b6c83b3.1611938319.git.jag.raman@oracle.com
8
that produces these after-EOF zeros is inside the requested backing
9
sequence.
10
11
2. With want_zero=false, it may return pnum=0 prior to actual EOF,
12
because of EOF of short backing file.
13
14
Fix these things, making logic about short backing files clearer.
15
16
With fixed bdrv_block_status_above we also have to improve is_zero in
17
qcow2 code, otherwise iotest 154 will fail, because with this patch we
18
stop merging zeros of different types (those produced by regions that are
19
unallocated in the whole backing chain vs. those produced by short backing files).
20
21
Note also, that this patch leaves for another day the general problem
22
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
23
vs go-to-backing.
24
25
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
26
Reviewed-by: Alberto Garcia <berto@igalia.com>
27
Reviewed-by: Eric Blake <eblake@redhat.com>
28
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
29
[Fix s/comes/come/ as suggested by Eric Blake
30
--Stefan]
31
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32
---
9
---
33
block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++-----------
10
include/hw/remote/mpqemu-link.h | 4 ++++
34
block/qcow2.c | 16 ++++++++++--
11
hw/remote/mpqemu-link.c | 34 +++++++++++++++++++++++++++++++++
35
2 files changed, 68 insertions(+), 16 deletions(-)
12
2 files changed, 38 insertions(+)
36
13
37
diff --git a/block/io.c b/block/io.c
14
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
38
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
39
--- a/block/io.c
16
--- a/include/hw/remote/mpqemu-link.h
40
+++ b/block/io.c
17
+++ b/include/hw/remote/mpqemu-link.h
41
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
18
@@ -XXX,XX +XXX,XX @@
42
int64_t *map,
19
#include "qemu/thread.h"
43
BlockDriverState **file)
20
#include "io/channel.h"
44
{
21
#include "exec/hwaddr.h"
45
+ int ret;
22
+#include "io/channel-socket.h"
46
BlockDriverState *p;
23
+#include "hw/remote/proxy.h"
47
- int ret = 0;
24
48
- bool first = true;
25
#define REMOTE_MAX_FDS 8
49
+ int64_t eof = 0;
26
50
27
@@ -XXX,XX +XXX,XX @@ typedef struct {
51
assert(bs != base);
28
bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
52
- for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) {
29
bool mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
30
31
+uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
32
+ Error **errp);
33
bool mpqemu_msg_valid(MPQemuMsg *msg);
34
35
#endif
36
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
37
index XXXXXXX..XXXXXXX 100644
38
--- a/hw/remote/mpqemu-link.c
39
+++ b/hw/remote/mpqemu-link.c
40
@@ -XXX,XX +XXX,XX @@ fail:
41
return ret;
42
}
43
44
+/*
45
+ * Send msg and wait for a reply with command code RET_MSG.
46
+ * Returns the message received of size u64 or UINT64_MAX
47
+ * on error.
48
+ * Called from VCPU thread in non-coroutine context.
49
+ * Used by the Proxy object to communicate to remote processes.
50
+ */
51
+uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
52
+ Error **errp)
53
+{
54
+ ERRP_GUARD();
55
+ MPQemuMsg msg_reply = {0};
56
+ uint64_t ret = UINT64_MAX;
53
+
57
+
54
+ ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
58
+ assert(!qemu_in_coroutine());
55
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
59
+
60
+ QEMU_LOCK_GUARD(&pdev->io_mutex);
61
+ if (!mpqemu_msg_send(msg, pdev->ioc, errp)) {
56
+ return ret;
62
+ return ret;
57
+ }
63
+ }
58
+
64
+
59
+ if (ret & BDRV_BLOCK_EOF) {
65
+ if (!mpqemu_msg_recv(&msg_reply, pdev->ioc, errp)) {
60
+ eof = offset + *pnum;
66
+ return ret;
61
+ }
67
+ }
62
+
68
+
63
+ assert(*pnum <= bytes);
69
+ if (!mpqemu_msg_valid(&msg_reply)) {
64
+ bytes = *pnum;
70
+ error_setg(errp, "ERROR: Invalid reply received for command %d",
65
+
71
+ msg->cmd);
66
+ for (p = bdrv_filter_or_cow_bs(bs); p != base;
72
+ return ret;
67
+ p = bdrv_filter_or_cow_bs(p))
68
+ {
69
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
70
file);
71
if (ret < 0) {
72
- break;
73
+ return ret;
74
}
75
- if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
76
+ if (*pnum == 0) {
77
/*
78
- * Reading beyond the end of the file continues to read
79
- * zeroes, but we can only widen the result to the
80
- * unallocated length we learned from an earlier
81
- * iteration.
82
+ * The top layer deferred to this layer, and because this layer is
83
+ * short, any zeroes that we synthesize beyond EOF behave as if they
84
+ * were allocated at this layer.
85
+ *
86
+ * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be
87
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
88
+ * below.
89
*/
90
+ assert(ret & BDRV_BLOCK_EOF);
91
*pnum = bytes;
92
+ if (file) {
93
+ *file = p;
94
+ }
95
+ ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
96
+ break;
97
}
98
- if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
99
+ if (ret & BDRV_BLOCK_ALLOCATED) {
100
+ /*
101
+ * We've found the node and the status, we must break.
102
+ *
103
+ * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be
104
+ * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see
105
+ * below.
106
+ */
107
+ ret &= ~BDRV_BLOCK_EOF;
108
break;
109
}
110
- /* [offset, pnum] unallocated on this layer, which could be only
111
- * the first part of [offset, bytes]. */
112
- bytes = MIN(bytes, *pnum);
113
- first = false;
114
+
115
+ /*
116
+ * OK, [offset, offset + *pnum) region is unallocated on this layer,
117
+ * let's continue the diving.
118
+ */
119
+ assert(*pnum <= bytes);
120
+ bytes = *pnum;
121
+ }
73
+ }
122
+
74
+
123
+ if (offset + *pnum == eof) {
75
+ return msg_reply.data.u64;
124
+ ret |= BDRV_BLOCK_EOF;
76
+}
125
}
126
+
77
+
127
return ret;
78
bool mpqemu_msg_valid(MPQemuMsg *msg)
128
}
79
{
129
80
if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
130
diff --git a/block/qcow2.c b/block/qcow2.c
131
index XXXXXXX..XXXXXXX 100644
132
--- a/block/qcow2.c
133
+++ b/block/qcow2.c
134
@@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
135
if (!bytes) {
136
return true;
137
}
138
- res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
139
- return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
140
+
141
+ /*
142
+ * bdrv_block_status_above doesn't merge different types of zeros, for
143
+ * example, zeros which come from the region which is unallocated in
144
+ * the whole backing chain, and zeros which come because of a short
145
+ * backing file. So, we need a loop.
146
+ */
147
+ do {
148
+ res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
149
+ offset += nr;
150
+ bytes -= nr;
151
+ } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);
152
+
153
+ return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0;
154
}
155
156
static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
157
--
81
--
158
2.26.2
82
2.29.2
159
83
diff view generated by jsdifflib
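The loop that the block-status patch above adds to the qcow2 is_zero() helper can be looked at in isolation. The following is only a minimal sketch under stated assumptions: SketchExtent, sketch_block_status() and SKETCH_BLOCK_ZERO are hypothetical stand-ins invented for this illustration (they are not QEMU APIs, and the flag value is arbitrary). The mock status function reports one extent at a time, which mimics a status query that no longer merges zeros of different origins, so the caller has to iterate until the whole range is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status flag; the value is illustrative, not QEMU's. */
#define SKETCH_BLOCK_ZERO 0x1

/* One extent of a mock image: [start, start + len) with the given flags. */
typedef struct {
    int64_t start;
    int64_t len;
    int flags;
} SketchExtent;

/*
 * Mock stand-in for a block-status query. Reports the flags of the extent
 * that contains offset and stores in *pnum how many of the requested bytes
 * that single extent covers. Returns -1 if offset is outside all extents.
 */
static int sketch_block_status(const SketchExtent *ext, int n_ext,
                               int64_t offset, int64_t bytes, int64_t *pnum)
{
    for (int i = 0; i < n_ext; i++) {
        int64_t end = ext[i].start + ext[i].len;

        if (offset >= ext[i].start && offset < end) {
            int64_t avail = end - offset;

            *pnum = avail < bytes ? avail : bytes;
            return ext[i].flags;
        }
    }
    *pnum = 0;
    return -1;
}

/*
 * Same control flow as the loop in the patch: keep querying while every
 * extent seen so far reads as zero, progress is made, and bytes remain.
 */
static bool sketch_is_zero(const SketchExtent *ext, int n_ext,
                           int64_t offset, int64_t bytes)
{
    int64_t nr;
    int res;

    if (!bytes) {
        return true;
    }

    do {
        res = sketch_block_status(ext, n_ext, offset, bytes, &nr);
        assert(nr <= bytes);
        offset += nr;
        bytes -= nr;
    } while (res >= 0 && (res & SKETCH_BLOCK_ZERO) && nr && bytes);

    /* Zero only if the last extent was zero and nothing is left over. */
    return res >= 0 && (res & SKETCH_BLOCK_ZERO) && bytes == 0;
}
```

With two adjacent zero extents, a single status call covers only the first one, so a one-shot check of the form "nr == bytes" would report the range as non-zero, while the loop walks both extents and reports it as zero.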
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
In order to reuse bdrv_common_block_status_above in
3
The Proxy Object sends the PCI config space accesses as messages
4
bdrv_is_allocated_above, let's support include_base parameter.
4
to the remote process over the communication channel.
5
5
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
7
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: d3c94f4618813234655356c60e6f0d0362ff42d6.1611938319.git.jag.raman@oracle.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
---
12
block/coroutines.h | 2 ++
13
include/hw/remote/mpqemu-link.h | 10 ++++++
13
block/io.c | 21 ++++++++++++++-------
14
hw/remote/message.c | 60 +++++++++++++++++++++++++++++++++
14
2 files changed, 16 insertions(+), 7 deletions(-)
15
hw/remote/mpqemu-link.c | 8 ++++-
15
16
hw/remote/proxy.c | 55 ++++++++++++++++++++++++++++++
16
diff --git a/block/coroutines.h b/block/coroutines.h
17
4 files changed, 132 insertions(+), 1 deletion(-)
17
index XXXXXXX..XXXXXXX 100644
18
18
--- a/block/coroutines.h
19
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
19
+++ b/block/coroutines.h
20
index XXXXXXX..XXXXXXX 100644
20
@@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes,
21
--- a/include/hw/remote/mpqemu-link.h
21
int coroutine_fn
22
+++ b/include/hw/remote/mpqemu-link.h
22
bdrv_co_common_block_status_above(BlockDriverState *bs,
23
@@ -XXX,XX +XXX,XX @@
23
BlockDriverState *base,
24
*/
24
+ bool include_base,
25
typedef enum {
25
bool want_zero,
26
MPQEMU_CMD_SYNC_SYSMEM,
26
int64_t offset,
27
+ MPQEMU_CMD_RET,
27
int64_t bytes,
28
+ MPQEMU_CMD_PCI_CFGWRITE,
28
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
29
+ MPQEMU_CMD_PCI_CFGREAD,
29
int generated_co_wrapper
30
MPQEMU_CMD_MAX,
30
bdrv_common_block_status_above(BlockDriverState *bs,
31
} MPQemuCmd;
31
BlockDriverState *base,
32
32
+ bool include_base,
33
@@ -XXX,XX +XXX,XX @@ typedef struct {
33
bool want_zero,
34
off_t offsets[REMOTE_MAX_FDS];
34
int64_t offset,
35
} SyncSysmemMsg;
35
int64_t bytes,
36
36
diff --git a/block/io.c b/block/io.c
37
+typedef struct {
37
index XXXXXXX..XXXXXXX 100644
38
+ uint32_t addr;
38
--- a/block/io.c
39
+ uint32_t val;
39
+++ b/block/io.c
40
+ int len;
40
@@ -XXX,XX +XXX,XX @@ early_out:
41
+} PciConfDataMsg;
41
int coroutine_fn
42
+
42
bdrv_co_common_block_status_above(BlockDriverState *bs,
43
/**
43
BlockDriverState *base,
44
* MPQemuMsg:
44
+ bool include_base,
45
* @cmd: The remote command
45
bool want_zero,
46
@@ -XXX,XX +XXX,XX @@ typedef struct {
46
int64_t offset,
47
47
int64_t bytes,
48
union {
48
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
49
uint64_t u64;
49
BlockDriverState *p;
50
+ PciConfDataMsg pci_conf_data;
50
int64_t eof = 0;
51
SyncSysmemMsg sync_sysmem;
51
52
} data;
52
- assert(bs != base);
53
53
+ assert(include_base || bs != base);
54
diff --git a/hw/remote/message.c b/hw/remote/message.c
54
+ assert(!include_base || base); /* Can't include NULL base */
55
index XXXXXXX..XXXXXXX 100644
55
56
--- a/hw/remote/message.c
56
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
57
+++ b/hw/remote/message.c
57
- if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) {
58
@@ -XXX,XX +XXX,XX @@
58
+ if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
59
#include "hw/remote/mpqemu-link.h"
60
#include "qapi/error.h"
61
#include "sysemu/runstate.h"
62
+#include "hw/pci/pci.h"
63
+
64
+static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
65
+ MPQemuMsg *msg, Error **errp);
66
+static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
67
+ MPQemuMsg *msg, Error **errp);
68
69
void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
70
{
71
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
72
}
73
74
switch (msg.cmd) {
75
+ case MPQEMU_CMD_PCI_CFGWRITE:
76
+ process_config_write(com->ioc, pci_dev, &msg, &local_err);
77
+ break;
78
+ case MPQEMU_CMD_PCI_CFGREAD:
79
+ process_config_read(com->ioc, pci_dev, &msg, &local_err);
80
+ break;
81
default:
82
error_setg(&local_err,
83
"Unknown command (%d) received for device %s"
84
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
85
qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
86
}
87
}
88
+
89
+static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
90
+ MPQemuMsg *msg, Error **errp)
91
+{
92
+ ERRP_GUARD();
93
+ PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
94
+ MPQemuMsg ret = { 0 };
95
+
96
+ if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
97
+ error_setg(errp, "Bad address for PCI config write, pid "FMT_pid".",
98
+ getpid());
99
+ ret.data.u64 = UINT64_MAX;
100
+ } else {
101
+ pci_default_write_config(dev, conf->addr, conf->val, conf->len);
102
+ }
103
+
104
+ ret.cmd = MPQEMU_CMD_RET;
105
+ ret.size = sizeof(ret.data.u64);
106
+
107
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
108
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
109
+ getpid());
110
+ }
111
+}
112
+
113
+static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
114
+ MPQemuMsg *msg, Error **errp)
115
+{
116
+ ERRP_GUARD();
117
+ PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
118
+ MPQemuMsg ret = { 0 };
119
+
120
+ if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
121
+ error_setg(errp, "Bad address for PCI config read, pid "FMT_pid".",
122
+ getpid());
123
+ ret.data.u64 = UINT64_MAX;
124
+ } else {
125
+ ret.data.u64 = pci_default_read_config(dev, conf->addr, conf->len);
126
+ }
127
+
128
+ ret.cmd = MPQEMU_CMD_RET;
129
+ ret.size = sizeof(ret.data.u64);
130
+
131
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
132
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
133
+ getpid());
134
+ }
135
+}
136
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
137
index XXXXXXX..XXXXXXX 100644
138
--- a/hw/remote/mpqemu-link.c
139
+++ b/hw/remote/mpqemu-link.c
140
@@ -XXX,XX +XXX,XX @@ uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
59
return ret;
141
return ret;
60
}
142
}
61
143
62
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
144
- if (!mpqemu_msg_valid(&msg_reply)) {
63
assert(*pnum <= bytes);
145
+ if (!mpqemu_msg_valid(&msg_reply) || msg_reply.cmd != MPQEMU_CMD_RET) {
64
bytes = *pnum;
146
error_setg(errp, "ERROR: Invalid reply received for command %d",
65
147
msg->cmd);
66
- for (p = bdrv_filter_or_cow_bs(bs); p != base;
148
return ret;
67
+ for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
149
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
68
p = bdrv_filter_or_cow_bs(p))
150
return false;
69
{
70
ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
71
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
72
break;
73
}
151
}
74
152
break;
75
+ if (p == base) {
153
+ case MPQEMU_CMD_PCI_CFGWRITE:
76
+ assert(include_base);
154
+ case MPQEMU_CMD_PCI_CFGREAD:
77
+ break;
155
+ if (msg->size != sizeof(PciConfDataMsg)) {
156
+ return false;
78
+ }
157
+ }
79
+
158
+ break;
80
/*
159
default:
81
* OK, [offset, offset + *pnum) region is unallocated on this layer,
160
break;
82
* let's continue the diving.
161
}
83
@@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
162
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
84
int64_t offset, int64_t bytes, int64_t *pnum,
163
index XXXXXXX..XXXXXXX 100644
85
int64_t *map, BlockDriverState **file)
164
--- a/hw/remote/proxy.c
165
+++ b/hw/remote/proxy.c
166
@@ -XXX,XX +XXX,XX @@
167
#include "monitor/monitor.h"
168
#include "migration/blocker.h"
169
#include "qemu/sockets.h"
170
+#include "hw/remote/mpqemu-link.h"
171
+#include "qemu/error-report.h"
172
173
static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
86
{
174
{
87
- return bdrv_common_block_status_above(bs, base, true, offset, bytes,
175
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
88
+ return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
176
error_free(dev->migration_blocker);
89
pnum, map, file);
90
}
177
}
91
178
92
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
179
+static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
93
int ret;
180
+ int len, unsigned int op)
94
int64_t dummy;
181
+{
95
182
+ MPQemuMsg msg = { 0 };
96
- ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false,
183
+ uint64_t ret = -EINVAL;
97
- offset, bytes, pnum ? pnum : &dummy,
184
+ Error *local_err = NULL;
98
- NULL, NULL);
185
+
99
+ ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
186
+ msg.cmd = op;
100
+ bytes, pnum ? pnum : &dummy, NULL,
187
+ msg.data.pci_conf_data.addr = addr;
101
+ NULL);
188
+ msg.data.pci_conf_data.val = (op == MPQEMU_CMD_PCI_CFGWRITE) ? *val : 0;
102
if (ret < 0) {
189
+ msg.data.pci_conf_data.len = len;
103
return ret;
190
+ msg.size = sizeof(PciConfDataMsg);
104
}
191
+
192
+ ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
193
+ if (local_err) {
194
+ error_report_err(local_err);
195
+ }
196
+
197
+ if (ret == UINT64_MAX) {
198
+ error_report("Failed to perform PCI config %s operation",
199
+ (op == MPQEMU_CMD_PCI_CFGREAD) ? "READ" : "WRITE");
200
+ }
201
+
202
+ if (op == MPQEMU_CMD_PCI_CFGREAD) {
203
+ *val = (uint32_t)ret;
204
+ }
205
+}
206
+
207
+static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
208
+{
209
+ uint32_t val;
210
+
211
+ config_op_send(PCI_PROXY_DEV(d), addr, &val, len, MPQEMU_CMD_PCI_CFGREAD);
212
+
213
+ return val;
214
+}
215
+
216
+static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
217
+ int len)
218
+{
219
+ /*
220
+ * Some of the functions access the copy of remote device's PCI config
221
+ * space which is cached in the proxy device. Therefore, keep
222
+ * it up to date.
223
+ */
224
+ pci_default_write_config(d, addr, val, len);
225
+
226
+ config_op_send(PCI_PROXY_DEV(d), addr, &val, len, MPQEMU_CMD_PCI_CFGWRITE);
227
+}
228
+
229
static Property proxy_properties[] = {
230
DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
231
DEFINE_PROP_END_OF_LIST(),
232
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
233
234
k->realize = pci_proxy_dev_realize;
235
k->exit = pci_proxy_dev_exit;
236
+ k->config_read = pci_proxy_read_config;
237
+ k->config_write = pci_proxy_write_config;
238
+
239
device_class_set_props(dc, proxy_properties);
240
}
241
105
--
242
--
106
2.26.2
243
2.29.2
107
244
diff view generated by jsdifflib
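The PCI config-space patch above validates an incoming message in two places: the command code and payload size are checked when the message is received, and the target address is checked against the size of the config space before the access is performed. The fragment below is a hedged sketch of those two checks, not the code from the series. All Sketch* names are hypothetical, the restriction of len to 1, 2 or 4 bytes is an assumption made for this illustration, and the bounds check is deliberately written in an overflow-safe form that differs from the "addr + sizeof(val)" comparison in the patch. The command range is tested with "||", since a condition that joins "cmd >= MAX" and "cmd < 0" with "&&" can never be true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command codes, loosely modelled on the enum in the patch. */
typedef enum {
    SKETCH_CMD_RET,
    SKETCH_CMD_PCI_CFGWRITE,
    SKETCH_CMD_PCI_CFGREAD,
    SKETCH_CMD_MAX,
} SketchCmd;

/* Hypothetical payload for a config-space access. */
typedef struct {
    uint32_t addr;
    uint32_t val;
    int len;
} SketchConfMsg;

/* Hypothetical message envelope. */
typedef struct {
    int cmd;
    size_t size;
    union {
        uint64_t u64;
        SketchConfMsg conf;
    } data;
} SketchMsg;

/* Reject out-of-range commands and config messages with the wrong size. */
static bool sketch_msg_valid(const SketchMsg *msg)
{
    if (msg->cmd < 0 || msg->cmd >= SKETCH_CMD_MAX) {
        return false;
    }

    switch (msg->cmd) {
    case SKETCH_CMD_PCI_CFGWRITE:
    case SKETCH_CMD_PCI_CFGREAD:
        return msg->size == sizeof(SketchConfMsg);
    default:
        return true;
    }
}

/*
 * Check that [addr, addr + len) lies inside a config space of config_size
 * bytes. The subtraction form avoids wrap-around when addr is close to
 * UINT32_MAX. Limiting len to 1, 2 or 4 is an assumption of this sketch.
 */
static bool sketch_conf_in_range(const SketchConfMsg *conf,
                                 uint32_t config_size)
{
    if (conf->len != 1 && conf->len != 2 && conf->len != 4) {
        return false;
    }
    return conf->addr <= config_size &&
           (uint32_t)conf->len <= config_size - conf->addr;
}
```

A receiver following this sketch would call sketch_msg_valid() first and only then sketch_conf_in_range() with the device's config-space size (for example 256 for conventional PCI), replying with an error value such as UINT64_MAX when either check fails.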
1
The vu_client_trip() coroutine is leaked during AioContext switching. It
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
is also unsafe to destroy the vu_dev in panic_cb() since its callers
2
3
still access it in some cases.
3
The proxy device object implements a handler for PCI BAR writes and reads.
4
4
The handler uses BAR_WRITE/BAR_READ messages to communicate with the
5
Rework the lifecycle to solve these safety issues.
5
remote process with the BAR address and value to be written/read.
6
6
The remote process implements a handler for BAR_WRITE/BAR_READ
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
7
messages.
8
Message-id: 20200924151549.913737-10-stefanha@redhat.com
8
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
11
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: a8b76714a9688be5552c4c92d089bc9e8a4707ff.1611938319.git.jag.raman@oracle.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
15
---
11
util/vhost-user-server.h | 29 ++--
16
include/hw/remote/mpqemu-link.h | 10 ++++
12
block/export/vhost-user-blk-server.c | 9 +-
17
include/hw/remote/proxy.h | 9 ++++
13
util/vhost-user-server.c | 245 +++++++++++++++------------
18
hw/remote/message.c | 83 +++++++++++++++++++++++++++++++++
14
3 files changed, 155 insertions(+), 128 deletions(-)
19
hw/remote/mpqemu-link.c | 6 +++
15
20
hw/remote/proxy.c | 60 ++++++++++++++++++++++++
16
diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
21
5 files changed, 168 insertions(+)
17
index XXXXXXX..XXXXXXX 100644
22
18
--- a/util/vhost-user-server.h
23
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
19
+++ b/util/vhost-user-server.h
24
index XXXXXXX..XXXXXXX 100644
25
--- a/include/hw/remote/mpqemu-link.h
26
+++ b/include/hw/remote/mpqemu-link.h
27
@@ -XXX,XX +XXX,XX @@ typedef enum {
28
MPQEMU_CMD_RET,
29
MPQEMU_CMD_PCI_CFGWRITE,
30
MPQEMU_CMD_PCI_CFGREAD,
31
+ MPQEMU_CMD_BAR_WRITE,
32
+ MPQEMU_CMD_BAR_READ,
33
MPQEMU_CMD_MAX,
34
} MPQemuCmd;
35
36
@@ -XXX,XX +XXX,XX @@ typedef struct {
37
int len;
38
} PciConfDataMsg;
39
40
+typedef struct {
41
+ hwaddr addr;
42
+ uint64_t val;
43
+ unsigned size;
44
+ bool memory;
45
+} BarAccessMsg;
46
+
47
/**
48
* MPQemuMsg:
49
* @cmd: The remote command
50
@@ -XXX,XX +XXX,XX @@ typedef struct {
51
uint64_t u64;
52
PciConfDataMsg pci_conf_data;
53
SyncSysmemMsg sync_sysmem;
54
+ BarAccessMsg bar_access;
55
} data;
56
57
int fds[REMOTE_MAX_FDS];
58
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
59
index XXXXXXX..XXXXXXX 100644
60
--- a/include/hw/remote/proxy.h
61
+++ b/include/hw/remote/proxy.h
62
@@ -XXX,XX +XXX,XX @@
63
#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
64
OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
65
66
+typedef struct ProxyMemoryRegion {
67
+ PCIProxyDev *dev;
68
+ MemoryRegion mr;
69
+ bool memory;
70
+ bool present;
71
+ uint8_t type;
72
+} ProxyMemoryRegion;
73
+
74
struct PCIProxyDev {
75
PCIDevice parent_dev;
76
char *fd;
77
@@ -XXX,XX +XXX,XX @@ struct PCIProxyDev {
78
QemuMutex io_mutex;
79
QIOChannel *ioc;
80
Error *migration_blocker;
81
+ ProxyMemoryRegion region[PCI_NUM_REGIONS];
82
};
83
84
#endif /* PROXY_H */
85
diff --git a/hw/remote/message.c b/hw/remote/message.c
86
index XXXXXXX..XXXXXXX 100644
87
--- a/hw/remote/message.c
88
+++ b/hw/remote/message.c
20
@@ -XXX,XX +XXX,XX @@
89
@@ -XXX,XX +XXX,XX @@
21
#include "qapi/error.h"
90
#include "qapi/error.h"
22
#include "standard-headers/linux/virtio_blk.h"
91
#include "sysemu/runstate.h"
23
92
#include "hw/pci/pci.h"
24
+/* A kick fd that we monitor on behalf of libvhost-user */
93
+#include "exec/memattrs.h"
25
typedef struct VuFdWatch {
94
26
VuDev *vu_dev;
95
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
27
int fd; /*kick fd*/
96
MPQemuMsg *msg, Error **errp);
28
void *pvt;
97
static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
29
vu_watch_cb cb;
98
MPQemuMsg *msg, Error **errp);
30
- bool processing;
99
+static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
31
QTAILQ_ENTRY(VuFdWatch) next;
100
+static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
32
} VuFdWatch;
101
33
102
void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
34
-typedef struct VuServer VuServer;
35
-
36
-struct VuServer {
37
+/**
38
+ * VuServer:
39
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
40
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
41
+ * callbacks and virtqueue kicks run in the given AioContext.
42
+ */
43
+typedef struct {
44
QIONetListener *listener;
45
+ QEMUBH *restart_listener_bh;
46
AioContext *ctx;
47
int max_queues;
48
const VuDevIface *vu_iface;
49
+
50
+ /* Protected by ctx lock */
51
VuDev vu_dev;
52
QIOChannel *ioc; /* The I/O channel with the client */
53
QIOChannelSocket *sioc; /* The underlying data channel with the client */
54
- /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
55
- QIOChannel *ioc_slave;
56
- QIOChannelSocket *sioc_slave;
57
- Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
58
QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
59
- /* restart coroutine co_trip if AIOContext is changed */
60
- bool aio_context_changed;
61
- bool processing_msg;
62
-};
63
+
64
+ Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
65
+} VuServer;
66
67
bool vhost_user_server_start(VuServer *server,
68
SocketAddress *unix_socket,
69
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
70
71
void vhost_user_server_stop(VuServer *server);
72
73
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
74
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
75
+void vhost_user_server_detach_aio_context(VuServer *server);
76
77
#endif /* VHOST_USER_SERVER_H */
78
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
79
index XXXXXXX..XXXXXXX 100644
80
--- a/block/export/vhost-user-blk-server.c
81
+++ b/block/export/vhost-user-blk-server.c
82
@@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = {
83
static void blk_aio_attached(AioContext *ctx, void *opaque)
84
{
103
{
85
VuBlockDev *vub_dev = opaque;
104
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
86
- aio_context_acquire(ctx);
105
case MPQEMU_CMD_PCI_CFGREAD:
87
- vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
106
process_config_read(com->ioc, pci_dev, &msg, &local_err);
88
- aio_context_release(ctx);
107
break;
89
+ vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
108
+ case MPQEMU_CMD_BAR_WRITE:
90
}
109
+ process_bar_write(com->ioc, &msg, &local_err);
91
110
+ break;
92
static void blk_aio_detach(void *opaque)
111
+ case MPQEMU_CMD_BAR_READ:
93
{
112
+ process_bar_read(com->ioc, &msg, &local_err);
94
VuBlockDev *vub_dev = opaque;
113
+ break;
95
- AioContext *ctx = vub_dev->vu_server.ctx;
114
default:
96
- aio_context_acquire(ctx);
115
error_setg(&local_err,
97
- vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
116
"Unknown command (%d) received for device %s"
98
- aio_context_release(ctx);
117
@@ -XXX,XX +XXX,XX @@ static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
99
+ vhost_user_server_detach_aio_context(&vub_dev->vu_server);
118
getpid());
100
}
101
102
static void
103
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
104
index XXXXXXX..XXXXXXX 100644
105
--- a/util/vhost-user-server.c
106
+++ b/util/vhost-user-server.c
107
@@ -XXX,XX +XXX,XX @@
108
*/
109
#include "qemu/osdep.h"
110
#include "qemu/main-loop.h"
111
+#include "block/aio-wait.h"
112
#include "vhost-user-server.h"
113
114
+/*
115
+ * Theory of operation:
116
+ *
117
+ * VuServer is started and stopped by vhost_user_server_start() and
118
+ * vhost_user_server_stop() from the main loop thread. Starting the server
119
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
120
+ * Only one connection is allowed at a time.
121
+ *
122
+ * The connection is handled by the vu_client_trip() coroutine in the
123
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
124
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
125
+ * protocol messages over the UNIX domain socket.
126
+ *
127
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
128
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
129
+ *
130
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
131
+ * the socket connection. Shutting down the socket connection causes
132
+ * vu_message_read() to fail since no more data can be received from the socket.
133
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
134
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
135
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
136
+ * processing.
137
+ *
138
+ * When vu_client_trip() has finished cleaning up it schedules a BH in the main
139
+ * loop thread to accept the next client connection.
140
+ *
141
+ * When libvhost-user detects an error it calls panic_cb() and sets the
142
+ * dev->broken flag. Both vu_client_trip() and kick fd processing stop when
143
+ * the dev->broken flag is set.
144
+ *
145
+ * It is possible to switch AioContexts using
146
+ * vhost_user_server_detach_aio_context() and
147
+ * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old
148
+ * AioContext and resume monitoring in the new AioContext. The vu_client_trip()
149
+ * coroutine remains in a yielded state during the switch. This is made
150
+ * possible by QIOChannel's support for spurious coroutine re-entry in
151
+ * qio_channel_yield(). The coroutine will restart I/O when re-entered from the
152
+ * new AioContext.
153
+ */
154
+
155
static void vmsg_close_fds(VhostUserMsg *vmsg)
156
{
157
int i;
158
@@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg)
159
}
119
}
160
}
120
}
161
121
+
162
-static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
122
+static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp)
163
- gpointer opaque);
123
+{
164
-
124
+ ERRP_GUARD();
165
-static void close_client(VuServer *server)
125
+ BarAccessMsg *bar_access = &msg->data.bar_access;
166
-{
126
+ AddressSpace *as =
167
- /*
127
+ bar_access->memory ? &address_space_memory : &address_space_io;
168
- * Before closing the client
128
+ MPQemuMsg ret = { 0 };
169
- *
129
+ MemTxResult res;
170
- * 1. Let vu_client_trip stop processing new vhost-user msg
130
+ uint64_t val;
171
- *
131
+
172
- * 2. remove kick_handler
132
+ if (!is_power_of_2(bar_access->size) ||
173
- *
133
+ (bar_access->size > sizeof(uint64_t))) {
174
- * 3. wait for the kick handler to be finished
134
+ ret.data.u64 = UINT64_MAX;
175
- *
135
+ goto fail;
176
- * 4. wait for the current vhost-user msg to be finished processing
136
+ }
177
- */
137
+
178
-
138
+ val = cpu_to_le64(bar_access->val);
179
- QIOChannelSocket *sioc = server->sioc;
139
+
180
- /* When this is set vu_client_trip will stop new processing vhost-user message */
140
+ res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
181
- server->sioc = NULL;
141
+ (void *)&val, bar_access->size, true);
182
-
142
+
183
- while (server->processing_msg) {
143
+ if (res != MEMTX_OK) {
184
- if (server->ioc->read_coroutine) {
144
+ error_setg(errp, "Bad address %"PRIx64" for mem write, pid "FMT_pid".",
185
- server->ioc->read_coroutine = NULL;
145
+ bar_access->addr, getpid());
186
- qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL,
146
+ ret.data.u64 = -1;
187
- NULL, server->ioc);
147
+ }
188
- server->processing_msg = false;
148
+
189
- }
149
+fail:
190
- }
150
+ ret.cmd = MPQEMU_CMD_RET;
191
-
151
+ ret.size = sizeof(ret.data.u64);
192
- vu_deinit(&server->vu_dev);
152
+
193
-
153
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
194
- /* vu_deinit() should have called remove_watch() */
154
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
195
- assert(QTAILQ_EMPTY(&server->vu_fd_watches));
155
+ getpid());
196
-
156
+ }
197
- object_unref(OBJECT(sioc));
157
+}
198
- object_unref(OBJECT(server->ioc));
158
+
199
-}
159
+static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp)
200
-
160
+{
201
static void panic_cb(VuDev *vu_dev, const char *buf)
161
+ ERRP_GUARD();
202
{
162
+ BarAccessMsg *bar_access = &msg->data.bar_access;
203
- VuServer *server = container_of(vu_dev, VuServer, vu_dev);
163
+ MPQemuMsg ret = { 0 };
204
-
164
+ AddressSpace *as;
205
- /* avoid while loop in close_client */
165
+ MemTxResult res;
206
- server->processing_msg = false;
166
+ uint64_t val = 0;
207
-
167
+
208
- if (buf) {
168
+ as = bar_access->memory ? &address_space_memory : &address_space_io;
209
- error_report("vu_panic: %s", buf);
169
+
210
- }
170
+ if (!is_power_of_2(bar_access->size) ||
211
-
171
+ (bar_access->size > sizeof(uint64_t))) {
212
- if (server->sioc) {
172
+ val = UINT64_MAX;
213
- close_client(server);
173
+ goto fail;
214
- }
174
+ }
215
-
175
+
216
- /*
176
+ res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
217
- * Set the callback function for network listener so another
177
+ (void *)&val, bar_access->size, false);
218
- * vhost-user client can connect to this server
178
+
219
- */
179
+ if (res != MEMTX_OK) {
220
- qio_net_listener_set_client_func(server->listener,
180
+ error_setg(errp, "Bad address %"PRIx64" for mem read, pid "FMT_pid".",
221
- vu_accept,
181
+ bar_access->addr, getpid());
222
- server,
182
+ val = UINT64_MAX;
223
- NULL);
183
+ }
224
+ error_report("vu_panic: %s", buf);
184
+
185
+fail:
186
+ ret.cmd = MPQEMU_CMD_RET;
187
+ ret.data.u64 = le64_to_cpu(val);
188
+ ret.size = sizeof(ret.data.u64);
189
+
190
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
191
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
192
+ getpid());
193
+ }
194
+}
195
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
196
index XXXXXXX..XXXXXXX 100644
197
--- a/hw/remote/mpqemu-link.c
198
+++ b/hw/remote/mpqemu-link.c
199
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
200
return false;
201
}
202
break;
203
+ case MPQEMU_CMD_BAR_WRITE:
204
+ case MPQEMU_CMD_BAR_READ:
205
+ if ((msg->size != sizeof(BarAccessMsg)) || (msg->num_fds != 0)) {
206
+ return false;
207
+ }
208
+ break;
209
default:
210
break;
211
}
212
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
213
index XXXXXXX..XXXXXXX 100644
214
--- a/hw/remote/proxy.c
215
+++ b/hw/remote/proxy.c
216
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_register_types(void)
225
}
217
}
226
218
227
static bool coroutine_fn
219
type_init(pci_proxy_dev_register_types)
228
@@ -XXX,XX +XXX,XX @@ fail:
220
+
229
return false;
221
+static void send_bar_access_msg(PCIProxyDev *pdev, MemoryRegion *mr,
230
}
222
+ bool write, hwaddr addr, uint64_t *val,
231
223
+ unsigned size, bool memory)
232
-
224
+{
233
-static void vu_client_start(VuServer *server);
225
+ MPQemuMsg msg = { 0 };
234
static coroutine_fn void vu_client_trip(void *opaque)
226
+ long ret = -EINVAL;
235
{
227
+ Error *local_err = NULL;
236
VuServer *server = opaque;
228
+
237
+ VuDev *vu_dev = &server->vu_dev;
229
+ msg.size = sizeof(BarAccessMsg);
238
230
+ msg.data.bar_access.addr = mr->addr + addr;
239
- while (!server->aio_context_changed && server->sioc) {
231
+ msg.data.bar_access.size = size;
240
- server->processing_msg = true;
232
+ msg.data.bar_access.memory = memory;
241
- vu_dispatch(&server->vu_dev);
233
+
242
- server->processing_msg = false;
234
+ if (write) {
243
+ while (!vu_dev->broken && vu_dispatch(vu_dev)) {
235
+ msg.cmd = MPQEMU_CMD_BAR_WRITE;
244
+ /* Keep running */
236
+ msg.data.bar_access.val = *val;
245
}
237
+ } else {
246
238
+ msg.cmd = MPQEMU_CMD_BAR_READ;
247
- if (server->aio_context_changed && server->sioc) {
239
+ }
248
- server->aio_context_changed = false;
240
+
249
- vu_client_start(server);
241
+ ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
250
- }
242
+ if (local_err) {
251
-}
243
+ error_report_err(local_err);
252
+ vu_deinit(vu_dev);
244
+ }
253
+
245
+
254
+ /* vu_deinit() should have called remove_watch() */
246
+ if (!write) {
255
+ assert(QTAILQ_EMPTY(&server->vu_fd_watches));
247
+ *val = ret;
256
+
248
+ }
257
+ object_unref(OBJECT(server->sioc));
249
+}
258
+ server->sioc = NULL;
250
+
259
251
+static void proxy_bar_write(void *opaque, hwaddr addr, uint64_t val,
260
-static void vu_client_start(VuServer *server)
252
+ unsigned size)
261
-{
253
+{
262
- server->co_trip = qemu_coroutine_create(vu_client_trip, server);
254
+ ProxyMemoryRegion *pmr = opaque;
263
- aio_co_enter(server->ctx, server->co_trip);
255
+
264
+ object_unref(OBJECT(server->ioc));
256
+ send_bar_access_msg(pmr->dev, &pmr->mr, true, addr, &val, size,
265
+ server->ioc = NULL;
257
+ pmr->memory);
266
+
258
+}
267
+ server->co_trip = NULL;
259
+
268
+ if (server->restart_listener_bh) {
260
+static uint64_t proxy_bar_read(void *opaque, hwaddr addr, unsigned size)
269
+ qemu_bh_schedule(server->restart_listener_bh);
261
+{
270
+ }
262
+ ProxyMemoryRegion *pmr = opaque;
271
+ aio_wait_kick();
263
+ uint64_t val;
272
}
264
+
273
265
+ send_bar_access_msg(pmr->dev, &pmr->mr, false, addr, &val, size,
274
/*
266
+ pmr->memory);
275
@@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server)
267
+
276
static void kick_handler(void *opaque)
268
+ return val;
277
{
269
+}
278
VuFdWatch *vu_fd_watch = opaque;
270
+
279
- vu_fd_watch->processing = true;
271
+const MemoryRegionOps proxy_mr_ops = {
280
- vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt);
272
+ .read = proxy_bar_read,
281
- vu_fd_watch->processing = false;
273
+ .write = proxy_bar_write,
282
+ VuDev *vu_dev = vu_fd_watch->vu_dev;
274
+ .endianness = DEVICE_NATIVE_ENDIAN,
283
+
275
+ .impl = {
284
+ vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt);
276
+ .min_access_size = 1,
285
+
277
+ .max_access_size = 8,
286
+ /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */
278
+ },
287
+ if (vu_dev->broken) {
279
+};
288
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
289
+
290
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
291
+ }
292
}
293
294
-
295
static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd)
296
{
297
298
@@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
299
qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client");
300
server->ioc = QIO_CHANNEL(sioc);
301
object_ref(OBJECT(server->ioc));
302
- qio_channel_attach_aio_context(server->ioc, server->ctx);
303
+
304
+ /* TODO vu_message_write() spins if non-blocking! */
305
qio_channel_set_blocking(server->ioc, false, NULL);
306
- vu_client_start(server);
307
+
308
+ server->co_trip = qemu_coroutine_create(vu_client_trip, server);
309
+
310
+ aio_context_acquire(server->ctx);
311
+ vhost_user_server_attach_aio_context(server, server->ctx);
312
+ aio_context_release(server->ctx);
313
}
314
315
-
316
void vhost_user_server_stop(VuServer *server)
317
{
318
+ aio_context_acquire(server->ctx);
319
+
320
+ qemu_bh_delete(server->restart_listener_bh);
321
+ server->restart_listener_bh = NULL;
322
+
323
if (server->sioc) {
324
- close_client(server);
325
+ VuFdWatch *vu_fd_watch;
326
+
327
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
328
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
329
+ NULL, NULL, NULL, vu_fd_watch);
330
+ }
331
+
332
+ qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
333
+
334
+ AIO_WAIT_WHILE(server->ctx, server->co_trip);
335
}
336
337
+ aio_context_release(server->ctx);
338
+
339
if (server->listener) {
340
qio_net_listener_disconnect(server->listener);
341
object_unref(OBJECT(server->listener));
342
}
343
+}
344
+
345
+/*
346
+ * Allow the next client to connect to the server. Called from a BH in the main
347
+ * loop.
348
+ */
349
+static void restart_listener_bh(void *opaque)
350
+{
351
+ VuServer *server = opaque;
352
353
+ qio_net_listener_set_client_func(server->listener, vu_accept, server,
354
+ NULL);
355
}
356
357
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx)
358
+/* Called with ctx acquired */
359
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
360
{
361
- VuFdWatch *vu_fd_watch, *next;
362
- void *opaque = NULL;
363
- IOHandler *io_read = NULL;
364
- bool attach;
365
+ VuFdWatch *vu_fd_watch;
366
367
- server->ctx = ctx ? ctx : qemu_get_aio_context();
368
+ server->ctx = ctx;
369
370
if (!server->sioc) {
371
- /* not yet serving any client*/
372
return;
373
}
374
375
- if (ctx) {
376
- qio_channel_attach_aio_context(server->ioc, ctx);
377
- server->aio_context_changed = true;
378
- io_read = kick_handler;
379
- attach = true;
380
- } else {
381
+ qio_channel_attach_aio_context(server->ioc, ctx);
382
+
383
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
384
+ aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL,
385
+ NULL, vu_fd_watch);
386
+ }
387
+
388
+ aio_co_schedule(ctx, server->co_trip);
389
+}
390
+
391
+/* Called with server->ctx acquired */
392
+void vhost_user_server_detach_aio_context(VuServer *server)
393
+{
394
+ if (server->sioc) {
395
+ VuFdWatch *vu_fd_watch;
396
+
397
+ QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
398
+ aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
399
+ NULL, NULL, NULL, vu_fd_watch);
400
+ }
401
+
402
qio_channel_detach_aio_context(server->ioc);
403
- /* server->ioc->ctx keeps the old AioConext */
404
- ctx = server->ioc->ctx;
405
- attach = false;
406
}
407
408
- QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
409
- if (vu_fd_watch->cb) {
410
- opaque = attach ? vu_fd_watch : NULL;
411
- aio_set_fd_handler(ctx, vu_fd_watch->fd, true,
412
- io_read, NULL, NULL,
413
- opaque);
414
- }
415
- }
416
+ server->ctx = NULL;
417
}
418
419
-
420
bool vhost_user_server_start(VuServer *server,
421
SocketAddress *socket_addr,
422
AioContext *ctx,
423
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
424
const VuDevIface *vu_iface,
425
Error **errp)
426
{
427
+ QEMUBH *bh;
428
QIONetListener *listener = qio_net_listener_new();
429
if (qio_net_listener_open_sync(listener, socket_addr, 1,
430
errp) < 0) {
431
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
432
return false;
433
}
434
435
+ bh = qemu_bh_new(restart_listener_bh, server);
436
+
437
/* zero out unspecified fields */
438
*server = (VuServer) {
439
.listener = listener,
440
+ .restart_listener_bh = bh,
441
.vu_iface = vu_iface,
442
.max_queues = max_queues,
443
.ctx = ctx,
444
--
280
--
445
2.26.2
281
2.29.2
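The theory-of-operation comment added to util/vhost-user-server.c above relies on one property of sockets: once the connection is shut down, the next read fails and the dispatch loop ends. The sketch below illustrates that property with plain POSIX calls only. It does not use libvhost-user or QIOChannel, and demo_dispatch_loop() with its one-byte "messages" is a hypothetical helper invented for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a vu_dispatch()-style loop: handle one-byte
 * "messages" until read() stops returning data. Once all queued messages
 * are handled, the socket is shut down so that the next read() reports
 * end-of-file instead of blocking, and the loop terminates.
 * Returns the number of messages handled, or -1 on setup failure.
 */
static int demo_dispatch_loop(int n_msgs)
{
    int sv[2];
    int handled = 0;
    char msg;
    int i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* The "client" end queues all of its messages up front */
    for (i = 0; i < n_msgs; i++) {
        if (write(sv[1], "k", 1) != 1) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }

    for (;;) {
        if (handled == n_msgs) {
            /* Nothing left to process: make the next read() fail */
            shutdown(sv[0], SHUT_RDWR);
        }
        if (read(sv[0], &msg, 1) != 1) {
            break; /* EOF or error: leave the loop and clean up */
        }
        handled++;
    }

    close(sv[0]);
    close(sv[1]);
    return handled;
}
```

Without the shutdown() call the final read() would block, because the peer end is still open. This is the reason a shutdown, rather than a flag checked between messages, is enough to stop a loop that is parked in a read.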
446
282
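The process_bar_write() and process_bar_read() handlers in the patch above reject any access whose size is not a power of two or is larger than sizeof(uint64_t). A minimal stand-alone sketch of that check follows. The demo_ names are hypothetical, and demo_is_power_of_2() is a local stand-in written for this example rather than QEMU's own helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for a power-of-two test: true for 1, 2, 4, 8, ... */
static bool demo_is_power_of_2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/* Accept only access sizes of 1, 2, 4 or 8 bytes */
static bool demo_bar_access_size_ok(unsigned size)
{
    return demo_is_power_of_2(size) && size <= sizeof(uint64_t);
}
```

In the patch, a rejected size makes the handler reply with UINT64_MAX instead of touching the address space.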
1
From: Coiby Xu <coiby.xu@gmail.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Move the constants from hw/core/qdev-properties.c to
3
Add ProxyMemoryListener object which is used to keep the view of the RAM
4
util/block-helpers.h so that knowledge of the min/max values is
4
in sync between QEMU and remote process.
5
A MemoryListener is registered for the system-memory AddressSpace. When the
listener commits changes to memory, it sends a SYNC_SYSMEM message to the
remote process, which receives the message and processes it in its
SYNC_SYSMEM handler.
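The listener code in this patch page-aligns each RAM section and merges it into the previous section when the two are contiguous. The arithmetic behind that can be sketched as follows. The demo_ helpers are hypothetical names for this illustration, and the sketch leaves out the file descriptor and MemoryRegion comparisons that the patch also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round addr down to a multiple of page (page must be a power of two) */
static uint64_t demo_align_down(uint64_t addr, uint64_t page)
{
    return addr & ~(page - 1);
}

/* Round size up to a multiple of page (page must be a power of two) */
static uint64_t demo_round_up(uint64_t size, uint64_t page)
{
    return (size + page - 1) & ~(page - 1);
}

/* Two ranges are adjacent when the second starts where the first ends */
static bool demo_ranges_adjacent(uint64_t prev_start, uint64_t prev_size,
                                 uint64_t start)
{
    return prev_start + prev_size == start;
}
```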
5
9
6
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
7
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
11
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
12
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
14
Message-id: 04fe4e6a9ca90d4f11ab6f59be7652f5b086a071.1611938319.git.jag.raman@oracle.com
10
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
11
Message-id: 20200918080912.321299-5-coiby.xu@gmail.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
16
---
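The check_block_size() helper introduced by the block-helpers patch accepts a value of 0 as "unset" and otherwise requires a power of two between 512 bytes and 2 MiB. The rule can be sketched without any QEMU headers as shown below. The DEMO_ constants and demo_block_size_ok() are hypothetical names, and the sketch returns a boolean instead of filling in an Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIN_BLOCK_SIZE INT64_C(512)
#define DEMO_MAX_BLOCK_SIZE (INT64_C(2) * 1024 * 1024)

/* Returns true when value is an acceptable block size in bytes */
static bool demo_block_size_ok(int64_t value)
{
    /* A value of 0 means "unset" and is accepted */
    if (value == 0) {
        return true;
    }

    /* Lower limit is the sector size, upper limit is 2 MiB */
    if (value < DEMO_MIN_BLOCK_SIZE || value > DEMO_MAX_BLOCK_SIZE) {
        return false;
    }

    /* Block sizes must be a power of two so they can be used as bitmasks */
    return (value & (value - 1)) == 0;
}
```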
14
util/block-helpers.h | 19 +++++++++++++
17
MAINTAINERS | 2 +
15
hw/core/qdev-properties-system.c | 31 ++++-----------------
18
include/hw/remote/proxy-memory-listener.h | 28 +++
16
util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++
19
include/hw/remote/proxy.h | 2 +
17
util/meson.build | 1 +
20
hw/remote/message.c | 4 +
18
4 files changed, 71 insertions(+), 26 deletions(-)
21
hw/remote/proxy-memory-listener.c | 227 ++++++++++++++++++++++
19
create mode 100644 util/block-helpers.h
22
hw/remote/proxy.c | 6 +
20
create mode 100644 util/block-helpers.c
23
hw/remote/meson.build | 1 +
24
7 files changed, 270 insertions(+)
25
create mode 100644 include/hw/remote/proxy-memory-listener.h
26
create mode 100644 hw/remote/proxy-memory-listener.c
21
27
22
diff --git a/util/block-helpers.h b/util/block-helpers.h
28
diff --git a/MAINTAINERS b/MAINTAINERS
29
index XXXXXXX..XXXXXXX 100644
30
--- a/MAINTAINERS
31
+++ b/MAINTAINERS
32
@@ -XXX,XX +XXX,XX @@ F: include/hw/remote/memory.h
33
F: hw/remote/memory.c
34
F: hw/remote/proxy.c
35
F: include/hw/remote/proxy.h
36
+F: hw/remote/proxy-memory-listener.c
37
+F: include/hw/remote/proxy-memory-listener.h
38
39
Build and test automation
40
-------------------------
41
diff --git a/include/hw/remote/proxy-memory-listener.h b/include/hw/remote/proxy-memory-listener.h
23
new file mode 100644
42
new file mode 100644
24
index XXXXXXX..XXXXXXX
43
index XXXXXXX..XXXXXXX
25
--- /dev/null
44
--- /dev/null
26
+++ b/util/block-helpers.h
45
+++ b/include/hw/remote/proxy-memory-listener.h
27
@@ -XXX,XX +XXX,XX @@
46
@@ -XXX,XX +XXX,XX @@
28
+#ifndef BLOCK_HELPERS_H
29
+#define BLOCK_HELPERS_H
30
+
31
+#include "qemu/units.h"
32
+
33
+/* lower limit is sector size */
34
+#define MIN_BLOCK_SIZE INT64_C(512)
35
+#define MIN_BLOCK_SIZE_STR "512 B"
36
+/*
47
+/*
37
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
48
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
38
+ * matches qcow2 cluster size limit
49
+ *
50
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
51
+ * See the COPYING file in the top-level directory.
52
+ *
39
+ */
53
+ */
40
+#define MAX_BLOCK_SIZE (2 * MiB)
54
+
41
+#define MAX_BLOCK_SIZE_STR "2 MiB"
55
+#ifndef PROXY_MEMORY_LISTENER_H
42
+
56
+#define PROXY_MEMORY_LISTENER_H
43
+void check_block_size(const char *id, const char *name, int64_t value,
57
+
44
+ Error **errp);
58
+#include "exec/memory.h"
45
+
59
+#include "io/channel.h"
46
+#endif /* BLOCK_HELPERS_H */
60
+
47
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
61
+typedef struct ProxyMemoryListener {
48
index XXXXXXX..XXXXXXX 100644
62
+ MemoryListener listener;
49
--- a/hw/core/qdev-properties-system.c
63
+
50
+++ b/hw/core/qdev-properties-system.c
64
+ int n_mr_sections;
51
@@ -XXX,XX +XXX,XX @@
65
+ MemoryRegionSection *mr_sections;
52
#include "sysemu/blockdev.h"
66
+
53
#include "net/net.h"
67
+ QIOChannel *ioc;
68
+} ProxyMemoryListener;
69
+
70
+void proxy_memory_listener_configure(ProxyMemoryListener *proxy_listener,
71
+ QIOChannel *ioc);
72
+void proxy_memory_listener_deconfigure(ProxyMemoryListener *proxy_listener);
73
+
74
+#endif
75
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
76
index XXXXXXX..XXXXXXX 100644
77
--- a/include/hw/remote/proxy.h
78
+++ b/include/hw/remote/proxy.h
79
@@ -XXX,XX +XXX,XX @@
80
54
#include "hw/pci/pci.h"
81
#include "hw/pci/pci.h"
55
+#include "util/block-helpers.h"
82
#include "io/channel.h"
56
83
+#include "hw/remote/proxy-memory-listener.h"
57
static bool check_prop_still_unset(DeviceState *dev, const char *name,
84
58
const void *old_val, const char *new_val,
85
#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
59
@@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = {
86
OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
60
87
@@ -XXX,XX +XXX,XX @@ struct PCIProxyDev {
61
/* --- blocksize --- */
88
QemuMutex io_mutex;
62
89
QIOChannel *ioc;
63
-/* lower limit is sector size */
90
Error *migration_blocker;
64
-#define MIN_BLOCK_SIZE 512
91
+ ProxyMemoryListener proxy_listener;
65
-#define MIN_BLOCK_SIZE_STR "512 B"
92
ProxyMemoryRegion region[PCI_NUM_REGIONS];
66
-/*
93
};
67
- * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
94
68
- * matches qcow2 cluster size limit
95
diff --git a/hw/remote/message.c b/hw/remote/message.c
69
- */
96
index XXXXXXX..XXXXXXX 100644
70
-#define MAX_BLOCK_SIZE (2 * MiB)
97
--- a/hw/remote/message.c
71
-#define MAX_BLOCK_SIZE_STR "2 MiB"
98
+++ b/hw/remote/message.c
72
-
99
@@ -XXX,XX +XXX,XX @@
73
static void set_blocksize(Object *obj, Visitor *v, const char *name,
100
#include "sysemu/runstate.h"
74
void *opaque, Error **errp)
101
#include "hw/pci/pci.h"
75
{
102
#include "exec/memattrs.h"
76
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
103
+#include "hw/remote/memory.h"
77
Property *prop = opaque;
104
78
uint32_t *ptr = qdev_get_prop_ptr(dev, prop);
105
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
79
uint64_t value;
106
MPQemuMsg *msg, Error **errp);
80
+ Error *local_err = NULL;
107
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
81
108
case MPQEMU_CMD_BAR_READ:
82
if (dev->realized) {
109
process_bar_read(com->ioc, &msg, &local_err);
83
qdev_prop_set_after_realize(dev, name, errp);
110
break;
84
@@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name,
111
+ case MPQEMU_CMD_SYNC_SYSMEM:
85
if (!visit_type_size(v, name, &value, errp)) {
112
+ remote_sysmem_reconfig(&msg, &local_err);
86
return;
113
+ break;
87
}
114
default:
88
- /* value of 0 means "unset" */
115
error_setg(&local_err,
89
- if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
116
"Unknown command (%d) received for device %s"
90
- error_setg(errp,
117
diff --git a/hw/remote/proxy-memory-listener.c b/hw/remote/proxy-memory-listener.c
91
- "Property %s.%s doesn't take value %" PRIu64
92
- " (minimum: " MIN_BLOCK_SIZE_STR
93
- ", maximum: " MAX_BLOCK_SIZE_STR ")",
94
- dev->id ? : "", name, value);
95
+ check_block_size(dev->id ? : "", name, value, &local_err);
96
+ if (local_err) {
97
+ error_propagate(errp, local_err);
98
return;
99
}
100
-
101
- /* We rely on power-of-2 blocksizes for bitmasks */
102
- if ((value & (value - 1)) != 0) {
103
- error_setg(errp,
104
- "Property %s.%s doesn't take value '%" PRId64 "', "
105
- "it's not a power of 2", dev->id ?: "", name, (int64_t)value);
106
- return;
107
- }
108
-
109
*ptr = value;
110
}
111
112
diff --git a/util/block-helpers.c b/util/block-helpers.c
113
new file mode 100644
118
new file mode 100644
114
index XXXXXXX..XXXXXXX
119
index XXXXXXX..XXXXXXX
115
--- /dev/null
120
--- /dev/null
116
+++ b/util/block-helpers.c
121
+++ b/hw/remote/proxy-memory-listener.c
117
@@ -XXX,XX +XXX,XX @@
122
@@ -XXX,XX +XXX,XX @@
118
+/*
123
+/*
119
+ * Block utility functions
124
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
120
+ *
121
+ * Copyright IBM, Corp. 2011
122
+ * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com>
123
+ *
125
+ *
124
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
126
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
125
+ * See the COPYING file in the top-level directory.
127
+ * See the COPYING file in the top-level directory.
128
+ *
126
+ */
129
+ */
127
+
130
+
128
+#include "qemu/osdep.h"
131
+#include "qemu/osdep.h"
132
+#include "qemu-common.h"
133
+
134
+#include "qemu/compiler.h"
135
+#include "qemu/int128.h"
136
+#include "qemu/range.h"
137
+#include "exec/memory.h"
138
+#include "exec/cpu-common.h"
139
+#include "cpu.h"
140
+#include "exec/ram_addr.h"
141
+#include "exec/address-spaces.h"
129
+#include "qapi/error.h"
142
+#include "qapi/error.h"
130
+#include "qapi/qmp/qerror.h"
143
+#include "hw/remote/mpqemu-link.h"
131
+#include "block-helpers.h"
144
+#include "hw/remote/proxy-memory-listener.h"
132
+
145
+
133
+/**
146
+/*
134
+ * check_block_size:
147
+ * TODO: get_fd_from_hostaddr(), proxy_mrs_can_merge() and
135
+ * @id: The unique ID of the object
148
+ * proxy_memory_listener_commit() defined below perform tasks similar to the
136
+ * @name: The name of the property being validated
149
+ * functions defined in vhost-user.c. These functions are good candidates
137
+ * @value: The block size in bytes
150
+ * for refactoring.
138
+ * @errp: A pointer to an area to store an error
151
+ *
139
+ *
140
+ * This function checks that the block size meets the following conditions:
141
+ * 1. At least MIN_BLOCK_SIZE
142
+ * 2. No larger than MAX_BLOCK_SIZE
143
+ * 3. A power of 2
144
+ */
152
+ */
145
+void check_block_size(const char *id, const char *name, int64_t value,
153
+
146
+ Error **errp)
154
+static void proxy_memory_listener_reset(MemoryListener *listener)
147
+{
155
+{
148
+ /* value of 0 means "unset" */
156
+ ProxyMemoryListener *proxy_listener = container_of(listener,
149
+ if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) {
157
+ ProxyMemoryListener,
150
+ error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
158
+ listener);
151
+ id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE);
159
+ int mrs;
160
+
161
+ for (mrs = 0; mrs < proxy_listener->n_mr_sections; mrs++) {
162
+ memory_region_unref(proxy_listener->mr_sections[mrs].mr);
163
+ }
164
+
165
+ g_free(proxy_listener->mr_sections);
166
+ proxy_listener->mr_sections = NULL;
167
+ proxy_listener->n_mr_sections = 0;
168
+}
169
+
170
+static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
171
+{
172
+ MemoryRegion *mr;
173
+ ram_addr_t off;
174
+
175
+ /**
176
+ * Assumes that the host address is a valid address as it's
177
+ * coming from the MemoryListener system. In the case host
178
+ * address is not valid, the following call would return
179
+ * the default subregion of "system_memory" region, and
180
+ * not NULL. So it's not possible to check for NULL here.
181
+ */
182
+ mr = memory_region_from_host((void *)(uintptr_t)host, &off);
183
+
184
+ if (offset) {
185
+ *offset = off;
186
+ }
187
+
188
+ return memory_region_get_fd(mr);
189
+}
190
+
191
+static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
192
+{
193
+ if (((prev_host + size) != host)) {
194
+ return false;
195
+ }
196
+
197
+ if (get_fd_from_hostaddr(host, NULL) !=
198
+ get_fd_from_hostaddr(prev_host, NULL)) {
199
+ return false;
200
+ }
201
+
202
+ return true;
203
+}
204
+
205
+static bool try_merge(ProxyMemoryListener *proxy_listener,
206
+ MemoryRegionSection *section)
207
+{
208
+ uint64_t mrs_size, mrs_gpa, mrs_page;
209
+ MemoryRegionSection *prev_sec;
210
+ bool merged = false;
211
+ uintptr_t mrs_host;
212
+ RAMBlock *mrs_rb;
213
+
214
+ if (!proxy_listener->n_mr_sections) {
215
+ return false;
216
+ }
217
+
218
+ mrs_rb = section->mr->ram_block;
219
+ mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
220
+ mrs_size = int128_get64(section->size);
221
+ mrs_gpa = section->offset_within_address_space;
222
+ mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
223
+ section->offset_within_region;
224
+
225
+ if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
226
+ return true;
227
+ }
228
+
229
+ mrs_host = mrs_host & ~(mrs_page - 1);
230
+ mrs_gpa = mrs_gpa & ~(mrs_page - 1);
231
+ mrs_size = ROUND_UP(mrs_size, mrs_page);
232
+
233
+ prev_sec = proxy_listener->mr_sections +
234
+ (proxy_listener->n_mr_sections - 1);
235
+ uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
236
+ uint64_t prev_size = int128_get64(prev_sec->size);
237
+ uint64_t prev_gpa_end = range_get_last(prev_gpa_start, prev_size);
238
+ uint64_t prev_host_start =
239
+ (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
240
+ prev_sec->offset_within_region;
241
+ uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
242
+
243
+ if (mrs_gpa <= (prev_gpa_end + 1)) {
244
+ g_assert(mrs_gpa > prev_gpa_start);
245
+
246
+ if ((section->mr == prev_sec->mr) &&
247
+ proxy_mrs_can_merge(mrs_host, prev_host_start,
248
+ (mrs_gpa - prev_gpa_start))) {
249
+ uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
250
+ merged = true;
251
+ prev_sec->offset_within_address_space =
252
+ MIN(prev_gpa_start, mrs_gpa);
253
+ prev_sec->offset_within_region =
254
+ MIN(prev_host_start, mrs_host) -
255
+ (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
256
+ prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
257
+ mrs_host));
258
+ }
259
+ }
260
+
261
+ return merged;
262
+}
263
+
264
+static void proxy_memory_listener_region_addnop(MemoryListener *listener,
265
+ MemoryRegionSection *section)
266
+{
267
+ ProxyMemoryListener *proxy_listener = container_of(listener,
268
+ ProxyMemoryListener,
269
+ listener);
270
+
271
+ if (!memory_region_is_ram(section->mr) ||
272
+ memory_region_is_rom(section->mr)) {
152
+ return;
273
+ return;
153
+ }
274
+ }
154
+
275
+
155
+ /* We rely on power-of-2 blocksizes for bitmasks */
276
+ if (try_merge(proxy_listener, section)) {
156
+ if ((value & (value - 1)) != 0) {
157
+ error_setg(errp,
158
+ "Property %s.%s doesn't take value '%" PRId64
159
+ "', it's not a power of 2",
160
+ id, name, value);
161
+ return;
277
+ return;
162
+ }
278
+ }
163
+}
279
+
164
diff --git a/util/meson.build b/util/meson.build
280
+ ++proxy_listener->n_mr_sections;
165
index XXXXXXX..XXXXXXX 100644
281
+ proxy_listener->mr_sections = g_renew(MemoryRegionSection,
166
--- a/util/meson.build
282
+ proxy_listener->mr_sections,
167
+++ b/util/meson.build
283
+ proxy_listener->n_mr_sections);
168
@@ -XXX,XX +XXX,XX @@ if have_block
284
+ proxy_listener->mr_sections[proxy_listener->n_mr_sections - 1] = *section;
169
util_ss.add(files('nvdimm-utils.c'))
285
+ proxy_listener->mr_sections[proxy_listener->n_mr_sections - 1].fv = NULL;
170
util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c'))
286
+ memory_region_ref(section->mr);
171
util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c'))
287
+}
172
+ util_ss.add(files('block-helpers.c'))
288
+
173
util_ss.add(files('qemu-coroutine-sleep.c'))
289
+static void proxy_memory_listener_commit(MemoryListener *listener)
174
util_ss.add(files('qemu-co-shared-resource.c'))
290
+{
175
util_ss.add(files('thread-pool.c', 'qemu-timer.c'))
291
+ ProxyMemoryListener *proxy_listener = container_of(listener,
292
+ ProxyMemoryListener,
293
+ listener);
294
+ MPQemuMsg msg;
295
+ MemoryRegionSection *section;
296
+ ram_addr_t offset;
297
+ uintptr_t host_addr;
298
+ int region;
299
+ Error *local_err = NULL;
300
+
301
+ memset(&msg, 0, sizeof(MPQemuMsg));
302
+
303
+ msg.cmd = MPQEMU_CMD_SYNC_SYSMEM;
304
+ msg.num_fds = proxy_listener->n_mr_sections;
305
+ msg.size = sizeof(SyncSysmemMsg);
306
+ if (msg.num_fds > REMOTE_MAX_FDS) {
307
+ error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
308
+ return;
309
+ }
310
+
311
+ for (region = 0; region < proxy_listener->n_mr_sections; region++) {
312
+ section = &proxy_listener->mr_sections[region];
313
+ msg.data.sync_sysmem.gpas[region] =
314
+ section->offset_within_address_space;
315
+ msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
316
+ host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
317
+ section->offset_within_region;
318
+ msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
319
+ msg.data.sync_sysmem.offsets[region] = offset;
320
+ }
321
+ if (!mpqemu_msg_send(&msg, proxy_listener->ioc, &local_err)) {
322
+ error_report_err(local_err);
323
+ }
324
+}
325
+
326
+void proxy_memory_listener_deconfigure(ProxyMemoryListener *proxy_listener)
327
+{
328
+ memory_listener_unregister(&proxy_listener->listener);
329
+
330
+ proxy_memory_listener_reset(&proxy_listener->listener);
331
+}
332
+
333
+void proxy_memory_listener_configure(ProxyMemoryListener *proxy_listener,
334
+ QIOChannel *ioc)
335
+{
336
+ proxy_listener->n_mr_sections = 0;
337
+ proxy_listener->mr_sections = NULL;
338
+
339
+ proxy_listener->ioc = ioc;
340
+
341
+ proxy_listener->listener.begin = proxy_memory_listener_reset;
342
+ proxy_listener->listener.commit = proxy_memory_listener_commit;
343
+ proxy_listener->listener.region_add = proxy_memory_listener_region_addnop;
344
+ proxy_listener->listener.region_nop = proxy_memory_listener_region_addnop;
345
+ proxy_listener->listener.priority = 10;
346
+
347
+ memory_listener_register(&proxy_listener->listener,
348
+ &address_space_memory);
349
+}
350
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
351
index XXXXXXX..XXXXXXX 100644
352
--- a/hw/remote/proxy.c
353
+++ b/hw/remote/proxy.c
354
@@ -XXX,XX +XXX,XX @@
355
#include "qemu/sockets.h"
356
#include "hw/remote/mpqemu-link.h"
357
#include "qemu/error-report.h"
358
+#include "hw/remote/proxy-memory-listener.h"
359
+#include "qom/object.h"
360
361
static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
362
{
363
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
364
365
qemu_mutex_init(&dev->io_mutex);
366
qio_channel_set_blocking(dev->ioc, true, NULL);
367
+
368
+ proxy_memory_listener_configure(&dev->proxy_listener, dev->ioc);
369
}
370
371
static void pci_proxy_dev_exit(PCIDevice *pdev)
372
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
373
migrate_del_blocker(dev->migration_blocker);
374
375
error_free(dev->migration_blocker);
376
+
377
+ proxy_memory_listener_deconfigure(&dev->proxy_listener);
378
}
379
380
static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
381
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
382
index XXXXXXX..XXXXXXX 100644
383
--- a/hw/remote/meson.build
384
+++ b/hw/remote/meson.build
385
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
386
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
387
388
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
389
+specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
390
391
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
176
--
392
--
177
2.26.2
393
2.29.2
178
394
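The alignment and merge steps in the memory listener above can be sketched in isolation. This is a minimal, one-dimensional sketch, not the patch's implementation: `Range`, `range_try_merge`, `align_down`, `round_up` and `is_pow2` are hypothetical names, and the sketch ignores the extra checks the patch performs (same MemoryRegion, contiguous host addresses). It only illustrates why the page size must be a power of two for the bitmask arithmetic to work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Round addr down to a multiple of page (page must be a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t page)
{
    assert(is_pow2(page));
    return addr & ~(page - 1);
}

/* Round size up to a multiple of page (page must be a power of two). */
static uint64_t round_up(uint64_t size, uint64_t page)
{
    assert(is_pow2(page));
    return (size + page - 1) & ~(page - 1);
}

/* Hypothetical stand-in for a memory section: [start, start + size). */
typedef struct Range {
    uint64_t start;
    uint64_t size;
} Range;

/*
 * Extend prev to cover next when next begins inside prev or directly
 * after it. Returns true if the two ranges were merged.
 */
static bool range_try_merge(Range *prev, const Range *next)
{
    uint64_t prev_end = prev->start + prev->size; /* exclusive end */
    uint64_t next_end = next->start + next->size; /* exclusive end */

    if (next->start < prev->start || next->start > prev_end) {
        return false; /* next starts before prev, or there is a gap */
    }
    if (next_end > prev_end) {
        prev->size = next_end - prev->start;
    }
    return true;
}
```

Merging adjacent ranges keeps the number of entries small, which matters in the patch because each entry consumes one file descriptor slot in the sync message.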
1
Only one struct is needed per request. Drop req_data and the separate
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
3
once.
4
2
5
This fixes the req_data memory leak in vu_block_virtio_process_req().
3
Add an IOHUB object to manage PCI IRQs. It uses the KVM_IRQFD
4
ioctl to create an irqfd for injecting PCI interrupts into the guest.
5
The IOHUB object forwards the irqfd to the remote process, which
6
uses this fd to directly send interrupts to the guest, bypassing QEMU.
6
7
7
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Message-id: 20200924151549.913737-6-stefanha@redhat.com
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
11
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 51d5c3d54e28a68b002e3875c59599c9f5a424a1.1611938319.git.jag.raman@oracle.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
14
---
11
block/export/vhost-user-blk-server.c | 68 +++++++++-------------------
15
MAINTAINERS | 2 +
12
1 file changed, 21 insertions(+), 47 deletions(-)
16
include/hw/pci/pci_ids.h | 3 +
17
include/hw/remote/iohub.h | 42 +++++++++++
18
include/hw/remote/machine.h | 2 +
19
include/hw/remote/mpqemu-link.h | 1 +
20
include/hw/remote/proxy.h | 4 ++
21
hw/remote/iohub.c | 119 ++++++++++++++++++++++++++++++++
22
hw/remote/machine.c | 10 +++
23
hw/remote/message.c | 4 ++
24
hw/remote/mpqemu-link.c | 5 ++
25
hw/remote/proxy.c | 56 +++++++++++++++
26
hw/remote/meson.build | 1 +
27
12 files changed, 249 insertions(+)
28
create mode 100644 include/hw/remote/iohub.h
29
create mode 100644 hw/remote/iohub.c
13
30
14
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
31
diff --git a/MAINTAINERS b/MAINTAINERS
15
index XXXXXXX..XXXXXXX 100644
32
index XXXXXXX..XXXXXXX 100644
16
--- a/block/export/vhost-user-blk-server.c
33
--- a/MAINTAINERS
17
+++ b/block/export/vhost-user-blk-server.c
34
+++ b/MAINTAINERS
18
@@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr {
35
@@ -XXX,XX +XXX,XX @@ F: hw/remote/proxy.c
36
F: include/hw/remote/proxy.h
37
F: hw/remote/proxy-memory-listener.c
38
F: include/hw/remote/proxy-memory-listener.h
39
+F: hw/remote/iohub.c
40
+F: include/hw/remote/iohub.h
41
42
Build and test automation
43
-------------------------
44
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
45
index XXXXXXX..XXXXXXX 100644
46
--- a/include/hw/pci/pci_ids.h
47
+++ b/include/hw/pci/pci_ids.h
48
@@ -XXX,XX +XXX,XX @@
49
#define PCI_DEVICE_ID_SUN_SIMBA 0x5000
50
#define PCI_DEVICE_ID_SUN_SABRE 0xa000
51
52
+#define PCI_VENDOR_ID_ORACLE 0x108e
53
+#define PCI_DEVICE_ID_REMOTE_IOHUB 0xb000
54
+
55
#define PCI_VENDOR_ID_CMD 0x1095
56
#define PCI_DEVICE_ID_CMD_646 0x0646
57
58
diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
59
new file mode 100644
60
index XXXXXXX..XXXXXXX
61
--- /dev/null
62
+++ b/include/hw/remote/iohub.h
63
@@ -XXX,XX +XXX,XX @@
64
+/*
65
+ * IO Hub for remote device
66
+ *
67
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
68
+ *
69
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
70
+ * See the COPYING file in the top-level directory.
71
+ *
72
+ */
73
+
74
+#ifndef REMOTE_IOHUB_H
75
+#define REMOTE_IOHUB_H
76
+
77
+#include "hw/pci/pci.h"
78
+#include "qemu/event_notifier.h"
79
+#include "qemu/thread-posix.h"
80
+#include "hw/remote/mpqemu-link.h"
81
+
82
+#define REMOTE_IOHUB_NB_PIRQS PCI_DEVFN_MAX
83
+
84
+typedef struct ResampleToken {
85
+ void *iohub;
86
+ int pirq;
87
+} ResampleToken;
88
+
89
+typedef struct RemoteIOHubState {
90
+ PCIDevice d;
91
+ EventNotifier irqfds[REMOTE_IOHUB_NB_PIRQS];
92
+ EventNotifier resamplefds[REMOTE_IOHUB_NB_PIRQS];
93
+ unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
94
+ ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
95
+ QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
96
+} RemoteIOHubState;
97
+
98
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
99
+void remote_iohub_set_irq(void *opaque, int pirq, int level);
100
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg);
101
+
102
+void remote_iohub_init(RemoteIOHubState *iohub);
103
+void remote_iohub_finalize(RemoteIOHubState *iohub);
104
+
105
+#endif
106
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
107
index XXXXXXX..XXXXXXX 100644
108
--- a/include/hw/remote/machine.h
109
+++ b/include/hw/remote/machine.h
110
@@ -XXX,XX +XXX,XX @@
111
#include "hw/boards.h"
112
#include "hw/pci-host/remote.h"
113
#include "io/channel.h"
114
+#include "hw/remote/iohub.h"
115
116
struct RemoteMachineState {
117
MachineState parent_obj;
118
119
RemotePCIHost *host;
120
+ RemoteIOHubState iohub;
19
};
121
};
20
122
21
typedef struct VuBlockReq {
123
/* Used to pass to co-routine device and ioc. */
22
- VuVirtqElement *elem;
124
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
23
+ VuVirtqElement elem;
125
index XXXXXXX..XXXXXXX 100644
24
int64_t sector_num;
126
--- a/include/hw/remote/mpqemu-link.h
25
size_t size;
127
+++ b/include/hw/remote/mpqemu-link.h
26
struct virtio_blk_inhdr *in;
128
@@ -XXX,XX +XXX,XX @@ typedef enum {
27
@@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req)
129
MPQEMU_CMD_PCI_CFGREAD,
28
VuDev *vu_dev = &req->server->vu_dev;
130
MPQEMU_CMD_BAR_WRITE,
29
131
MPQEMU_CMD_BAR_READ,
30
/* IO size with 1 extra status byte */
132
+ MPQEMU_CMD_SET_IRQFD,
31
- vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
133
MPQEMU_CMD_MAX,
32
+ vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
134
} MPQemuCmd;
33
vu_queue_notify(vu_dev, req->vq);
135
34
136
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
35
- if (req->elem) {
137
index XXXXXXX..XXXXXXX 100644
36
- free(req->elem);
138
--- a/include/hw/remote/proxy.h
37
- }
139
+++ b/include/hw/remote/proxy.h
38
-
140
@@ -XXX,XX +XXX,XX @@
39
- g_free(req);
141
#include "hw/pci/pci.h"
40
+ free(req);
142
#include "io/channel.h"
143
#include "hw/remote/proxy-memory-listener.h"
144
+#include "qemu/event_notifier.h"
145
146
#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
147
OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
148
@@ -XXX,XX +XXX,XX @@ struct PCIProxyDev {
149
QIOChannel *ioc;
150
Error *migration_blocker;
151
ProxyMemoryListener proxy_listener;
152
+ int virq;
153
+ EventNotifier intr;
154
+ EventNotifier resample;
155
ProxyMemoryRegion region[PCI_NUM_REGIONS];
156
};
157
158
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
159
new file mode 100644
160
index XXXXXXX..XXXXXXX
161
--- /dev/null
162
+++ b/hw/remote/iohub.c
163
@@ -XXX,XX +XXX,XX @@
164
+/*
165
+ * Remote IO Hub
166
+ *
167
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
168
+ *
169
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
170
+ * See the COPYING file in the top-level directory.
171
+ *
172
+ */
173
+
174
+#include "qemu/osdep.h"
175
+#include "qemu-common.h"
176
+
177
+#include "hw/pci/pci.h"
178
+#include "hw/pci/pci_ids.h"
179
+#include "hw/pci/pci_bus.h"
180
+#include "qemu/thread.h"
181
+#include "hw/boards.h"
182
+#include "hw/remote/machine.h"
183
+#include "hw/remote/iohub.h"
184
+#include "qemu/main-loop.h"
185
+
186
+void remote_iohub_init(RemoteIOHubState *iohub)
187
+{
188
+ int pirq;
189
+
190
+ memset(&iohub->irqfds, 0, sizeof(iohub->irqfds));
191
+ memset(&iohub->resamplefds, 0, sizeof(iohub->resamplefds));
192
+
193
+ for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
194
+ qemu_mutex_init(&iohub->irq_level_lock[pirq]);
195
+ iohub->irq_level[pirq] = 0;
196
+ event_notifier_init_fd(&iohub->irqfds[pirq], -1);
197
+ event_notifier_init_fd(&iohub->resamplefds[pirq], -1);
198
+ }
199
+}
200
+
201
+void remote_iohub_finalize(RemoteIOHubState *iohub)
202
+{
203
+ int pirq;
204
+
205
+ for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
206
+ qemu_set_fd_handler(event_notifier_get_fd(&iohub->resamplefds[pirq]),
207
+ NULL, NULL, NULL);
208
+ event_notifier_cleanup(&iohub->irqfds[pirq]);
209
+ event_notifier_cleanup(&iohub->resamplefds[pirq]);
210
+ qemu_mutex_destroy(&iohub->irq_level_lock[pirq]);
211
+ }
212
+}
213
+
214
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx)
215
+{
216
+ return pci_dev->devfn;
217
+}
218
+
219
+void remote_iohub_set_irq(void *opaque, int pirq, int level)
220
+{
221
+ RemoteIOHubState *iohub = opaque;
222
+
223
+ assert(pirq >= 0);
224
+ assert(pirq < PCI_DEVFN_MAX);
225
+
226
+ QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
227
+
228
+ if (level) {
229
+ if (++iohub->irq_level[pirq] == 1) {
230
+ event_notifier_set(&iohub->irqfds[pirq]);
231
+ }
232
+ } else if (iohub->irq_level[pirq] > 0) {
233
+ iohub->irq_level[pirq]--;
234
+ }
235
+}
236
+
237
+static void intr_resample_handler(void *opaque)
238
+{
239
+ ResampleToken *token = opaque;
240
+ RemoteIOHubState *iohub = token->iohub;
241
+ int pirq, s;
242
+
243
+ pirq = token->pirq;
244
+
245
+ s = event_notifier_test_and_clear(&iohub->resamplefds[pirq]);
246
+
247
+ assert(s >= 0);
248
+
249
+ QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
250
+
251
+ if (iohub->irq_level[pirq]) {
252
+ event_notifier_set(&iohub->irqfds[pirq]);
253
+ }
254
+}
255
+
256
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg)
257
+{
258
+ RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
259
+ RemoteIOHubState *iohub = &machine->iohub;
260
+ int pirq, intx;
261
+
262
+ intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
263
+
264
+ pirq = remote_iohub_map_irq(pci_dev, intx);
265
+
266
+ if (event_notifier_get_fd(&iohub->irqfds[pirq]) != -1) {
267
+ qemu_set_fd_handler(event_notifier_get_fd(&iohub->resamplefds[pirq]),
268
+ NULL, NULL, NULL);
269
+ event_notifier_cleanup(&iohub->irqfds[pirq]);
270
+ event_notifier_cleanup(&iohub->resamplefds[pirq]);
271
+ memset(&iohub->token[pirq], 0, sizeof(ResampleToken));
272
+ }
273
+
274
+ event_notifier_init_fd(&iohub->irqfds[pirq], msg->fds[0]);
275
+ event_notifier_init_fd(&iohub->resamplefds[pirq], msg->fds[1]);
276
+
277
+ iohub->token[pirq].iohub = iohub;
278
+ iohub->token[pirq].pirq = pirq;
279
+
280
+ qemu_set_fd_handler(msg->fds[1], intr_resample_handler, NULL,
281
+ &iohub->token[pirq]);
282
+}
283
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
284
index XXXXXXX..XXXXXXX 100644
285
--- a/hw/remote/machine.c
286
+++ b/hw/remote/machine.c
287
@@ -XXX,XX +XXX,XX @@
288
#include "exec/address-spaces.h"
289
#include "exec/memory.h"
290
#include "qapi/error.h"
291
+#include "hw/pci/pci_host.h"
292
+#include "hw/remote/iohub.h"
293
294
static void remote_machine_init(MachineState *machine)
295
{
296
MemoryRegion *system_memory, *system_io, *pci_memory;
297
RemoteMachineState *s = REMOTE_MACHINE(machine);
298
RemotePCIHost *rem_host;
299
+ PCIHostState *pci_host;
300
301
system_memory = get_system_memory();
302
system_io = get_system_io();
303
@@ -XXX,XX +XXX,XX @@ static void remote_machine_init(MachineState *machine)
304
memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
305
306
qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
307
+
308
+ pci_host = PCI_HOST_BRIDGE(rem_host);
309
+
310
+ remote_iohub_init(&s->iohub);
311
+
312
+ pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
313
+ &s->iohub, REMOTE_IOHUB_NB_PIRQS);
41
}
314
}
42
315
43
static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
316
static void remote_machine_class_init(ObjectClass *oc, void *data)
44
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
317
diff --git a/hw/remote/message.c b/hw/remote/message.c
45
blk_co_flush(backend);
318
index XXXXXXX..XXXXXXX 100644
319
--- a/hw/remote/message.c
320
+++ b/hw/remote/message.c
321
@@ -XXX,XX +XXX,XX @@
322
#include "hw/pci/pci.h"
323
#include "exec/memattrs.h"
324
#include "hw/remote/memory.h"
325
+#include "hw/remote/iohub.h"
326
327
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
328
MPQemuMsg *msg, Error **errp);
329
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
330
case MPQEMU_CMD_SYNC_SYSMEM:
331
remote_sysmem_reconfig(&msg, &local_err);
332
break;
333
+ case MPQEMU_CMD_SET_IRQFD:
334
+ process_set_irqfd_msg(pci_dev, &msg);
335
+ break;
336
default:
337
error_setg(&local_err,
338
"Unknown command (%d) received for device %s"
339
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
340
index XXXXXXX..XXXXXXX 100644
341
--- a/hw/remote/mpqemu-link.c
342
+++ b/hw/remote/mpqemu-link.c
343
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
344
return false;
345
}
346
break;
347
+ case MPQEMU_CMD_SET_IRQFD:
348
+ if (msg->size || (msg->num_fds != 2)) {
349
+ return false;
350
+ }
351
+ break;
352
default:
353
break;
354
}
355
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
356
index XXXXXXX..XXXXXXX 100644
357
--- a/hw/remote/proxy.c
358
+++ b/hw/remote/proxy.c
359
@@ -XXX,XX +XXX,XX @@
360
#include "qemu/error-report.h"
361
#include "hw/remote/proxy-memory-listener.h"
362
#include "qom/object.h"
363
+#include "qemu/event_notifier.h"
364
+#include "sysemu/kvm.h"
365
+#include "util/event_notifier-posix.c"
366
+
367
+static void proxy_intx_update(PCIDevice *pci_dev)
368
+{
369
+ PCIProxyDev *dev = PCI_PROXY_DEV(pci_dev);
370
+ PCIINTxRoute route;
371
+ int pin = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
372
+
373
+ if (dev->virq != -1) {
374
+ kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &dev->intr, dev->virq);
375
+ dev->virq = -1;
376
+ }
377
+
378
+ route = pci_device_route_intx_to_irq(pci_dev, pin);
379
+
380
+ dev->virq = route.irq;
381
+
382
+ if (dev->virq != -1) {
383
+ kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, &dev->intr,
384
+ &dev->resample, dev->virq);
385
+ }
386
+}
387
+
388
+static void setup_irqfd(PCIProxyDev *dev)
389
+{
390
+ PCIDevice *pci_dev = PCI_DEVICE(dev);
391
+ MPQemuMsg msg;
392
+ Error *local_err = NULL;
393
+
394
+ event_notifier_init(&dev->intr, 0);
395
+ event_notifier_init(&dev->resample, 0);
396
+
397
+ memset(&msg, 0, sizeof(MPQemuMsg));
398
+ msg.cmd = MPQEMU_CMD_SET_IRQFD;
399
+ msg.num_fds = 2;
400
+ msg.fds[0] = event_notifier_get_fd(&dev->intr);
401
+ msg.fds[1] = event_notifier_get_fd(&dev->resample);
402
+ msg.size = 0;
403
+
404
+ if (!mpqemu_msg_send(&msg, dev->ioc, &local_err)) {
405
+ error_report_err(local_err);
406
+ }
407
+
408
+ dev->virq = -1;
409
+
410
+ proxy_intx_update(pci_dev);
411
+
412
+ pci_device_set_intx_routing_notifier(pci_dev, proxy_intx_update);
413
+}
414
415
static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
416
{
417
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
418
qio_channel_set_blocking(dev->ioc, true, NULL);
419
420
proxy_memory_listener_configure(&dev->proxy_listener, dev->ioc);
421
+
422
+ setup_irqfd(dev);
46
}
423
}
47
424
48
-struct req_data {
425
static void pci_proxy_dev_exit(PCIDevice *pdev)
49
- VuServer *server;
426
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
50
- VuVirtq *vq;
427
error_free(dev->migration_blocker);
51
- VuVirtqElement *elem;
428
52
-};
429
proxy_memory_listener_deconfigure(&dev->proxy_listener);
53
-
430
+
54
static void coroutine_fn vu_block_virtio_process_req(void *opaque)
431
+ event_notifier_cleanup(&dev->intr);
55
{
432
+ event_notifier_cleanup(&dev->resample);
56
- struct req_data *data = opaque;
57
- VuServer *server = data->server;
58
- VuVirtq *vq = data->vq;
59
- VuVirtqElement *elem = data->elem;
60
+ VuBlockReq *req = opaque;
61
+ VuServer *server = req->server;
62
+ VuVirtqElement *elem = &req->elem;
63
uint32_t type;
64
- VuBlockReq *req;
65
66
VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
67
BlockBackend *backend = vdev_blk->backend;
68
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
69
struct iovec *out_iov = elem->out_sg;
70
unsigned in_num = elem->in_num;
71
unsigned out_num = elem->out_num;
72
+
73
/* refer to hw/block/virtio_blk.c */
74
if (elem->out_num < 1 || elem->in_num < 1) {
75
error_report("virtio-blk request missing headers");
76
- free(elem);
77
- return;
78
+ goto err;
79
}
80
81
- req = g_new0(VuBlockReq, 1);
82
- req->server = server;
83
- req->vq = vq;
84
- req->elem = elem;
85
-
86
if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
87
sizeof(req->out)) != sizeof(req->out))) {
88
error_report("virtio-blk request outhdr too short");
89
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque)
90
91
err:
92
free(elem);
93
- g_free(req);
94
- return;
95
}
433
}
96
434
97
static void vu_block_process_vq(VuDev *vu_dev, int idx)
435
static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
98
{
436
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
99
- VuServer *server;
437
index XXXXXXX..XXXXXXX 100644
100
- VuVirtq *vq;
438
--- a/hw/remote/meson.build
101
- struct req_data *req_data;
439
+++ b/hw/remote/meson.build
102
+ VuServer *server = container_of(vu_dev, VuServer, vu_dev);
440
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
103
+ VuVirtq *vq = vu_get_queue(vu_dev, idx);
441
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
104
442
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
105
- server = container_of(vu_dev, VuServer, vu_dev);
443
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
106
- assert(server);
444
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
107
-
445
108
- vq = vu_get_queue(vu_dev, idx);
446
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
109
- assert(vq);
447
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
110
- VuVirtqElement *elem;
111
while (1) {
112
- elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
113
- sizeof(VuBlockReq));
114
- if (elem) {
115
- req_data = g_new0(struct req_data, 1);
116
- req_data->server = server;
117
- req_data->vq = vq;
118
- req_data->elem = elem;
119
- Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
120
- req_data);
121
- aio_co_enter(server->ioc->ctx, co);
122
- } else {
123
+ VuBlockReq *req;
124
+
125
+ req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
126
+ if (!req) {
127
break;
128
}
129
+
130
+ req->server = server;
131
+ req->vq = vq;
132
+
133
+ Coroutine *co =
134
+ qemu_coroutine_create(vu_block_virtio_process_req, req);
135
+ qemu_coroutine_enter(co);
136
}
137
}
138
139
--
448
--
140
2.26.2
449
2.29.2
141
450
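The level-counting scheme used by remote_iohub_set_irq() and intr_resample_handler() in the IOHUB patch above can be modelled without eventfds. This is a hedged sketch under stated assumptions: `IrqHub`, `hub_set_irq`, `hub_resample` and `NB_PIRQS` are hypothetical names, the `kicks` counter stands in for event_notifier_set() on the irqfd, and the per-pirq locking from the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_PIRQS 4 /* hypothetical; the patch sizes this as PCI_DEVFN_MAX */

typedef struct IrqHub {
    unsigned int level[NB_PIRQS]; /* number of outstanding assertions */
    unsigned int kicks[NB_PIRQS]; /* stand-in for event_notifier_set() */
} IrqHub;

/* Raise or lower one interrupt line; only the 0 -> 1 edge kicks the irqfd. */
static void hub_set_irq(IrqHub *hub, int pirq, bool level)
{
    assert(pirq >= 0 && pirq < NB_PIRQS);

    if (level) {
        if (++hub->level[pirq] == 1) {
            hub->kicks[pirq]++;
        }
    } else if (hub->level[pirq] > 0) {
        hub->level[pirq]--; /* guarded so the counter never underflows */
    }
}

/* Called after the guest acknowledges: re-inject if the line is still high. */
static void hub_resample(IrqHub *hub, int pirq)
{
    assert(pirq >= 0 && pirq < NB_PIRQS);

    if (hub->level[pirq]) {
        hub->kicks[pirq]++;
    }
}
```

Counting assertions rather than storing a boolean lets several sources share one line: the interrupt stays pending until every source has lowered it, and the resample path re-injects it for as long as the count is non-zero.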
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
We are going to reuse bdrv_common_block_status_above in
3
Retrieve PCI configuration info about the remote device and
4
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
4
configure the Proxy PCI object based on the returned information.
5
include_base == false and still bs == base (e.g. from img_rebase()).
6
5
7
So, support this corner case.
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Reviewed-by: Eric Blake <eblake@redhat.com>
10
Message-id: 85ee367bbb993aa23699b44cfedd83b4ea6d5221.1611938319.git.jag.raman@oracle.com
12
Reviewed-by: Alberto Garcia <berto@igalia.com>
13
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
12
---
16
block/io.c | 6 +++++-
13
hw/remote/proxy.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++
17
1 file changed, 5 insertions(+), 1 deletion(-)
14
1 file changed, 84 insertions(+)
18
15
19
diff --git a/block/io.c b/block/io.c
16
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
20
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
21
--- a/block/io.c
18
--- a/hw/remote/proxy.c
22
+++ b/block/io.c
19
+++ b/hw/remote/proxy.c
23
@@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
20
@@ -XXX,XX +XXX,XX @@
24
BlockDriverState *p;
21
#include "sysemu/kvm.h"
25
int64_t eof = 0;
22
#include "util/event_notifier-posix.c"
26
23
27
- assert(include_base || bs != base);
24
+static void probe_pci_info(PCIDevice *dev, Error **errp);
28
assert(!include_base || base); /* Can't include NULL base */
25
+
29
26
static void proxy_intx_update(PCIDevice *pci_dev)
30
+ if (!include_base && bs == base) {
27
{
31
+ *pnum = bytes;
28
PCIProxyDev *dev = PCI_PROXY_DEV(pci_dev);
32
+ return 0;
29
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
30
{
31
ERRP_GUARD();
32
PCIProxyDev *dev = PCI_PROXY_DEV(device);
33
+ uint8_t *pci_conf = device->config;
34
int fd;
35
36
if (!dev->fd) {
37
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
38
qemu_mutex_init(&dev->io_mutex);
39
qio_channel_set_blocking(dev->ioc, true, NULL);
40
41
+ pci_conf[PCI_LATENCY_TIMER] = 0xff;
42
+ pci_conf[PCI_INTERRUPT_PIN] = 0x01;
43
+
44
proxy_memory_listener_configure(&dev->proxy_listener, dev->ioc);
45
46
setup_irqfd(dev);
47
+
48
+ probe_pci_info(PCI_DEVICE(dev), errp);
49
}
50
51
static void pci_proxy_dev_exit(PCIDevice *pdev)
52
@@ -XXX,XX +XXX,XX @@ const MemoryRegionOps proxy_mr_ops = {
53
.max_access_size = 8,
54
},
55
};
56
+
57
+static void probe_pci_info(PCIDevice *dev, Error **errp)
58
+{
59
+ PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
60
+ uint32_t orig_val, new_val, base_class, val;
61
+ PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
62
+ DeviceClass *dc = DEVICE_CLASS(pc);
63
+ uint8_t type;
64
+ int i, size;
65
+
66
+ config_op_send(pdev, PCI_VENDOR_ID, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
67
+ pc->vendor_id = (uint16_t)val;
68
+
69
+ config_op_send(pdev, PCI_DEVICE_ID, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
70
+ pc->device_id = (uint16_t)val;
71
+
72
+ config_op_send(pdev, PCI_CLASS_DEVICE, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
73
+ pc->class_id = (uint16_t)val;
74
+
75
+ config_op_send(pdev, PCI_SUBSYSTEM_ID, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
76
+ pc->subsystem_id = (uint16_t)val;
77
+
78
+ base_class = pc->class_id >> 8;
79
+ switch (base_class) {
80
+ case PCI_BASE_CLASS_BRIDGE:
81
+ set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
82
+ break;
83
+ case PCI_BASE_CLASS_STORAGE:
84
+ set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
85
+ break;
86
+ case PCI_BASE_CLASS_NETWORK:
87
+ set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
88
+ break;
89
+ case PCI_BASE_CLASS_INPUT:
90
+ set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
91
+ break;
92
+ case PCI_BASE_CLASS_DISPLAY:
93
+ set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
94
+ break;
95
+ case PCI_BASE_CLASS_PROCESSOR:
96
+ set_bit(DEVICE_CATEGORY_CPU, dc->categories);
97
+ break;
98
+ default:
99
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
100
+ break;
33
+ }
101
+ }
34
+
102
+
35
ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
103
+ for (i = 0; i < PCI_NUM_REGIONS; i++) {
36
if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
104
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
37
return ret;
105
+ MPQEMU_CMD_PCI_CFGREAD);
106
+ new_val = 0xffffffff;
107
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &new_val, 4,
108
+ MPQEMU_CMD_PCI_CFGWRITE);
109
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &new_val, 4,
110
+ MPQEMU_CMD_PCI_CFGREAD);
111
+ size = (~(new_val & 0xFFFFFFF0)) + 1;
112
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
113
+ MPQEMU_CMD_PCI_CFGWRITE);
114
+ type = (new_val & 0x1) ?
115
+ PCI_BASE_ADDRESS_SPACE_IO : PCI_BASE_ADDRESS_SPACE_MEMORY;
116
+
117
+ if (size) {
118
+ g_autofree char *name;
119
+ pdev->region[i].dev = pdev;
120
+ pdev->region[i].present = true;
121
+ if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
122
+ pdev->region[i].memory = true;
123
+ }
124
+ name = g_strdup_printf("bar-region-%d", i);
125
+ memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
126
+ &proxy_mr_ops, &pdev->region[i],
127
+ name, size);
128
+ pci_register_bar(dev, i, type, &pdev->region[i].mr);
129
+ }
130
+ }
131
+}
38
--
132
--
39
2.26.2
133
2.29.2
40
134
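The BAR probing loop in probe_pci_info() above follows the usual PCI sizing handshake: save the BAR, write all ones, read back, restore. The arithmetic on the read-back value can be sketched on its own. The helper names and mask macros below are hypothetical, and unlike the patch (which masks with 0xFFFFFFF0 for every BAR) this sketch assumes the conventional split of four attribute bits for memory BARs and two for I/O BARs. It also assumes the conventional PCI header layout in which the 16-bit value read at the class-device offset carries the sub-class in the low byte and the base class in the high byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAR_SPACE_IO      0x1u /* bit 0 set: I/O space BAR */
#define BAR_MEM_ATTR_MASK 0xFu /* low 4 bits: memory BAR attributes */
#define BAR_IO_ATTR_MASK  0x3u /* low 2 bits: I/O BAR attributes */

/* True if the value read back from a BAR describes an I/O space BAR. */
static bool bar_is_io(uint32_t readback)
{
    return (readback & BAR_SPACE_IO) != 0;
}

/*
 * Size in bytes decoded from the value read back after writing 0xffffffff.
 * Returns 0 for an unimplemented BAR (read-back of 0).
 */
static uint32_t bar_size(uint32_t readback)
{
    uint32_t attr = bar_is_io(readback) ? BAR_IO_ATTR_MASK : BAR_MEM_ATTR_MASK;

    /* Clear the attribute bits, then two's-complement the address mask. */
    return ~(readback & ~attr) + 1;
}

/* Base class: upper byte of the 16-bit class code (sub-class is the lower). */
static uint8_t pci_base_class(uint16_t class_id)
{
    return (uint8_t)(class_id >> 8);
}
```

A device that decodes 4 KiB hardwires the low 12 address bits to zero, so the read-back is 0xFFFFF000 and the computed size is 0x1000.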
1
Unexpected EOF is an error that must be reported.
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
3
Perform device reset in the remote process when QEMU performs
4
Message-id: 20200924151549.913737-9-stefanha@redhat.com
4
device reset. This is required to reset the internal state
5
(registers, etc.) of emulated devices.
6
7
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Message-id: 7cb220a51f565dc0817bd76e2f540e89c2d2b850.1611938319.git.jag.raman@oracle.com
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
---
13
---
7
util/vhost-user-server.c | 6 ++++--
14
include/hw/remote/mpqemu-link.h | 1 +
8
1 file changed, 4 insertions(+), 2 deletions(-)
15
hw/remote/message.c | 22 ++++++++++++++++++++++
16
hw/remote/proxy.c | 19 +++++++++++++++++++
17
3 files changed, 42 insertions(+)
9
18
10
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
19
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
11
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
12
--- a/util/vhost-user-server.c
21
--- a/include/hw/remote/mpqemu-link.h
13
+++ b/util/vhost-user-server.c
22
+++ b/include/hw/remote/mpqemu-link.h
14
@@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg)
23
@@ -XXX,XX +XXX,XX @@ typedef enum {
15
};
24
MPQEMU_CMD_BAR_WRITE,
16
if (vmsg->size) {
25
MPQEMU_CMD_BAR_READ,
17
rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
26
MPQEMU_CMD_SET_IRQFD,
18
- if (rc == -1) {
27
+ MPQEMU_CMD_DEVICE_RESET,
19
- error_report_err(local_err);
28
MPQEMU_CMD_MAX,
20
+ if (rc != 1) {
29
} MPQemuCmd;
21
+ if (local_err) {
30
22
+ error_report_err(local_err);
31
diff --git a/hw/remote/message.c b/hw/remote/message.c
23
+ }
32
index XXXXXXX..XXXXXXX 100644
24
goto fail;
33
--- a/hw/remote/message.c
34
+++ b/hw/remote/message.c
35
@@ -XXX,XX +XXX,XX @@
36
#include "exec/memattrs.h"
37
#include "hw/remote/memory.h"
38
#include "hw/remote/iohub.h"
39
+#include "sysemu/reset.h"
40
41
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
42
MPQemuMsg *msg, Error **errp);
43
@@ -XXX,XX +XXX,XX @@ static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
44
MPQemuMsg *msg, Error **errp);
45
static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
46
static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
47
+static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
48
+ Error **errp);
49
50
void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
51
{
52
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
53
case MPQEMU_CMD_SET_IRQFD:
54
process_set_irqfd_msg(pci_dev, &msg);
55
break;
56
+ case MPQEMU_CMD_DEVICE_RESET:
57
+ process_device_reset_msg(com->ioc, pci_dev, &local_err);
58
+ break;
59
default:
60
error_setg(&local_err,
61
"Unknown command (%d) received for device %s"
62
@@ -XXX,XX +XXX,XX @@ fail:
63
getpid());
64
}
65
}
66
+
67
+static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
68
+ Error **errp)
69
+{
70
+ DeviceClass *dc = DEVICE_GET_CLASS(dev);
71
+ DeviceState *s = DEVICE(dev);
72
+ MPQemuMsg ret = { 0 };
73
+
74
+ if (dc->reset) {
75
+ dc->reset(s);
76
+ }
77
+
78
+ ret.cmd = MPQEMU_CMD_RET;
79
+
80
+ mpqemu_msg_send(&ret, ioc, errp);
81
+}
82
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
83
index XXXXXXX..XXXXXXX 100644
84
--- a/hw/remote/proxy.c
85
+++ b/hw/remote/proxy.c
86
@@ -XXX,XX +XXX,XX @@
87
#include "util/event_notifier-posix.c"
88
89
static void probe_pci_info(PCIDevice *dev, Error **errp);
90
+static void proxy_device_reset(DeviceState *dev);
91
92
static void proxy_intx_update(PCIDevice *pci_dev)
93
{
94
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
95
k->config_read = pci_proxy_read_config;
96
k->config_write = pci_proxy_write_config;
97
98
+ dc->reset = proxy_device_reset;
99
+
100
device_class_set_props(dc, proxy_properties);
101
}
102
103
@@ -XXX,XX +XXX,XX @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
25
}
104
}
26
}
105
}
106
}
107
+
108
+static void proxy_device_reset(DeviceState *dev)
109
+{
110
+ PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
111
+ MPQemuMsg msg = { 0 };
112
+ Error *local_err = NULL;
113
+
114
+ msg.cmd = MPQEMU_CMD_DEVICE_RESET;
115
+ msg.size = 0;
116
+
117
+ mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
118
+ if (local_err) {
119
+ error_report_err(local_err);
120
+ }
121
+
122
+}
27
--
123
--
28
2.26.2
124
2.29.2
29
125
1
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
1
From: "Denis V. Lunev" <den@openvz.org>
2
Message-id: 20200924151549.913737-3-stefanha@redhat.com
2
3
The original specification says that the l1 table size is 64 * l1_size, which
4
is obviously wrong. The size of the l1 entry is 64 _bits_, not bytes.
5
Thus 64 is to be replaced with 8, as the specification is expressed in bytes.
6
7
There is also a minor tweak: the field is renamed from l1 to l1_table,
8
which matches the later text.
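
The arithmetic behind the 64 -> 8 correction can be sketched as follows. This is
only an illustration: the names L1_ENTRY_BITS, L1_ENTRY_BYTES and
l1_table_bytes are hypothetical and do not appear in the specification or in
QEMU.

```python
# Hypothetical helper illustrating the corrected size formula.
# Each L1 entry is 64 bits wide, i.e. 8 bytes.
L1_ENTRY_BITS = 64
L1_ENTRY_BYTES = L1_ENTRY_BITS // 8


def l1_table_bytes(l1_size: int) -> int:
    """Return the size in bytes of an L1 table with l1_size entries."""
    if l1_size < 0:
        raise ValueError("l1_size must be non-negative")
    return L1_ENTRY_BYTES * l1_size
```

With l1_size = 4 the table occupies 32 bytes; the old "64 * l1_size" wording
would have suggested 256 bytes.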
9
10
Signed-off-by: Denis V. Lunev <den@openvz.org>
11
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
12
Message-id: 20210128171313.2210947-1-den@openvz.org
13
CC: Stefan Hajnoczi <stefanha@redhat.com>
14
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
15
16
[Replace the original commit message "docs: fix mistake in dirty bitmap
17
feature description" as suggested by Eric Blake.
18
--Stefan]
19
3
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
4
---
21
---
5
util/vhost-user-server.c | 2 +-
22
docs/interop/parallels.txt | 2 +-
6
1 file changed, 1 insertion(+), 1 deletion(-)
23
1 file changed, 1 insertion(+), 1 deletion(-)
7
24
8
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
25
diff --git a/docs/interop/parallels.txt b/docs/interop/parallels.txt
9
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
10
--- a/util/vhost-user-server.c
27
--- a/docs/interop/parallels.txt
11
+++ b/util/vhost-user-server.c
28
+++ b/docs/interop/parallels.txt
12
@@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server,
29
@@ -XXX,XX +XXX,XX @@ of its data area are:
13
return false;
30
28 - 31: l1_size
14
}
31
The number of entries in the L1 table of the bitmap.
15
32
16
- /* zero out unspecified fileds */
33
- variable: l1 (64 * l1_size bytes)
17
+ /* zero out unspecified fields */
34
+ variable: l1_table (8 * l1_size bytes)
18
*server = (VuServer) {
35
L1 offset table (in bytes)
19
.listener = listener,
36
20
.vu_iface = vu_iface,
37
A dirty bitmap is stored using a one-level structure for the mapping to host
21
--
38
--
22
2.26.2
39
2.29.2
23
40
Deleted patch
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
1
3
These cases are fixed by previous patches around block_status and
4
is_allocated.
5
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
9
Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
tests/qemu-iotests/274 | 20 +++++++++++
13
tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++
14
2 files changed, 88 insertions(+)
15
16
diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274
17
index XXXXXXX..XXXXXXX 100755
18
--- a/tests/qemu-iotests/274
19
+++ b/tests/qemu-iotests/274
20
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \
21
iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid)
22
iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid)
23
24
+ iotests.log('=== Testing qemu-img commit (top -> base) ===')
25
+
26
+ create_chain()
27
+ iotests.qemu_img_log('commit', '-b', base, top)
28
+ iotests.img_info_log(base)
29
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
30
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
31
+
32
+ iotests.log('=== Testing QMP active commit (top -> base) ===')
33
+
34
+ create_chain()
35
+ with create_vm() as vm:
36
+ vm.launch()
37
+ vm.qmp_log('block-commit', device='top', base_node='base',
38
+ job_id='job0', auto_dismiss=False)
39
+ vm.run_job('job0', wait=5)
40
+
41
+ iotests.img_info_log(mid)
42
+ iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base)
43
+ iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base)
44
45
iotests.log('== Resize tests ==')
46
47
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
48
index XXXXXXX..XXXXXXX 100644
49
--- a/tests/qemu-iotests/274.out
50
+++ b/tests/qemu-iotests/274.out
51
@@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0
52
read 1048576/1048576 bytes at offset 1048576
53
1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
54
55
+=== Testing qemu-img commit (top -> base) ===
56
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
57
+
58
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
59
+
60
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
61
+
62
+wrote 2097152/2097152 bytes at offset 0
63
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
64
+
65
+Image committed.
66
+
67
+image: TEST_IMG
68
+file format: IMGFMT
69
+virtual size: 2 MiB (2097152 bytes)
70
+cluster_size: 65536
71
+Format specific information:
72
+ compat: 1.1
73
+ compression type: zlib
74
+ lazy refcounts: false
75
+ refcount bits: 16
76
+ corrupt: false
77
+ extended l2: false
78
+
79
+read 1048576/1048576 bytes at offset 0
80
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
81
+
82
+read 1048576/1048576 bytes at offset 1048576
83
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
84
+
85
+=== Testing QMP active commit (top -> base) ===
86
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16
87
+
88
+Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
89
+
90
+Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
91
+
92
+wrote 2097152/2097152 bytes at offset 0
93
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
94
+
95
+{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}}
96
+{"return": {}}
97
+{"execute": "job-complete", "arguments": {"id": "job0"}}
98
+{"return": {}}
99
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
100
+{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
101
+{"execute": "job-dismiss", "arguments": {"id": "job0"}}
102
+{"return": {}}
103
+image: TEST_IMG
104
+file format: IMGFMT
105
+virtual size: 1 MiB (1048576 bytes)
106
+cluster_size: 65536
107
+backing file: TEST_DIR/PID-base
108
+backing file format: IMGFMT
109
+Format specific information:
110
+ compat: 1.1
111
+ compression type: zlib
112
+ lazy refcounts: false
113
+ refcount bits: 16
114
+ corrupt: false
115
+ extended l2: false
116
+
117
+read 1048576/1048576 bytes at offset 0
118
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
119
+
120
+read 1048576/1048576 bytes at offset 1048576
121
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
122
+
123
== Resize tests ==
124
=== preallocation=off ===
125
Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16
126
--
127
2.26.2
128