1
The following changes since commit 8c1ecb590497b0349c550607db923972b37f6963:
1
The following changes since commit 77f3804ab7ed94b471a14acb260e5aeacf26193f:
2
2
3
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-next-280519-2' into staging (2019-05-28 17:38:32 +0100)
3
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging (2021-02-02 16:47:51 +0000)
4
4
5
are available in the Git repository at:
5
are available in the Git repository at:
6
6
7
https://github.com/XanClic/qemu.git tags/pull-block-2019-05-28
7
https://gitlab.com/stefanha/qemu.git tags/block-pull-request
8
8
9
for you to fetch changes up to a2d665c1bc3624a8375e2f9a7d569f7565cc1358:
9
for you to fetch changes up to 026362226f1ff6a1168524a326bbd6347ad40e85:
10
10
11
blockdev: loosen restrictions on drive-backup source node (2019-05-28 20:30:55 +0200)
11
docs: fix Parallels Image "dirty bitmap" section (2021-02-03 16:48:21 +0000)
12
12
13
----------------------------------------------------------------
13
----------------------------------------------------------------
14
Block patches:
14
Pull request
15
- qcow2: Use threads for encrypted I/O
15
16
- qemu-img rebase: Optimizations
16
The pull request includes Multi-Process QEMU, GitLab repo URL updates, and even
17
- backup job: Allow any source node, and some refactoring
17
a block layer patch to fix the Parallels Image format specification!
18
- Some general simplifications in the block layer
19
18
20
----------------------------------------------------------------
19
----------------------------------------------------------------
21
Alberto Garcia (2):
22
block: Use bdrv_unref_child() for all children in bdrv_close()
23
block: Make bdrv_root_attach_child() unref child_bs on failure
24
20
25
Andrey Shinkevich (1):
21
Denis V. Lunev (1):
26
qcow2-bitmap: initialize bitmap directory alignment
22
docs: fix Parallels Image "dirty bitmap" section
27
23
28
Anton Nefedov (1):
24
Elena Ufimtseva (8):
29
qcow2: skip writing zero buffers to empty COW areas
25
multi-process: add configure and usage information
26
io: add qio_channel_writev_full_all helper
27
io: add qio_channel_readv_full_all_eof & qio_channel_readv_full_all
28
helpers
29
multi-process: define MPQemuMsg format and transmission functions
30
multi-process: introduce proxy object
31
multi-process: add proxy communication functions
32
multi-process: Forward PCI config space acceses to the remote process
33
multi-process: perform device reset in the remote process
30
34
31
John Snow (1):
35
Jagannathan Raman (11):
32
blockdev: loosen restrictions on drive-backup source node
36
memory: alloc RAM from file at offset
37
multi-process: Add config option for multi-process QEMU
38
multi-process: setup PCI host bridge for remote device
39
multi-process: setup a machine object for remote device process
40
multi-process: Initialize message handler in remote device
41
multi-process: Associate fd of a PCIDevice with its object
42
multi-process: setup memory manager for remote device
43
multi-process: PCI BAR read/write handling for proxy & remote
44
endpoints
45
multi-process: Synchronize remote memory
46
multi-process: create IOHUB object to handle irq
47
multi-process: Retrieve PCI info from remote process
33
48
34
Sam Eiderman (3):
49
John G Johnson (1):
35
qemu-img: rebase: Reuse parent BlockDriverState
50
multi-process: add the concept description to
36
qemu-img: rebase: Reduce reads on in-chain rebase
51
docs/devel/qemu-multiprocess
37
qemu-img: rebase: Reuse in-chain BlockDriverState
38
52
39
Vladimir Sementsov-Ogievskiy (13):
53
Stefan Hajnoczi (6):
40
qcow2.h: add missing include
54
.github: point Repo Lockdown bot to GitLab repo
41
qcow2: add separate file for threaded data processing functions
55
gitmodules: use GitLab repos instead of qemu.org
42
qcow2-threads: use thread_pool_submit_co
56
gitlab-ci: remove redundant GitLab repo URL command
43
qcow2-threads: qcow2_co_do_compress: protect queuing by mutex
57
docs: update README to use GitLab repo URLs
44
qcow2-threads: split out generic path
58
pc-bios: update mirror URLs to GitLab
45
qcow2: qcow2_co_preadv: improve locking
59
get_maintainer: update repo URL to GitLab
46
qcow2: bdrv_co_pwritev: move encryption code out of the lock
47
qcow2: do encryption in threads
48
block/backup: simplify backup_incremental_init_copy_bitmap
49
block/backup: move to copy_bitmap with granularity
50
block/backup: refactor and tolerate unallocated cluster skipping
51
block/backup: unify different modes code path
52
block/backup: refactor: split out backup_calculate_cluster_size
53
60
54
block/Makefile.objs | 2 +-
61
MAINTAINERS | 24 +
55
qapi/block-core.json | 4 +-
62
README.rst | 4 +-
56
block/qcow2.h | 26 ++-
63
docs/devel/index.rst | 1 +
57
block.c | 46 +++---
64
docs/devel/multi-process.rst | 966 ++++++++++++++++++++++
58
block/backup.c | 243 ++++++++++++---------------
65
docs/system/index.rst | 1 +
59
block/block-backend.c | 3 +-
66
docs/system/multi-process.rst | 64 ++
60
block/qcow2-bitmap.c | 3 +-
67
docs/interop/parallels.txt | 2 +-
61
block/qcow2-cache.c | 1 -
68
configure | 10 +
62
block/qcow2-cluster.c | 10 +-
69
meson.build | 5 +-
63
block/qcow2-refcount.c | 1 -
70
hw/remote/trace.h | 1 +
64
block/qcow2-snapshot.c | 1 -
71
include/exec/memory.h | 2 +
65
block/qcow2-threads.c | 268 ++++++++++++++++++++++++++++++
72
include/exec/ram_addr.h | 2 +-
66
block/qcow2.c | 320 +++++++++++++-----------------------
73
include/hw/pci-host/remote.h | 30 +
67
block/quorum.c | 1 -
74
include/hw/pci/pci_ids.h | 3 +
68
blockdev.c | 7 +-
75
include/hw/remote/iohub.h | 42 +
69
blockjob.c | 2 +-
76
include/hw/remote/machine.h | 38 +
70
qemu-img.c | 85 ++++++----
77
include/hw/remote/memory.h | 19 +
71
tests/test-bdrv-drain.c | 6 -
78
include/hw/remote/mpqemu-link.h | 99 +++
72
tests/test-bdrv-graph-mod.c | 1 -
79
include/hw/remote/proxy-memory-listener.h | 28 +
73
block/trace-events | 1 +
80
include/hw/remote/proxy.h | 48 ++
74
tests/qemu-iotests/056 | 2 +-
81
include/io/channel.h | 78 ++
75
tests/qemu-iotests/060 | 7 +-
82
include/qemu/mmap-alloc.h | 4 +-
76
tests/qemu-iotests/060.out | 5 +-
83
include/sysemu/iothread.h | 6 +
77
23 files changed, 615 insertions(+), 430 deletions(-)
84
backends/hostmem-memfd.c | 2 +-
78
create mode 100644 block/qcow2-threads.c
85
hw/misc/ivshmem.c | 3 +-
86
hw/pci-host/remote.c | 75 ++
87
hw/remote/iohub.c | 119 +++
88
hw/remote/machine.c | 80 ++
89
hw/remote/memory.c | 65 ++
90
hw/remote/message.c | 230 ++++++
91
hw/remote/mpqemu-link.c | 267 ++++++
92
hw/remote/proxy-memory-listener.c | 227 +++++
93
hw/remote/proxy.c | 379 +++++++++
94
hw/remote/remote-obj.c | 203 +++++
95
io/channel.c | 116 ++-
96
iothread.c | 6 +
97
softmmu/memory.c | 3 +-
98
softmmu/physmem.c | 11 +-
99
util/mmap-alloc.c | 7 +-
100
util/oslib-posix.c | 2 +-
101
.github/lockdown.yml | 8 +-
102
.gitlab-ci.yml | 1 -
103
.gitmodules | 44 +-
104
Kconfig.host | 4 +
105
hw/Kconfig | 1 +
106
hw/meson.build | 1 +
107
hw/pci-host/Kconfig | 3 +
108
hw/pci-host/meson.build | 1 +
109
hw/remote/Kconfig | 4 +
110
hw/remote/meson.build | 13 +
111
hw/remote/trace-events | 4 +
112
pc-bios/README | 4 +-
113
scripts/get_maintainer.pl | 2 +-
114
53 files changed, 3294 insertions(+), 68 deletions(-)
115
create mode 100644 docs/devel/multi-process.rst
116
create mode 100644 docs/system/multi-process.rst
117
create mode 100644 hw/remote/trace.h
118
create mode 100644 include/hw/pci-host/remote.h
119
create mode 100644 include/hw/remote/iohub.h
120
create mode 100644 include/hw/remote/machine.h
121
create mode 100644 include/hw/remote/memory.h
122
create mode 100644 include/hw/remote/mpqemu-link.h
123
create mode 100644 include/hw/remote/proxy-memory-listener.h
124
create mode 100644 include/hw/remote/proxy.h
125
create mode 100644 hw/pci-host/remote.c
126
create mode 100644 hw/remote/iohub.c
127
create mode 100644 hw/remote/machine.c
128
create mode 100644 hw/remote/memory.c
129
create mode 100644 hw/remote/message.c
130
create mode 100644 hw/remote/mpqemu-link.c
131
create mode 100644 hw/remote/proxy-memory-listener.c
132
create mode 100644 hw/remote/proxy.c
133
create mode 100644 hw/remote/remote-obj.c
134
create mode 100644 hw/remote/Kconfig
135
create mode 100644 hw/remote/meson.build
136
create mode 100644 hw/remote/trace-events
79
137
80
--
138
--
81
2.21.0
139
2.29.2
82
140
83
1
Use the GitLab repo URL as the main repo location in order to reduce
2
load on qemu.org.
1
3
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
6
Reviewed-by: Thomas Huth <thuth@redhat.com>
7
Message-id: 20210111115017.156802-2-stefanha@redhat.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
10
.github/lockdown.yml | 8 ++++----
11
1 file changed, 4 insertions(+), 4 deletions(-)
12
13
diff --git a/.github/lockdown.yml b/.github/lockdown.yml
14
index XXXXXXX..XXXXXXX 100644
15
--- a/.github/lockdown.yml
16
+++ b/.github/lockdown.yml
17
@@ -XXX,XX +XXX,XX @@ issues:
18
comment: |
19
Thank you for your interest in the QEMU project.
20
21
- This repository is a read-only mirror of the project's master
22
- repostories hosted on https://git.qemu.org/git/qemu.git.
23
+ This repository is a read-only mirror of the project's repostories hosted
24
+ at https://gitlab.com/qemu-project/qemu.git.
25
The project does not process issues filed on GitHub.
26
27
The project issues are tracked on Launchpad:
28
@@ -XXX,XX +XXX,XX @@ pulls:
29
comment: |
30
Thank you for your interest in the QEMU project.
31
32
- This repository is a read-only mirror of the project's master
33
- repostories hosted on https://git.qemu.org/git/qemu.git.
34
+ This repository is a read-only mirror of the project's repostories hosted
35
+ on https://gitlab.com/qemu-project/qemu.git.
36
The project does not process merge requests filed on GitHub.
37
38
QEMU welcomes contributions of code (either fixing bugs or adding new
39
--
40
2.29.2
41
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
1
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-3-stefanha@redhat.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
.gitmodules | 44 ++++++++++++++++++++++----------------------
13
1 file changed, 22 insertions(+), 22 deletions(-)
14
15
diff --git a/.gitmodules b/.gitmodules
16
index XXXXXXX..XXXXXXX 100644
17
--- a/.gitmodules
18
+++ b/.gitmodules
19
@@ -XXX,XX +XXX,XX @@
20
[submodule "roms/seabios"]
21
    path = roms/seabios
22
-    url = https://git.qemu.org/git/seabios.git/
23
+    url = https://gitlab.com/qemu-project/seabios.git/
24
[submodule "roms/SLOF"]
25
    path = roms/SLOF
26
-    url = https://git.qemu.org/git/SLOF.git
27
+    url = https://gitlab.com/qemu-project/SLOF.git
28
[submodule "roms/ipxe"]
29
    path = roms/ipxe
30
-    url = https://git.qemu.org/git/ipxe.git
31
+    url = https://gitlab.com/qemu-project/ipxe.git
32
[submodule "roms/openbios"]
33
    path = roms/openbios
34
-    url = https://git.qemu.org/git/openbios.git
35
+    url = https://gitlab.com/qemu-project/openbios.git
36
[submodule "roms/qemu-palcode"]
37
    path = roms/qemu-palcode
38
-    url = https://git.qemu.org/git/qemu-palcode.git
39
+    url = https://gitlab.com/qemu-project/qemu-palcode.git
40
[submodule "roms/sgabios"]
41
    path = roms/sgabios
42
-    url = https://git.qemu.org/git/sgabios.git
43
+    url = https://gitlab.com/qemu-project/sgabios.git
44
[submodule "dtc"]
45
    path = dtc
46
-    url = https://git.qemu.org/git/dtc.git
47
+    url = https://gitlab.com/qemu-project/dtc.git
48
[submodule "roms/u-boot"]
49
    path = roms/u-boot
50
-    url = https://git.qemu.org/git/u-boot.git
51
+    url = https://gitlab.com/qemu-project/u-boot.git
52
[submodule "roms/skiboot"]
53
    path = roms/skiboot
54
-    url = https://git.qemu.org/git/skiboot.git
55
+    url = https://gitlab.com/qemu-project/skiboot.git
56
[submodule "roms/QemuMacDrivers"]
57
    path = roms/QemuMacDrivers
58
-    url = https://git.qemu.org/git/QemuMacDrivers.git
59
+    url = https://gitlab.com/qemu-project/QemuMacDrivers.git
60
[submodule "ui/keycodemapdb"]
61
    path = ui/keycodemapdb
62
-    url = https://git.qemu.org/git/keycodemapdb.git
63
+    url = https://gitlab.com/qemu-project/keycodemapdb.git
64
[submodule "capstone"]
65
    path = capstone
66
-    url = https://git.qemu.org/git/capstone.git
67
+    url = https://gitlab.com/qemu-project/capstone.git
68
[submodule "roms/seabios-hppa"]
69
    path = roms/seabios-hppa
70
-    url = https://git.qemu.org/git/seabios-hppa.git
71
+    url = https://gitlab.com/qemu-project/seabios-hppa.git
72
[submodule "roms/u-boot-sam460ex"]
73
    path = roms/u-boot-sam460ex
74
-    url = https://git.qemu.org/git/u-boot-sam460ex.git
75
+    url = https://gitlab.com/qemu-project/u-boot-sam460ex.git
76
[submodule "tests/fp/berkeley-testfloat-3"]
77
    path = tests/fp/berkeley-testfloat-3
78
-    url = https://git.qemu.org/git/berkeley-testfloat-3.git
79
+    url = https://gitlab.com/qemu-project/berkeley-testfloat-3.git
80
[submodule "tests/fp/berkeley-softfloat-3"]
81
    path = tests/fp/berkeley-softfloat-3
82
-    url = https://git.qemu.org/git/berkeley-softfloat-3.git
83
+    url = https://gitlab.com/qemu-project/berkeley-softfloat-3.git
84
[submodule "roms/edk2"]
85
    path = roms/edk2
86
-    url = https://git.qemu.org/git/edk2.git
87
+    url = https://gitlab.com/qemu-project/edk2.git
88
[submodule "slirp"]
89
    path = slirp
90
-    url = https://git.qemu.org/git/libslirp.git
91
+    url = https://gitlab.com/qemu-project/libslirp.git
92
[submodule "roms/opensbi"]
93
    path = roms/opensbi
94
-    url =     https://git.qemu.org/git/opensbi.git
95
+    url =     https://gitlab.com/qemu-project/opensbi.git
96
[submodule "roms/qboot"]
97
    path = roms/qboot
98
-    url = https://git.qemu.org/git/qboot.git
99
+    url = https://gitlab.com/qemu-project/qboot.git
100
[submodule "meson"]
101
    path = meson
102
-    url = https://git.qemu.org/git/meson.git
103
+    url = https://gitlab.com/qemu-project/meson.git
104
[submodule "roms/vbootrom"]
105
    path = roms/vbootrom
106
-    url = https://git.qemu.org/git/vbootrom.git
107
+    url = https://gitlab.com/qemu-project/vbootrom.git
108
--
109
2.29.2
110
1
It is no longer necessary to point .gitmodules at GitLab repos when
2
running in GitLab CI since they are now used all the time.
1
3
4
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
5
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
6
Reviewed-by: Thomas Huth <thuth@redhat.com>
7
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
8
Message-id: 20210111115017.156802-4-stefanha@redhat.com
9
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
11
.gitlab-ci.yml | 1 -
12
1 file changed, 1 deletion(-)
13
14
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
15
index XXXXXXX..XXXXXXX 100644
16
--- a/.gitlab-ci.yml
17
+++ b/.gitlab-ci.yml
18
@@ -XXX,XX +XXX,XX @@ include:
19
image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
20
before_script:
21
- JOBS=$(expr $(nproc) + 1)
22
- - sed -i s,git.qemu.org/git,gitlab.com/qemu-project, .gitmodules
23
script:
24
- mkdir build
25
- cd build
26
--
27
2.29.2
28
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
1
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-5-stefanha@redhat.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
README.rst | 4 ++--
13
1 file changed, 2 insertions(+), 2 deletions(-)
14
15
diff --git a/README.rst b/README.rst
16
index XXXXXXX..XXXXXXX 100644
17
--- a/README.rst
18
+++ b/README.rst
19
@@ -XXX,XX +XXX,XX @@ The QEMU source code is maintained under the GIT version control system.
20
21
.. code-block:: shell
22
23
- git clone https://git.qemu.org/git/qemu.git
24
+ git clone https://gitlab.com/qemu-project/qemu.git
25
26
When submitting patches, one common approach is to use 'git
27
format-patch' and/or 'git send-email' to format & send the mail to the
28
@@ -XXX,XX +XXX,XX @@ The QEMU website is also maintained under source control.
29
30
.. code-block:: shell
31
32
- git clone https://git.qemu.org/git/qemu-web.git
33
+ git clone https://gitlab.com/qemu-project/qemu-web.git
34
35
* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_
36
37
--
38
2.29.2
39
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
1
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-6-stefanha@redhat.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
pc-bios/README | 4 ++--
13
1 file changed, 2 insertions(+), 2 deletions(-)
14
15
diff --git a/pc-bios/README b/pc-bios/README
16
index XXXXXXX..XXXXXXX 100644
17
--- a/pc-bios/README
18
+++ b/pc-bios/README
19
@@ -XXX,XX +XXX,XX @@
20
legacy x86 software to communicate with an attached serial console as
21
if a video card were attached. The master sources reside in a subversion
22
repository at http://sgabios.googlecode.com/svn/trunk. A git mirror is
23
- available at https://git.qemu.org/git/sgabios.git.
24
+ available at https://gitlab.com/qemu-project/sgabios.git.
25
26
- The PXE roms come from the iPXE project. Built with BANNER_TIME 0.
27
Sources available at http://ipxe.org. Vendor:Device ID -> ROM mapping:
28
@@ -XXX,XX +XXX,XX @@
29
30
- The u-boot binary for e500 comes from the upstream denx u-boot project where
31
it was compiled using the qemu-ppce500 target.
32
- A git mirror is available at: https://git.qemu.org/git/u-boot.git
33
+ A git mirror is available at: https://gitlab.com/qemu-project/u-boot.git
34
The hash used to compile the current version is: 2072e72
35
36
- Skiboot (https://github.com/open-power/skiboot/) is an OPAL
37
--
38
2.29.2
39
1
qemu.org is running out of bandwidth and the QEMU project is moving
2
towards a gating CI on GitLab. Use the GitLab repos instead of qemu.org
3
(they will become mirrors).
1
4
5
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
6
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
7
Reviewed-by: Thomas Huth <thuth@redhat.com>
8
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
9
Message-id: 20210111115017.156802-7-stefanha@redhat.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
scripts/get_maintainer.pl | 2 +-
13
1 file changed, 1 insertion(+), 1 deletion(-)
14
15
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
16
index XXXXXXX..XXXXXXX 100755
17
--- a/scripts/get_maintainer.pl
18
+++ b/scripts/get_maintainer.pl
19
@@ -XXX,XX +XXX,XX @@ sub vcs_exists {
20
    warn("$P: No supported VCS found. Add --nogit to options?\n");
21
    warn("Using a git repository produces better results.\n");
22
    warn("Try latest git repository using:\n");
23
-    warn("git clone https://git.qemu.org/git/qemu.git\n");
24
+    warn("git clone https://gitlab.com/qemu-project/qemu.git\n");
25
    $printed_novcs = 1;
26
}
27
return 0;
28
--
29
2.29.2
30
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
1
From: John G Johnson <john.g.johnson@oracle.com>
2
2
3
In the following case:
3
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
4
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
5
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
Message-id: 02a68adef99f5df6a380bf8fd7b90948777e411c.1611938319.git.jag.raman@oracle.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
10
MAINTAINERS | 7 +
11
docs/devel/index.rst | 1 +
12
docs/devel/multi-process.rst | 966 +++++++++++++++++++++++++++++++++++
13
3 files changed, 974 insertions(+)
14
create mode 100644 docs/devel/multi-process.rst
4
15
5
(base) A <- B <- C (tip)
16
diff --git a/MAINTAINERS b/MAINTAINERS
17
index XXXXXXX..XXXXXXX 100644
18
--- a/MAINTAINERS
19
+++ b/MAINTAINERS
20
@@ -XXX,XX +XXX,XX @@ S: Maintained
21
F: hw/semihosting/
22
F: include/hw/semihosting/
23
24
+Multi-process QEMU
25
+M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
26
+M: Jagannathan Raman <jag.raman@oracle.com>
27
+M: John G Johnson <john.g.johnson@oracle.com>
28
+S: Maintained
29
+F: docs/devel/multi-process.rst
30
+
31
Build and test automation
32
-------------------------
33
Build and test automation
34
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
35
index XXXXXXX..XXXXXXX 100644
36
--- a/docs/devel/index.rst
37
+++ b/docs/devel/index.rst
38
@@ -XXX,XX +XXX,XX @@ Contents:
39
clocks
40
qom
41
block-coroutine-wrapper
42
+ multi-process
43
diff --git a/docs/devel/multi-process.rst b/docs/devel/multi-process.rst
44
new file mode 100644
45
index XXXXXXX..XXXXXXX
46
--- /dev/null
47
+++ b/docs/devel/multi-process.rst
48
@@ -XXX,XX +XXX,XX @@
49
+This is the design document for multi-process QEMU. It does not
50
+necessarily reflect the status of the current implementation, which
51
+may lack features or be considerably different from what is described
52
+in this document. This document is still useful as a description of
53
+the goals and general direction of this feature.
54
+
55
+Please refer to the following wiki for latest details:
56
+https://wiki.qemu.org/Features/MultiProcessQEMU
57
+
58
+Multi-process QEMU
59
+===================
60
+
61
+QEMU is often used as the hypervisor for virtual machines running in the
62
+Oracle cloud. Since one of the advantages of cloud computing is the
63
+ability to run many VMs from different tenants in the same cloud
64
+infrastructure, a guest that compromised its hypervisor could
65
+potentially use the hypervisor's access privileges to access data it is
66
+not authorized for.
67
+
68
+QEMU can be susceptible to security attacks because it is a large,
69
+monolithic program that provides many features to the VMs it services.
70
+Many of these features can be configured out of QEMU, but even a reduced
71
+configuration QEMU has a large amount of code a guest can potentially
72
+attack. Separating QEMU reduces the attack surface by helping to
73
+limit each component in the system to only access the resources that
74
+it needs to perform its job.
75
+
76
+QEMU services
77
+-------------
78
+
79
+QEMU can be broadly described as providing three main services. One is a
80
+VM control point, where VMs can be created, migrated, re-configured, and
81
+destroyed. A second is to emulate the CPU instructions within the VM,
82
+often accelerated by HW virtualization features such as Intel's VT
83
+extensions. Finally, it provides IO services to the VM by emulating HW
84
+IO devices, such as disk and network devices.
85
+
86
+A multi-process QEMU
87
+~~~~~~~~~~~~~~~~~~~~
88
+
89
+A multi-process QEMU involves separating QEMU services into separate
90
+host processes. Each of these processes can be given only the privileges
91
+it needs to provide its service, e.g., a disk service could be given
92
+access only to the disk images it provides, and not be allowed to
93
+access other files, or any network devices. An attacker who compromised
94
+this service would not be able to use this exploit to access files or
95
+devices beyond what the disk service was given access to.
96
+
97
+A QEMU control process would remain, but in multi-process mode, would
98
+have no direct interfaces to the VM. During VM execution, it would still
99
+provide the user interface to hot-plug devices or live migrate the VM.
100
+
101
+A first step in creating a multi-process QEMU is to separate IO services
102
+from the main QEMU program, which would continue to provide CPU
103
+emulation, i.e., the control process would also be the CPU emulation
104
+process. In a later phase, CPU emulation could be separated from the
105
+control process.
106
+
107
+Separating IO services
108
+----------------------
109
+
110
+Separating IO services into individual host processes is a good place to
111
+begin for a couple of reasons. One is that the sheer number of IO devices QEMU
112
+can emulate provides a large surface of interfaces which could potentially
113
+be exploited, and, indeed, have been a source of exploits in the past.
114
+Another is that the modular nature of QEMU device emulation code provides
115
+interface points where the QEMU functions that perform device emulation
116
+can be separated from the QEMU functions that manage the emulation of
117
+guest CPU instructions. The devices emulated in the separate process are
118
+referred to as remote devices.
119
+
120
+QEMU device emulation
121
+~~~~~~~~~~~~~~~~~~~~~
122
+
123
+QEMU uses an object oriented SW architecture for device emulation code.
124
+Configured objects are all compiled into the QEMU binary, then objects
125
+are instantiated by name when used by the guest VM. For example, the
126
+code to emulate a device named "foo" is always present in QEMU, but its
127
+instantiation code is only run when the device is included in the target
128
+VM (e.g., via the QEMU command line as *-device foo*).
129
+
130
+The object model is hierarchical, so device emulation code names its
131
+parent object (such as "pci-device" for a PCI device) and QEMU will
132
+instantiate a parent object before calling the device's instantiation
133
+code.
134
+
135
+Current separation models
136
+~~~~~~~~~~~~~~~~~~~~~~~~~
137
+
138
+In order to separate the device emulation code from the CPU emulation
139
+code, the device object code must run in a different process. There are
140
+a couple of existing QEMU features that can run emulation code
141
+separately from the main QEMU process. These are examined below.
142
+
143
+vhost user model
144
+^^^^^^^^^^^^^^^^
145
+
146
+Virtio guest device drivers can be connected to vhost user applications
147
+in order to perform their IO operations. This model uses special virtio
148
+device drivers in the guest and vhost user device objects in QEMU, but
149
+once the QEMU vhost user code has configured the vhost user application,
150
+mission-mode IO is performed by the application. The vhost user
151
+application is a daemon process that can be contacted via a known UNIX
152
+domain socket.
153
+
154
+vhost socket
155
+''''''''''''
156
+
157
+As mentioned above, one of the tasks of the vhost device object within
158
+QEMU is to contact the vhost application and send it configuration
159
+information about this device instance. As part of the configuration
160
+process, the application can also be sent other file descriptors over
161
+the socket, which then can be used by the vhost user application in
162
+various ways, some of which are described below.
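
The descriptor passing described above can be sketched with a socket pair. This is only a minimal illustration, assuming the standard SCM_RIGHTS ancillary-data mechanism of UNIX domain sockets; it is not the vhost-user wire protocol, and the names `send_fd`, `recv_fd`, and `pass_descriptor_demo` as well as the `b"cfg"` payload are hypothetical.

```python
import array
import os
import socket
import tempfile


def send_fd(sock: socket.socket, fd: int, payload: bytes = b"cfg") -> None:
    """Send a small payload with one file descriptor attached as SCM_RIGHTS data."""
    ancillary = [
        (socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]).tobytes())
    ]
    sock.sendmsg([payload], ancillary)


def recv_fd(sock: socket.socket, bufsize: int = 16):
    """Receive a payload and exactly one file descriptor from the peer."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control message.
            fds.frombytes(cdata[: len(cdata) - (len(cdata) % fds.itemsize)])
    if len(fds) != 1:
        raise RuntimeError("expected exactly one descriptor")
    return data, fds[0]


def pass_descriptor_demo() -> bytes:
    """Hand an open file to the other end of a socket pair and read through it."""
    qemu_end, app_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as backing:
            backing.write(b"guest-data")
            backing.flush()
            send_fd(qemu_end, backing.fileno())
            payload, received_fd = recv_fd(app_end)
            try:
                assert payload == b"cfg"
                # The receiver holds its own descriptor for the same open file.
                return os.pread(received_fd, 10, 0)
            finally:
                os.close(received_fd)
    finally:
        qemu_end.close()
        app_end.close()
```

The receiving side never opens the file by name; it only gains access to what the sender chose to hand over, which is the property the socket-based configuration step relies on.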
163
+
164
+vhost MMIO store acceleration
165
+'''''''''''''''''''''''''''''
166
+
167
+VMs are often run using HW virtualization features via the KVM kernel
168
+driver. This driver allows QEMU to accelerate the emulation of guest CPU
169
+instructions by running the guest in a virtual HW mode. When the guest
170
+executes instructions that cannot be executed by virtual HW mode,
171
+execution returns to the KVM driver so it can inform QEMU to emulate the
172
+instructions in SW.
173
+
174
+One of the events that can cause a return to QEMU is when a guest device
175
+driver accesses an IO location. QEMU then dispatches the memory
176
+operation to the corresponding QEMU device object. In the case of a
177
+vhost user device, the memory operation would need to be sent over a
178
+socket to the vhost application. This path is accelerated by the QEMU
179
+virtio code by setting up an eventfd file descriptor through which the vhost
180
+application can directly receive MMIO store notifications from the KVM
181
+driver, instead of needing them to be sent to the QEMU process first.
182
+
183
+vhost interrupt acceleration
184
+''''''''''''''''''''''''''''
185
+
186
+Another optimization used by the vhost application is the ability to
187
+directly inject interrupts into the VM via the KVM driver, again,
188
+bypassing the need to send the interrupt back to the QEMU process first.
189
+The QEMU virtio setup code configures the KVM driver with an eventfd
190
+that triggers the device interrupt in the guest when the eventfd is
191
+written. This irqfd file descriptor is then passed to the vhost user
192
+application program.
193
+
194
+vhost access to guest memory
195
+''''''''''''''''''''''''''''
196
+
197
+The vhost application is also allowed to directly access guest memory,
198
+instead of needing to send the data as messages to QEMU. This is also
199
+done with file descriptors sent to the vhost user application by QEMU.
200
+These descriptors can be passed to ``mmap()`` by the vhost application
201
+to map the guest address space into the vhost application.
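
As a rough sketch of the shared-mapping idea, the snippet below maps one file descriptor twice and shows that a write through one mapping is visible through the other. It assumes a temporary file standing in for guest RAM; `GUEST_RAM_SIZE`, the offsets, and the function name are hypothetical, and both mappings live in one process here purely for brevity.

```python
import mmap
import tempfile

GUEST_RAM_SIZE = 4096  # hypothetical size of the region backing guest memory


def shared_mapping_demo() -> bytes:
    """Map the same descriptor twice and observe a write through the other view."""
    with tempfile.TemporaryFile() as ram_file:
        ram_file.truncate(GUEST_RAM_SIZE)
        fd = ram_file.fileno()
        # Both mappings are shared mappings of the same file, so they alias
        # the same pages.
        qemu_view = mmap.mmap(fd, GUEST_RAM_SIZE)
        app_view = mmap.mmap(fd, GUEST_RAM_SIZE)
        try:
            qemu_view[0x100:0x104] = b"\xde\xad\xbe\xef"
            return bytes(app_view[0x100:0x104])
        finally:
            app_view.close()
            qemu_view.close()
```

In the model described above, the second mapping would be created by the vhost application from a descriptor it received over the socket, rather than in the same process.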
202
+
203
+IOMMUs introduce another level of complexity, since the address given to
204
+the guest virtio device to DMA to or from is not a guest physical
205
+address. This case is handled by having vhost code within QEMU register
206
+as a listener for IOMMU mapping changes. The vhost application maintains
207
+a cache of IOMMU translations: sending translation requests back to
208
+QEMU on cache misses, and in turn receiving flush requests from QEMU
209
+when mappings are purged.
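
The cache behaviour described above (a request on a miss, invalidation on a flush) can be sketched as follows. This is a simplified model under stated assumptions: 4 KiB pages, a callback standing in for the translation request sent to QEMU, and hypothetical names throughout (`TranslationCache`, `ask_qemu`). It does not reflect the actual vhost message formats.

```python
from typing import Callable, Dict

PAGE_SHIFT = 12  # assumed 4 KiB translation granule
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationCache:
    """Caches IO virtual page -> guest physical page translations."""

    def __init__(self, ask_qemu: Callable[[int], int]) -> None:
        # ask_qemu stands in for a translation request sent back to QEMU;
        # it takes an IOVA page number and returns a guest physical page number.
        self._ask_qemu = ask_qemu
        self._entries: Dict[int, int] = {}
        self.misses = 0

    def translate(self, iova: int) -> int:
        """Translate one address, asking QEMU only on a cache miss."""
        page = iova >> PAGE_SHIFT
        if page not in self._entries:
            self.misses += 1
            self._entries[page] = self._ask_qemu(page)
        return (self._entries[page] << PAGE_SHIFT) | (iova & PAGE_MASK)

    def flush(self, iova: int, length: int) -> None:
        """Drop cached entries covering [iova, iova + length)."""
        first = iova >> PAGE_SHIFT
        last = (iova + length - 1) >> PAGE_SHIFT
        for page in range(first, last + 1):
            self._entries.pop(page, None)
```

A second lookup in the same page is served from the cache, while a lookup after `flush()` goes back to the callback and picks up the new mapping.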
210
+
211
+applicability to device separation
212
+''''''''''''''''''''''''''''''''''
213
+
214
+Much of the vhost model can be re-used by separated device emulation. In
215
+particular, the ideas of using a socket between QEMU and the device
216
+emulation application, using a file descriptor to inject interrupts into
217
+the VM via KVM, and allowing the application to ``mmap()`` the guest
218
+should be re-used.
219
+
220
+There are, however, some notable differences between how a vhost
221
+application works and the needs of separated device emulation. The most
222
+basic is that vhost uses custom virtio device drivers which always
223
+trigger IO with MMIO stores. A separated device emulation model must
224
+work with existing IO device models and guest device drivers. MMIO loads
225
+break vhost store acceleration since they are synchronous - guest
226
+progress cannot continue until the load has been emulated. By contrast,
227
+stores are asynchronous: the guest can continue after the store event
228
+has been sent to the vhost application.
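
The difference between posted stores and blocking loads can be illustrated with a toy model. The sketch below uses a thread and a queue as stand-ins for the separate emulation process and its socket; `RemoteDevice` and its methods are hypothetical names, and nothing here corresponds to actual QEMU or vhost interfaces.

```python
import queue
import threading


class RemoteDevice:
    """Toy device model served by a worker thread (stand-in for a remote process)."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._registers: dict = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        # Requests are handled strictly in the order they were queued.
        while True:
            op, addr, value, reply = self._requests.get()
            if op == "stop":
                break
            if op == "store":
                self._registers[addr] = value
            else:  # "load"
                reply.put(self._registers.get(addr, 0))

    def store(self, addr: int, value: int) -> None:
        """Posted write: queue the request and return without waiting."""
        self._requests.put(("store", addr, value, None))

    def load(self, addr: int) -> int:
        """Read: the caller cannot continue until the device has answered."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._requests.put(("load", addr, None, reply))
        return reply.get(timeout=5)

    def close(self) -> None:
        """Ask the worker to exit and wait for it."""
        self._requests.put(("stop", 0, None, None))
        self._worker.join(timeout=5)
```

`store()` returns as soon as the request is queued, whereas `load()` has to wait for a reply, which is why a notification-only channel is enough for stores but not for loads.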
229
+
230
+Another difference is that in the vhost user model, a single daemon can
231
+support multiple QEMU instances. This is contrary to the security regime
232
+desired, in which the emulation application should only be allowed to
233
+access the files or devices the VM it's running on behalf of can access.
234
+
+qemu-io model
+^^^^^^^^^^^^^
235
+
236
+Qemu-io is a test harness used to test changes to the QEMU block backend
237
+object code (e.g., the code that implements disk images for disk driver
238
+emulation). Qemu-io is not a device emulation application per se, but it
239
+does compile the QEMU block objects into a separate binary from the main
240
+QEMU one. This could be useful for disk device emulation, since its
241
+emulation applications will need to include the QEMU block objects.
242
+
243
+New separation model based on proxy objects
244
+-------------------------------------------
245
+
246
+A different model based on proxy objects in the QEMU program
247
+communicating with remote emulation programs could provide separation
248
+while minimizing the changes needed to the device emulation code. The
249
+rest of this section is a discussion of how a proxy object model would
250
+work.
251
+
252
+Remote emulation processes
253
+~~~~~~~~~~~~~~~~~~~~~~~~~~
254
+
255
+The remote emulation process will run the QEMU object hierarchy without
256
+modification. The device emulation objects will also be based on the
257
+QEMU code, because for anything but the simplest device, it would not be
258
+tractable to re-implement both the object model and the many device
259
+backends that QEMU has.
260
+
261
+The processes will communicate with the QEMU process over UNIX domain
262
+sockets. The processes can be executed either as standalone processes,
263
+or be executed by QEMU. In both cases, the host backends the emulation
264
+processes will provide are specified on their command lines, as they would
265
+be for QEMU. For example:
266
+
267
+::
268
+
269
+ disk-proc -blockdev driver=file,node-name=file0,filename=disk-file0 \
270
+ -blockdev driver=qcow2,node-name=drive0,file=file0
271
+
272
+would indicate process *disk-proc* uses a qcow2 emulated disk named
273
+*file0* as its backend.
274
+
275
+Emulation processes may emulate more than one guest controller. A common
276
+configuration might be to put all controllers of the same device class
277
+(e.g., disk, network, etc.) in a single process, so that all backends of
278
+the same type can be managed by a single QMP monitor.
279
+
280
+communication with QEMU
281
+^^^^^^^^^^^^^^^^^^^^^^^
282
+
283
+The first argument to the remote emulation process will be a Unix domain
284
+socket that connects with the Proxy object. This is a required argument.
285
+
286
+::
287
+
288
+ disk-proc <socket number> <backend list>
289
+
290
+remote process QMP monitor
291
+^^^^^^^^^^^^^^^^^^^^^^^^^^
292
+
293
+Remote emulation processes can be monitored via QMP, similar to QEMU
294
+itself. The QMP monitor socket is specified the same as for a QEMU
295
+process:
296
+
297
+::
298
+
299
+ disk-proc -qmp unix:/tmp/disk-mon,server
300
+
301
+can be monitored over the UNIX socket path */tmp/disk-mon*.
302
+
303
+QEMU command line
304
+~~~~~~~~~~~~~~~~~
305
+
306
+Each remote device emulated in a remote process on the host is
307
+represented as a *-device* of type *pci-proxy-dev*. A socket
308
+sub-option to this option specifies the Unix socket that connects
309
+to the remote process. An *id* sub-option is required, and it should
310
+be the same id as used in the remote process.
311
+
312
+::
313
+
314
+ qemu-system-x86_64 ... -device pci-proxy-dev,id=lsi0,socket=3
315
+
316
+can be used to add a device emulated in a remote process.
317
+
318
+
319
+QEMU management of remote processes
320
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
321
+
322
+QEMU is not aware of the type of the remote PCI device. It is
323
+a pass-through device as far as QEMU is concerned.
324
+
325
+communication with emulation process
326
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
327
+
328
+primary channel
329
+'''''''''''''''
330
+
331
+The primary channel (referred to as com in the code) is used to bootstrap
332
+the remote process. It is also used to pass on device-agnostic commands
333
+like reset.
334
+
335
+per-device channels
336
+'''''''''''''''''''
337
+
338
+Each remote device communicates with QEMU using a dedicated communication
339
+channel. The proxy object sets up this channel using the primary
340
+channel during its initialization.
341
+
342
+QEMU device proxy objects
343
+~~~~~~~~~~~~~~~~~~~~~~~~~
344
+
345
+QEMU has an object model based on sub-classes inherited from the
346
+"object" super-class. The sub-classes that are of interest here are the
347
+"device" and "bus" sub-classes whose child sub-classes make up the
348
+device tree of a QEMU emulated system.
349
+
350
+The proxy object model will use device proxy objects to replace the
351
+device emulation code within the QEMU process. These objects will live
352
+in the same place in the object and bus hierarchies as the objects they
353
+replace, i.e., the proxy object for an LSI SCSI controller will be a
354
+sub-class of the "pci-device" class, and will have the same PCI bus
355
+parent and the same SCSI bus child objects as the LSI controller object
356
+it replaces.
357
+
358
+It is worth noting that the same proxy object is used to mediate with
359
+all types of remote PCI devices.
360
+
361
+object initialization
362
+^^^^^^^^^^^^^^^^^^^^^
363
+
364
+The Proxy device objects are initialized in the exact same manner in
365
+which any other QEMU device would be initialized.
366
+
367
+In addition, the Proxy objects perform the following two tasks:
368
+- Parses the "socket" sub-option and connects to the remote process
369
+using this channel
370
+- Uses the "id" sub-option to connect to the emulated device on the
371
+separate process
372
+
373
+class\_init
374
+'''''''''''
375
+
376
+The ``class_init()`` method of a proxy object will, in general, behave
377
+similarly to the object it replaces, including setting any static
378
+properties and methods needed by the proxy.
379
+
380
+instance\_init / realize
381
+''''''''''''''''''''''''
382
+
383
+The ``instance_init()`` and ``realize()`` functions would only need to
384
+perform tasks related to being a proxy, such as registering its own
385
+MMIO handlers, or creating a child bus that other proxy devices can be
386
+attached to later.
387
+
388
+Other tasks will be device-specific. For example, PCI device objects
389
+will initialize the PCI config space in order to make a valid PCI device
390
+tree within the QEMU process.
391
+
392
+address space registration
393
+^^^^^^^^^^^^^^^^^^^^^^^^^^
394
+
395
+Most devices are driven by guest device driver accesses to IO addresses
396
+or ports. The QEMU device emulation code uses QEMU's memory region
397
+function calls (such as ``memory_region_init_io()``) to add callback
398
+functions that QEMU will invoke when the guest accesses the device's
399
+areas of the IO address space. When a guest driver does access the
400
+device, the VM will exit HW virtualization mode and return to QEMU,
401
+which will then look up and execute the corresponding callback function.
402
+
403
+A proxy object would need to mirror the memory region calls the actual
404
+device emulator would perform in its initialization code, but with its
405
+own callbacks. When invoked by QEMU as a result of a guest IO operation,
406
+they will forward the operation to the device emulation process.
407
+
408
+PCI config space
409
+^^^^^^^^^^^^^^^^
410
+
411
+PCI devices also have a configuration space that can be accessed by the
412
+guest driver. Guest accesses to this space are not handled by the device
413
+emulation object, but by its PCI parent object. Much of this space is
414
+read-only, but certain registers (especially BAR and MSI-related ones)
415
+need to be propagated to the emulation process.
416
+
417
+PCI parent proxy
418
+''''''''''''''''
419
+
420
+One way to propagate guest PCI config accesses is to create a
421
+"pci-device-proxy" class that can serve as the parent of a PCI device
422
+proxy object. This class's parent would be "pci-device" and it would
423
+override the PCI parent's ``config_read()`` and ``config_write()``
424
+methods with ones that forward these operations to the emulation
425
+program.
426
+
427
+interrupt receipt
428
+^^^^^^^^^^^^^^^^^
429
+
430
+A proxy for a device that generates interrupts will need to create a
431
+socket to receive interrupt indications from the emulation process. An
432
+incoming interrupt indication would then be sent up to its bus parent to
433
+be injected into the guest. For example, a PCI device object may use
434
+``pci_set_irq()``.
435
+
436
+live migration
437
+^^^^^^^^^^^^^^
438
+
439
+The proxy will register to save and restore any *vmstate* it needs over
440
+a live migration event. The device proxy does not need to manage the
441
+remote device's *vmstate*; that will be handled by the remote process
442
+proxy (see below).
443
+
444
+QEMU remote device operation
445
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
446
+
447
+Generic device operations, such as DMA, will be performed by the remote
448
+process proxy by sending messages to the remote process.
449
+
450
+DMA operations
451
+^^^^^^^^^^^^^^
452
+
453
+DMA operations would be handled much like vhost applications do. One of
454
+the initial messages sent to the emulation process is a guest memory
455
+table. Each entry in this table consists of a file descriptor and size
456
+that the emulation process can ``mmap()`` to directly access guest
457
+memory, similar to ``vhost_user_set_mem_table()``. Note guest memory
458
+must be backed by file descriptors, such as when QEMU is given the
459
+*-mem-path* command line option.
460
+
461
+IOMMU operations
462
+^^^^^^^^^^^^^^^^
463
+
464
+When the emulated system includes an IOMMU, the remote process proxy in
465
+QEMU will need to create a socket for IOMMU requests from the emulation
466
+process. It will handle those requests with an
467
+``address_space_get_iotlb_entry()`` call. In order to handle IOMMU
468
+unmaps, the remote process proxy will also register as a listener on the
469
+device's DMA address space. When an IOMMU memory region is created
470
+within the DMA address space, an IOMMU notifier for unmaps will be added
471
+to the memory region that will forward unmaps to the emulation process
472
+over the IOMMU socket.
473
+
474
+device hot-plug via QMP
475
+^^^^^^^^^^^^^^^^^^^^^^^
476
+
477
+A QMP "device\_add" command can add a device emulated by a remote
478
+process. It will also have a "rid" option to the command, just as the
479
+*-device* command line option does. The remote process may either be one
480
+started at QEMU startup, or be one added by the "add-process" QMP
481
+command described above. In either case, the remote process proxy will
482
+forward the new device's JSON description to the corresponding emulation
483
+process.
484
+
485
+live migration
486
+^^^^^^^^^^^^^^
487
+
488
+The remote process proxy will also register for live migration
489
+notifications with ``vmstate_register()``. When called to save state,
490
+the proxy will send the remote process a secondary socket file
491
+descriptor to save the remote process's device *vmstate* over. The
492
+incoming byte stream length and data will be saved as the proxy's
493
+*vmstate*. When the proxy is resumed on its new host, this *vmstate*
494
+will be extracted, and a secondary socket file descriptor will be sent
495
+to the new remote process through which it receives the *vmstate* in
496
+order to restore the devices there.
497
+
498
+device emulation in remote process
499
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
500
+
501
+The parts of QEMU that the emulation program will need include the
502
+object model; the memory emulation objects; the device emulation objects
503
+of the targeted device, and any dependent devices; and, the device's
504
+backends. It will also need code to set up the machine environment,
505
+handle requests from the QEMU process, and route machine-level requests
506
+(such as interrupts or IOMMU mappings) back to the QEMU process.
507
+
508
+initialization
509
+^^^^^^^^^^^^^^
510
+
511
+The process initialization sequence will follow the same sequence
512
+followed by QEMU. It will first initialize the backend objects, then
513
+device emulation objects. The JSON descriptions sent by the QEMU process
514
+will drive which objects need to be created.
515
+
516
+- address spaces
517
+
518
+Before the device objects are created, the initial address spaces and
519
+memory regions must be configured with ``memory_map_init()``. This
520
+creates a RAM memory region object (*system\_memory*) and an IO memory
521
+region object (*system\_io*).
522
+
523
+- RAM
524
+
525
+RAM memory region creation will follow how ``pc_memory_init()`` creates
526
+them, but must use ``memory_region_init_ram_from_fd()`` instead of
527
+``memory_region_allocate_system_memory()``. The file descriptors needed
528
+will be supplied by the guest memory table from above. Those RAM regions
529
+would then be added to the *system\_memory* memory region with
530
+``memory_region_add_subregion()``.
531
+
532
+- PCI
533
+
534
+IO initialization will be driven by the JSON descriptions sent from the
535
+QEMU process. For a PCI device, a PCI bus will need to be created with
536
+``pci_root_bus_new()``, and a PCI memory region will need to be created
537
+and added to the *system\_memory* memory region with
538
+``memory_region_add_subregion_overlap()``. The overlap version is
539
+required for architectures where PCI memory overlaps with RAM memory.
540
+
541
+MMIO handling
542
+^^^^^^^^^^^^^
543
+
544
+The device emulation objects will use ``memory_region_init_io()`` to
545
+install their MMIO handlers, and ``pci_register_bar()`` to associate
546
+those handlers with a PCI BAR, as they do within QEMU currently.
547
+
548
+In order to use ``address_space_rw()`` in the emulation process to
549
+handle MMIO requests from QEMU, the PCI physical addresses must be the
550
+same in the QEMU process and the device emulation process. In order to
551
+accomplish that, guest BAR programming must also be forwarded from QEMU
552
+to the emulation process.
553
+
554
+interrupt injection
555
+^^^^^^^^^^^^^^^^^^^
556
+
557
+When device emulation wants to inject an interrupt into the VM, the
558
+request climbs the device's bus object hierarchy until the point where a
559
+bus object knows how to signal the interrupt to the guest. The details
560
+depend on the type of interrupt being raised.
561
+
562
+- PCI pin interrupts
563
+
564
+On x86 systems, there is an emulated IOAPIC object attached to the root
565
+PCI bus object, and the root PCI object forwards interrupt requests to
566
+it. The IOAPIC object, in turn, calls the KVM driver to inject the
567
+corresponding interrupt into the VM. The simplest way to handle this in
568
+an emulation process would be to set up the root PCI bus driver (via
569
+``pci_bus_irqs()``) to send an interrupt request back to the QEMU
570
+process, and have the device proxy object reflect it up the PCI tree
571
+there.
572
+
573
+- PCI MSI/X interrupts
574
+
575
+PCI MSI/X interrupts are implemented in HW as DMA writes to a
576
+CPU-specific PCI address. In QEMU on x86, a KVM APIC object receives
577
+these DMA writes, then calls into the KVM driver to inject the interrupt
578
+into the VM. A simple emulation process implementation would be to send
579
+the MSI DMA address from QEMU as a message at initialization, then
580
+install an address space handler at that address which forwards the MSI
581
+message back to QEMU.
582
+
583
+DMA operations
584
+^^^^^^^^^^^^^^
585
+
586
+When an emulation object wants to DMA into or out of guest memory, it
587
+first must use ``dma_memory_map()`` to convert the DMA address to a local
588
+virtual address. The emulation process memory region objects set up above
589
+will be used to translate the DMA address to a local virtual address the
590
+device emulation code can access.
591
+
592
+IOMMU
593
+^^^^^
594
+
595
+When an IOMMU is in use in QEMU, DMA translation uses IOMMU memory
596
+regions to translate the DMA address to a guest physical address before
597
+that physical address can be translated to a local virtual address. The
598
+emulation process will need similar functionality.
599
+
600
+- IOTLB cache
601
+
602
+The emulation process will maintain a cache of recent IOMMU translations
603
+(the IOTLB). When the ``translate()`` callback of an IOMMU memory region is
604
+invoked, the IOTLB cache will be searched for an entry that will map the
605
+DMA address to a guest PA. On a cache miss, a message will be sent back
606
+to QEMU requesting the corresponding translation entry, which will both be
607
+used to return a guest address and be added to the cache.
608
+
609
+- IOTLB purge
610
+
611
+The IOMMU emulation will also need to act on unmap requests from QEMU.
612
+These happen when the guest IOMMU driver purges an entry from the
613
+guest's translation table.
614
+
615
+live migration
616
+^^^^^^^^^^^^^^
617
+
618
+When a remote process receives a live migration indication from QEMU, it
619
+will set up a channel using the received file descriptor with
620
+``qio_channel_socket_new_fd()``. This channel will be used to create a
621
+*QEMUfile* that can be passed to ``qemu_save_device_state()`` to send
622
+the process's device state back to QEMU. This method will be reversed on
623
+restore - the channel will be passed to ``qemu_loadvm_state()`` to
624
+restore the device state.
625
+
626
+Accelerating device emulation
627
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
628
+
629
+The messages that are required to be sent between QEMU and the emulation
630
+process can add considerable latency to IO operations. The optimizations
631
+described below attempt to ameliorate this effect by allowing the
632
+emulation process to communicate directly with the kernel KVM driver.
633
+The KVM file descriptors created would be passed to the emulation process
634
+via initialization messages, much as the guest memory table is.
635
+
+MMIO acceleration
+^^^^^^^^^^^^^^^^^
636
+
637
+Vhost user applications can receive guest virtio driver stores directly
638
+from KVM. The issue with the eventfd mechanism used by vhost user is
639
+that it does not pass any data with the event indication, so it cannot
640
+handle guest loads or guest stores that carry store data. This concept
641
+could, however, be expanded to cover more cases.
642
+
643
+The expanded idea would require a new type of KVM device:
644
+*KVM\_DEV\_TYPE\_USER*. This device has two file descriptors: a master
645
+descriptor that QEMU can use for configuration, and a slave descriptor
646
+that the emulation process can use to receive MMIO notifications. QEMU
647
+would create both descriptors using the KVM driver, and pass the slave
648
+descriptor to the emulation process via an initialization message.
649
+
650
+data structures
651
+^^^^^^^^^^^^^^^
652
+
653
+- guest physical range
654
+
655
+The guest physical range structure describes the address range that a
656
+device will respond to. It includes the base and length of the range, as
657
+well as which bus the range resides on (e.g., on an x86 machine, it can
658
+specify whether the range refers to memory or IO addresses).
659
+
660
+A device can have multiple physical address ranges it responds to (e.g.,
661
+a PCI device can have multiple BARs), so the structure will also include
662
+an enumerated identifier to specify which of the device's ranges is
663
+being referred to.
664
+
665
++--------+----------------------------+
666
+| Name | Description |
667
++========+============================+
668
+| addr | range base address |
669
++--------+----------------------------+
670
+| len | range length |
671
++--------+----------------------------+
672
+| bus | addr type (memory or IO) |
673
++--------+----------------------------+
674
+| id | range ID (e.g., PCI BAR) |
675
++--------+----------------------------+
676
+
677
+- MMIO request structure
678
+
679
+This structure describes an MMIO operation. It includes which guest
680
+physical range the MMIO was within, the offset within that range, the
681
+MMIO type (e.g., load or store), and its length and data. It also
682
+includes a sequence number that can be used to reply to the MMIO, and
683
+the CPU that issued the MMIO.
684
+
685
++----------+------------------------+
686
+| Name | Description |
687
++==========+========================+
688
+| rid | range MMIO is within |
689
++----------+------------------------+
690
+| offset | offset within *rid* |
691
++----------+------------------------+
692
+| type | e.g., load or store |
693
++----------+------------------------+
694
+| len | MMIO length |
695
++----------+------------------------+
696
+| data | store data |
697
++----------+------------------------+
698
+| seq | sequence ID |
699
++----------+------------------------+
700
+
701
+- MMIO request queues
702
+
703
+MMIO request queues are FIFO arrays of MMIO request structures. There
704
+are two queues: the pending queue is for MMIOs that haven't been read by the
705
+emulation program, and the sent queue is for MMIOs that haven't been
706
+acknowledged. The main use of the second queue is to validate MMIO
707
+replies from the emulation program.
708
+
709
+- scoreboard
710
+
711
+Each CPU in the VM is emulated in QEMU by a separate thread, so multiple
712
+MMIOs may be waiting to be consumed by an emulation program and multiple
713
+threads may be waiting for MMIO replies. The scoreboard would contain a
714
+wait queue and sequence number for the per-CPU threads, allowing them to
715
+be individually woken when the MMIO reply is received from the emulation
716
+program. It also tracks the number of posted MMIO stores to the device
717
+that haven't been replied to, in order to satisfy the PCI constraint
718
+that a load to a device will not complete until all previous stores to
719
+that device have been completed.
720
+
721
+- device shadow memory
722
+
723
+Some MMIO loads do not have device side-effects. These MMIOs can be
724
+completed without sending a MMIO request to the emulation program if the
725
+emulation program shares a shadow image of the device's memory image
726
+with the KVM driver.
727
+
728
+The emulation program will ask the KVM driver to allocate memory for the
729
+shadow image, and will then use ``mmap()`` to directly access it. The
730
+emulation program can control KVM access to the shadow image by sending
731
+KVM an access map telling it which areas of the image have no
732
+side-effects (and can be completed immediately), and which require a
733
+MMIO request to the emulation program. The access map can also inform
734
+the KVM driver which access sizes are allowed to the image.
735
+
736
+master descriptor
737
+^^^^^^^^^^^^^^^^^
738
+
739
+The master descriptor is used by QEMU to configure the new KVM device.
740
+The descriptor would be returned by the KVM driver when QEMU issues a
741
+*KVM\_CREATE\_DEVICE* ``ioctl()`` with a *KVM\_DEV\_TYPE\_USER* type.
742
+
743
+KVM\_DEV\_TYPE\_USER device ops
744
+
745
+
746
+The *KVM\_DEV\_TYPE\_USER* operations vector will be registered by a
747
+``kvm_register_device_ops()`` call when the KVM system is initialized by
748
+``kvm_init()``. These device ops are called by the KVM driver when QEMU
749
+executes certain ``ioctl()`` operations on its KVM file descriptor. They
750
+include:
751
+
752
+- create
753
+
754
+This routine is called when QEMU issues a *KVM\_CREATE\_DEVICE*
755
+``ioctl()`` on its per-VM file descriptor. It will allocate and
756
+initialize a KVM user device specific data structure, and assign the
757
+*kvm\_device* private field to it.
758
+
759
+- ioctl
760
+
761
+This routine is invoked when QEMU issues an ``ioctl()`` on the master
762
+descriptor. The ``ioctl()`` commands supported are defined by the KVM
763
+device type. *KVM\_DEV\_TYPE\_USER* ones will need several commands:
764
+
765
+*KVM\_DEV\_USER\_SLAVE\_FD* creates the slave file descriptor that will
766
+be passed to the device emulation program. Only one slave can be created
767
+by each master descriptor. The file operations performed by this
768
+descriptor are described below.
769
+
770
+The *KVM\_DEV\_USER\_PA\_RANGE* command configures a guest physical
771
+address range that the slave descriptor will receive MMIO notifications
772
+for. The range is specified by a guest physical range structure
773
+argument. For buses that assign addresses to devices dynamically, this
774
+command can be executed while the guest is running, such as the case
775
+when a guest changes a device's PCI BAR registers.
776
+
777
+*KVM\_DEV\_USER\_PA\_RANGE* will use ``kvm_io_bus_register_dev()`` to
778
+register *kvm\_io\_device\_ops* callbacks to be invoked when the guest
779
+performs a MMIO operation within the range. When a range is changed,
780
+``kvm_io_bus_unregister_dev()`` is used to remove the previous
781
+instantiation.
782
+
783
+*KVM\_DEV\_USER\_TIMEOUT* will configure a timeout value that specifies
784
+how long KVM will wait for the emulation process to respond to a MMIO
785
+indication.
786
+
787
+- destroy
788
+
789
+This routine is called when the VM instance is destroyed. It will need
790
+to destroy the slave descriptor, and free any memory allocated by the
791
+driver, as well as the *kvm\_device* structure itself.
792
+
793
+slave descriptor
794
+^^^^^^^^^^^^^^^^
795
+
796
+The slave descriptor will have its own file operations vector, which
797
+responds to system calls on the descriptor performed by the device
798
+emulation program.
799
+
800
+- read
801
+
802
+A read returns any pending MMIO requests from the KVM driver as MMIO
803
+request structures. Multiple structures can be returned if there are
804
+multiple MMIO operations pending. The MMIO requests are moved from the
805
+pending queue to the sent queue, and if there are threads waiting for
806
+space in the pending queue to add new MMIO operations, they will be woken
807
+here.
808
+
809
+- write
810
+
811
+A write also consists of a set of MMIO requests. They are compared to
812
+the MMIO requests in the sent queue. Matches are removed from the sent
813
+queue, and any threads waiting for the reply are woken. If a store is
814
+removed, then the number of posted stores in the per-CPU scoreboard is
815
+decremented. When the number is zero, and a non side-effect load was
816
+waiting for posted stores to complete, the load is continued.
817
+
818
+- ioctl
819
+
820
+There are several ``ioctl()``\ s that can be performed on the slave
821
+descriptor.
822
+
823
+A *KVM\_DEV\_USER\_SHADOW\_SIZE* ``ioctl()`` causes the KVM driver to
824
+allocate memory for the shadow image. This memory can later be
825
+``mmap()``\ ed by the emulation process to share the emulation's view of
826
+device memory with the KVM driver.
827
+
828
+A *KVM\_DEV\_USER\_SHADOW\_CTRL* ``ioctl()`` controls access to the
829
+shadow image. It will send the KVM driver a shadow control map, which
830
+specifies which areas of the image can complete guest loads without
831
+sending the load request to the emulation program. It will also specify
832
+the size of load operations that are allowed.
833
+
834
+- poll
835
+
836
+An emulation program will use the ``poll()`` call with a *POLLIN* flag
837
+to determine if there are MMIO requests waiting to be read. It will
838
+return if the pending MMIO request queue is not empty.
839
+
840
+- mmap
841
+
842
+This call allows the emulation program to directly access the shadow
843
+image allocated by the KVM driver. As device emulation updates device
844
+memory, changes with no side-effects will be reflected in the shadow,
845
+and the KVM driver can satisfy guest loads from the shadow image without
846
+needing to wait for the emulation program.
847
+
848
+kvm\_io\_device ops
849
+^^^^^^^^^^^^^^^^^^^
850
+
851
+Each KVM per-CPU thread can handle MMIO operations on behalf of the guest
852
+VM. KVM will use the MMIO's guest physical address to search for a
853
+matching *kvm\_io\_device* to see if the MMIO can be handled by the KVM
854
+driver instead of exiting back to QEMU. If a match is found, the
855
+corresponding callback will be invoked.
856
+
857
+- read
858
+
859
+This callback is invoked when the guest performs a load to the device.
860
+Loads with side-effects must be handled synchronously, with the KVM
861
+driver putting the QEMU thread to sleep waiting for the emulation
862
+process reply before re-starting the guest. Loads that do not have
863
+side-effects may be optimized by satisfying them from the shadow image,
864
+if there are no outstanding stores to the device by this CPU. PCI memory
865
+ordering demands that a load cannot complete before all older stores to
866
+the same device have been completed.
867
+
868
+- write
869
+
870
+Stores can be handled asynchronously unless the pending MMIO request
871
+queue is full. In this case, the QEMU thread must sleep waiting for
872
+space in the queue. Stores will increment the number of posted stores in
873
+the per-CPU scoreboard, in order to implement the PCI ordering
874
+constraint above.
875
+
876
+interrupt acceleration
877
+^^^^^^^^^^^^^^^^^^^^^^
878
+
879
+This performance optimization would work much like a vhost user
880
+application does, where the QEMU process sets up *eventfds* that cause
881
+the device's corresponding interrupt to be triggered by the KVM driver.
882
+These irq file descriptors are sent to the emulation process at
883
+initialization, and are used when the emulation code raises a device
884
+interrupt.
885
+
886
+intx acceleration
887
+'''''''''''''''''
888
+
889
+Traditional PCI pin interrupts are level based, so, in addition to an
890
+irq file descriptor, a re-sampling file descriptor needs to be sent to
891
+the emulation program. This second file descriptor allows multiple
892
+devices sharing an irq to be notified when the interrupt has been
893
+acknowledged by the guest, so they can re-trigger the interrupt if their
894
+device has not de-asserted its interrupt.
895
+
896
+intx irq descriptor
897
+
898
+
899
+The irq descriptors are created by the proxy object
900
+using ``event_notifier_init()`` to create the irq and re-sampling
901
+*eventfds*, and ``kvm_vm_ioctl(KVM_IRQFD)`` to bind them to an interrupt.
902
+The interrupt route can be found with
903
+``pci_device_route_intx_to_irq()``.
904
+
905
+intx routing changes
906
+
907
+
908
+Intx routing can be changed when the guest programs the APIC the device
909
+pin is connected to. The proxy object in QEMU will use
910
+``pci_device_set_intx_routing_notifier()`` to be informed of any guest
911
+changes to the route. This handler will broadly follow the VFIO
912
+interrupt logic to change the route: de-assigning the existing irq
913
+descriptor from its route, then assigning it the new route. (see
914
+``vfio_intx_update()``)
915
+
916
+MSI/X acceleration
917
+''''''''''''''''''
918
+
919
+MSI/X interrupts are sent as DMA transactions to the host. The interrupt
920
+data contains a vector that is programmed by the guest. A device may have
921
+multiple MSI interrupts associated with it, so multiple irq descriptors
922
+may need to be sent to the emulation program.
923
+
924
+MSI/X irq descriptor
925
+
926
+
927
+This case will also follow the VFIO example. For each MSI/X interrupt,
928
+an *eventfd* is created, a virtual interrupt is allocated by
929
+``kvm_irqchip_add_msi_route()``, and the virtual interrupt is bound to
930
+the eventfd with ``kvm_irqchip_add_irqfd_notifier()``.
931
+
932
+MSI/X config space changes
933
+
934
+
935
+The guest may dynamically update several MSI-related tables in the
936
+device's PCI config space. These include per-MSI interrupt enables and
937
+vector data. Additionally, MSIX tables exist in device memory space, not
938
+config space. Much like the BAR case above, the proxy object must look
939
+at guest config space programming to keep the MSI interrupt state
940
+consistent between QEMU and the emulation program.
941
+
942
+--------------
943
+
944
+Disaggregated CPU emulation
945
+---------------------------
946
+
947
+After IO services have been disaggregated, a second phase would be to
948
+separate a process to handle CPU instruction emulation from the main
949
+QEMU control function. There are no object separation points for this
950
+code, so the first task would be to create one.
+
+Host access controls
+--------------------
+
+Separating QEMU relies on the host OS's access restriction mechanisms to
+enforce that the differing processes can only access the objects they
+are entitled to. There are a couple of types of mechanisms usually
+provided by general-purpose OSs.
+
+Discretionary access control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Discretionary access control allows each user to control who can access
+their files. In Linux, this type of control is usually too coarse for
+QEMU separation, since it only provides three separate access controls:
+one for the same user ID, the second for user IDs with the same group
+ID, and the third for all other user IDs. Each device instance would
+need a separate user ID to provide access control, which is likely to be
+unwieldy for dynamically created VMs.
+
+Mandatory access control
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Mandatory access control lets the OS enforce an additional set of
+controls on top of discretionary access control. It also adds other
+attributes to processes and files, such as types, roles, and
+categories, and can establish rules for how processes and files can
+interact.
+
+Type enforcement
+^^^^^^^^^^^^^^^^
+
+Type enforcement assigns a *type* attribute to processes and files, and
+allows rules to be written on what operations a process with a given
+type can perform on a file with a given type. QEMU separation could take
+advantage of type enforcement by running the emulation processes with
+different types, both from the main QEMU process, and from the emulation
+processes of different classes of devices.
+
+For example, guest disk images and disk emulation processes could have
+types separate from the main QEMU process and non-disk emulation
+processes, and the type rules could prevent processes other than disk
+emulation ones from accessing guest disk images. Similarly, network
+emulation processes can have a type separate from the main QEMU process
+and non-network emulation processes, and only that type can access the
+host tun/tap device used to provide guest networking.
+
+Category enforcement
+^^^^^^^^^^^^^^^^^^^^
+
+Category enforcement assigns a set of numbers within a given range to
+the process or file. The process is granted access to the file if the
+process's set is a superset of the file's set. This enforcement can be
+used to separate multiple instances of devices in the same class.
+
+For example, if there are multiple disk devices provided to a guest,
+each device emulation process could be provisioned with a separate
+category. The different device emulation processes would not be able to
+access each other's backing disk images.
+
+Alternatively, categories could be used in lieu of the type enforcement
+scheme described above. In this scenario, different categories would be
+used to prevent device emulation processes in different classes from
+accessing resources assigned to other classes.
--
2.29.2
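
The superset rule described under "Category enforcement" in the patch above can be sketched in C. This is only an illustration under stated assumptions: representing a category set as a 64-bit mask, and the names category_set, category_bit() and category_allows(), are hypothetical and do not come from the patch or from any particular MAC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a category set is a bitmask over categories 0..63. */
typedef uint64_t category_set;

/* Build a set holding a single category number (must be below 64). */
static category_set category_bit(unsigned int category)
{
    assert(category < 64);
    return (category_set)1 << category;
}

/*
 * Access is granted only if the process's set is a superset of the
 * file's set, i.e. the file has no category the process lacks.
 */
static bool category_allows(category_set process, category_set file)
{
    return (file & ~process) == 0;
}
```

With one category per disk emulation process, a process holding only category 1 is refused a disk image labelled with category 2, which is the separation between device instances that the text describes.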

when running:

    qemu-img rebase -b A C

QEMU would read all sectors not allocated in the file being rebased (C)
and compare them to the new base image (A), regardless of whether they
were changed or even allocated anywhere along the chain between the new
base and the top image (B). This causes many unneeded reads when
rebasing an image which represents a small diff of a large disk, as it
would read most of the disk's sectors.

Instead, use bdrv_is_allocated_above() to reduce the number of
unnecessary reads.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Message-id: 20190523163337.4497-3-shmuel.eiderman@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qemu-img.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -XXX,XX +XXX,XX @@ static int img_rebase(int argc, char **argv)
     BlockBackend *blk = NULL, *blk_old_backing = NULL, *blk_new_backing = NULL;
     uint8_t *buf_old = NULL;
     uint8_t *buf_new = NULL;
-    BlockDriverState *bs = NULL;
+    BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
@@ -XXX,XX +XXX,XX @@ static int img_rebase(int argc, char **argv)
                 goto out;
             }

+            /*
+             * Find out whether we rebase an image on top of a previous image
+             * in its chain.
+             */
+            prefix_chain_bs = bdrv_find_backing_image(bs, out_real_path);
+
             blk_new_backing = blk_new_open(out_real_path, NULL,
                                            options, src_flags, &local_err);
             g_free(out_real_path);
@@ -XXX,XX +XXX,XX @@ static int img_rebase(int argc, char **argv)
                 continue;
             }

+            if (prefix_chain_bs) {
+                /*
+                 * If cluster wasn't changed since prefix_chain, we don't need
+                 * to take action
+                 */
+                ret = bdrv_is_allocated_above(backing_bs(bs), prefix_chain_bs,
+                                              offset, n, &n);
+                if (ret < 0) {
+                    error_report("error while reading image metadata: %s",
+                                 strerror(-ret));
+                    goto out;
+                }
+                if (!ret) {
+                    continue;
+                }
+            }
+
             /*
              * Read old and new backing file and take into consideration that
              * backing files may be smaller than the COW image.
--
2.21.0
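
As a rough illustration of the check this patch adds, the toy model below counts how many clusters a safe rebase still has to compare when the new base is already part of the backing chain. It is a sketch under assumptions, not QEMU code: struct image, NB_CLUSTERS, allocated_above() and clusters_to_compare() are hypothetical names, and the real implementation works on BlockDriverState through bdrv_is_allocated_above().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_CLUSTERS 4

/* Toy image: which clusters it allocates, and its backing file. */
struct image {
    bool allocated[NB_CLUSTERS];
    const struct image *backing;
};

/*
 * Return true if the cluster is allocated in any image from `top` down
 * to, but not including, `base`.
 */
static bool allocated_above(const struct image *top,
                            const struct image *base, size_t cluster)
{
    const struct image *img;

    assert(cluster < NB_CLUSTERS);
    for (img = top; img != NULL && img != base; img = img->backing) {
        if (img->allocated[cluster]) {
            return true;
        }
    }
    return false;
}

/* Count the clusters a safe rebase of `top` onto `new_base` must compare. */
static size_t clusters_to_compare(const struct image *top,
                                  const struct image *new_base)
{
    size_t i, count = 0;

    for (i = 0; i < NB_CLUSTERS; i++) {
        if (top->allocated[i]) {
            continue;   /* already allocated in the rebased image itself */
        }
        if (!allocated_above(top->backing, new_base, i)) {
            continue;   /* unchanged between the new base and the top */
        }
        count++;
    }
    return count;
}
```

For a chain A <- B <- C where only B changes one cluster that C does not allocate, rebasing C onto A needs to compare just that one cluster instead of every cluster C leaves unallocated.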
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
In safe mode we open the entire chain, including the parent backing
3
Adds documentation explaining the command-line arguments needed
4
file of the rebased file.
4
to use multi-process.
5
Do not open a new BlockBackend for the parent backing file, which
6
saves opening the rest of the chain twice, which for long chains
7
saves many "pricy" bdrv_open() calls.
8
5
9
Permissions for blk_new() were copied from blk_new_open() when
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
10
flags = 0.
7
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Message-id: 49f757a84e5dd6fae14b22544897d1124c5fdbad.1611938319.git.jag.raman@oracle.com
11
11
12
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
12
[Move orphan docs/multi-process.rst document into docs/system/ and add
13
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
13
it to index.rst to prevent Sphinx "document isn't included in any
14
Signed-off-by: Sagi Amit <sagi.amit@oracle.com>
14
toctree" error.
15
Co-developed-by: Sagi Amit <sagi.amit@oracle.com>
15
--Stefan]
16
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
16
17
Message-id: 20190523163337.4497-2-shmuel.eiderman@oracle.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
Signed-off-by: Max Reitz <mreitz@redhat.com>
19
---
18
---
20
qemu-img.c | 29 +++++++++--------------------
19
MAINTAINERS | 1 +
21
1 file changed, 9 insertions(+), 20 deletions(-)
20
docs/system/index.rst | 1 +
21
docs/system/multi-process.rst | 64 +++++++++++++++++++++++++++++++++++
22
3 files changed, 66 insertions(+)
23
create mode 100644 docs/system/multi-process.rst
22
24
23
diff --git a/qemu-img.c b/qemu-img.c
25
diff --git a/MAINTAINERS b/MAINTAINERS
24
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
25
--- a/qemu-img.c
27
--- a/MAINTAINERS
26
+++ b/qemu-img.c
28
+++ b/MAINTAINERS
27
@@ -XXX,XX +XXX,XX @@ static int img_rebase(int argc, char **argv)
29
@@ -XXX,XX +XXX,XX @@ M: Jagannathan Raman <jag.raman@oracle.com>
28
30
M: John G Johnson <john.g.johnson@oracle.com>
29
/* For safe rebasing we need to compare old and new backing file */
31
S: Maintained
30
if (!unsafe) {
32
F: docs/devel/multi-process.rst
31
- char backing_name[PATH_MAX];
33
+F: docs/system/multi-process.rst
32
QDict *options = NULL;
34
33
+ BlockDriverState *base_bs = backing_bs(bs);
35
Build and test automation
34
36
-------------------------
35
- if (bs->backing) {
37
diff --git a/docs/system/index.rst b/docs/system/index.rst
36
- if (bs->backing_format[0] != '\0') {
38
index XXXXXXX..XXXXXXX 100644
37
- options = qdict_new();
39
--- a/docs/system/index.rst
38
- qdict_put_str(options, "driver", bs->backing_format);
40
+++ b/docs/system/index.rst
39
- }
41
@@ -XXX,XX +XXX,XX @@ Contents:
40
-
42
pr-manager
41
- if (force_share) {
43
targets
42
- if (!options) {
44
security
43
- options = qdict_new();
45
+ multi-process
44
- }
46
deprecated
45
- qdict_put_bool(options, BDRV_OPT_FORCE_SHARE, true);
47
removed-features
46
- }
48
build-platforms
47
- bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name));
49
diff --git a/docs/system/multi-process.rst b/docs/system/multi-process.rst
48
- blk_old_backing = blk_new_open(backing_name, NULL,
50
new file mode 100644
49
- options, src_flags, &local_err);
51
index XXXXXXX..XXXXXXX
50
- if (!blk_old_backing) {
52
--- /dev/null
51
+ if (base_bs) {
53
+++ b/docs/system/multi-process.rst
52
+ blk_old_backing = blk_new(BLK_PERM_CONSISTENT_READ,
54
@@ -XXX,XX +XXX,XX @@
53
+ BLK_PERM_ALL);
55
+Multi-process QEMU
54
+ ret = blk_insert_bs(blk_old_backing, base_bs,
56
+==================
55
+ &local_err);
57
+
56
+ if (ret < 0) {
58
+This document describes how to configure and use multi-process qemu.
57
error_reportf_err(local_err,
59
+For the design document, refer to docs/devel/multi-process.rst.
58
- "Could not open old backing file '%s': ",
60
+
59
- backing_name);
61
+1) Configuration
60
- ret = -1;
62
+----------------
61
+ "Could not reuse old backing file '%s': ",
63
+
62
+ base_bs->filename);
64
+multi-process is enabled by default for targets that enable KVM
63
goto out;
65
+
64
}
66
+
65
} else {
67
+2) Usage
68
+--------
69
+
70
+Multi-process QEMU requires an orchestrator to launch.
71
+
72
+Following is a description of command-line used to launch mpqemu.
73
+
74
+* Orchestrator:
75
+
76
+ - The Orchestrator creates a unix socketpair
77
+
78
+ - It launches the remote process and passes one of the
79
+ sockets to it via command-line.
80
+
81
+ - It then launches QEMU and specifies the other socket as an option
82
+ to the Proxy device object
83
+
84
+* Remote Process:
85
+
86
+ - QEMU can enter remote process mode by using the "remote" machine
87
+ option.
88
+
89
+ - The orchestrator creates a "remote-object" with details about
90
+ the device and the file descriptor for the device
91
+
92
+ - The remaining options are no different from how one launches QEMU with
93
+ devices.
94
+
95
+ - Example command-line for the remote process is as follows:
96
+
97
+ /usr/bin/qemu-system-x86_64 \
98
+ -machine x-remote \
99
+ -device lsi53c895a,id=lsi0 \
100
+ -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2 \
101
+ -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0 \
102
+ -object x-remote-object,id=robj1,devid=lsi0,fd=4
103
+
104
+* QEMU:
105
+
106
+ - Since parts of the RAM are shared between QEMU & remote process, a
107
+ memory-backend-memfd is required to facilitate this, as follows:
108
+
109
+ -object memory-backend-memfd,id=mem,size=2G
110
+
111
+ - A "x-pci-proxy-dev" device is created for each of the PCI devices emulated
112
+ in the remote process. A "socket" sub-option specifies the other end of
113
+ unix channel created by orchestrator. The "id" sub-option must be specified
114
+ and should be the same as the "id" specified for the remote PCI device
115
+
116
+ - Example commandline for QEMU is as follows:
117
+
118
+ -device x-pci-proxy-dev,id=lsi0,socket=3
66
--
119
--
67
2.21.0
120
2.29.2
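
The orchestrator steps listed in the multi-process documentation patch (create a unix socketpair, hand one end to the remote process, hand the other end to QEMU) can be sketched with plain POSIX calls. This is a minimal sketch under assumptions: the function name orchestrate_once() is hypothetical, the child only stands in for the remote process, and a real orchestrator would exec the two QEMU binaries and pass the descriptor numbers on their command lines rather than echo a message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a socketpair, fork a stand-in "remote process" that keeps one
 * end, and exchange one message. Returns 0 on success, -1 on failure.
 */
static int orchestrate_once(void)
{
    int fds[2];
    char reply[8] = { 0 };
    int status;
    pid_t pid;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: stand-in for the remote process, owns fds[1]. */
        char msg[8];

        close(fds[0]);
        n = read(fds[1], msg, sizeof(msg));
        if (n > 0 && write(fds[1], msg, (size_t)n) == n) {
            _exit(0);
        }
        _exit(1);
    }

    /* Parent: stand-in for the QEMU proxy side, owns fds[0]. */
    close(fds[1]);
    n = write(fds[0], "ping", 4);
    if (n == 4) {
        n = read(fds[0], reply, sizeof(reply) - 1);
    }
    close(fds[0]);

    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    if (n != 4 || strcmp(reply, "ping") != 0) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Each side closes the end it does not own, so the two processes share nothing but the one channel the orchestrator gave them.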
1
From: John Snow <jsnow@redhat.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
We mandate that the source node must be a root node; but there's no reason
3
Allow RAM MemoryRegion to be created from an offset in a file, instead
4
I am aware of that it needs to be restricted to such. In some cases, we need
4
of allocating at offset of 0 by default. This is needed to synchronize
5
to make sure that there's a medium present, but in the general case we can
5
RAM between QEMU & remote process.
6
allow the backup job itself to do the graph checking.
6
7
7
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
This patch helps improve the error message when you try to backup from
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
the same node more than once, which is reflected in the change to test
9
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
10
056.
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
11
Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com
12
For backups with bitmaps, it will also show a better error message that
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
the bitmap is in use instead of giving you something cryptic like "need
14
a root node."
15
16
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1707303
17
Signed-off-by: John Snow <jsnow@redhat.com>
18
Message-id: 20190521210053.8864-1-jsnow@redhat.com
19
Signed-off-by: Max Reitz <mreitz@redhat.com>
20
---
13
---
21
blockdev.c | 7 ++++++-
14
include/exec/memory.h | 2 ++
22
tests/qemu-iotests/056 | 2 +-
15
include/exec/ram_addr.h | 2 +-
23
2 files changed, 7 insertions(+), 2 deletions(-)
16
include/qemu/mmap-alloc.h | 4 +++-
24
17
backends/hostmem-memfd.c | 2 +-
25
diff --git a/blockdev.c b/blockdev.c
18
hw/misc/ivshmem.c | 3 ++-
26
index XXXXXXX..XXXXXXX 100644
19
softmmu/memory.c | 3 ++-
27
--- a/blockdev.c
20
softmmu/physmem.c | 11 +++++++----
28
+++ b/blockdev.c
21
util/mmap-alloc.c | 7 ++++---
29
@@ -XXX,XX +XXX,XX @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
22
util/oslib-posix.c | 2 +-
30
backup->compress = false;
23
9 files changed, 23 insertions(+), 13 deletions(-)
24
25
diff --git a/include/exec/memory.h b/include/exec/memory.h
26
index XXXXXXX..XXXXXXX 100644
27
--- a/include/exec/memory.h
28
+++ b/include/exec/memory.h
29
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
30
* @size: size of the region.
31
* @share: %true if memory must be mmaped with the MAP_SHARED flag
32
* @fd: the fd to mmap.
33
+ * @offset: offset within the file referenced by fd
34
* @errp: pointer to Error*, to store an error if it happens.
35
*
36
* Note that this function does not do anything to cause the data in the
37
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
38
uint64_t size,
39
bool share,
40
int fd,
41
+ ram_addr_t offset,
42
Error **errp);
43
#endif
44
45
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
46
index XXXXXXX..XXXXXXX 100644
47
--- a/include/exec/ram_addr.h
48
+++ b/include/exec/ram_addr.h
49
@@ -XXX,XX +XXX,XX @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
50
Error **errp);
51
RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
52
uint32_t ram_flags, int fd,
53
- Error **errp);
54
+ off_t offset, Error **errp);
55
56
RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
57
MemoryRegion *mr, Error **errp);
58
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
59
index XXXXXXX..XXXXXXX 100644
60
--- a/include/qemu/mmap-alloc.h
61
+++ b/include/qemu/mmap-alloc.h
62
@@ -XXX,XX +XXX,XX @@ size_t qemu_mempath_getpagesize(const char *mem_path);
63
* otherwise, the alignment in use will be determined by QEMU.
64
* @shared: map has RAM_SHARED flag.
65
* @is_pmem: map has RAM_PMEM flag.
66
+ * @map_offset: map starts at offset of map_offset from the start of fd
67
*
68
* Return:
69
* On success, return a pointer to the mapped area.
70
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
71
size_t size,
72
size_t align,
73
bool shared,
74
- bool is_pmem);
75
+ bool is_pmem,
76
+ off_t map_offset);
77
78
void qemu_ram_munmap(int fd, void *ptr, size_t size);
79
80
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
81
index XXXXXXX..XXXXXXX 100644
82
--- a/backends/hostmem-memfd.c
83
+++ b/backends/hostmem-memfd.c
84
@@ -XXX,XX +XXX,XX @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
85
name = host_memory_backend_get_name(backend);
86
memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend),
87
name, backend->size,
88
- backend->share, fd, errp);
89
+ backend->share, fd, 0, errp);
90
g_free(name);
91
}
92
93
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/hw/misc/ivshmem.c
96
+++ b/hw/misc/ivshmem.c
97
@@ -XXX,XX +XXX,XX @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
98
99
/* mmap the region and map into the BAR2 */
100
memory_region_init_ram_from_fd(&s->server_bar2, OBJECT(s),
101
- "ivshmem.bar2", size, true, fd, &local_err);
102
+ "ivshmem.bar2", size, true, fd, 0,
103
+ &local_err);
104
if (local_err) {
105
error_propagate(errp, local_err);
106
return;
107
diff --git a/softmmu/memory.c b/softmmu/memory.c
108
index XXXXXXX..XXXXXXX 100644
109
--- a/softmmu/memory.c
110
+++ b/softmmu/memory.c
111
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
112
uint64_t size,
113
bool share,
114
int fd,
115
+ ram_addr_t offset,
116
Error **errp)
117
{
118
Error *err = NULL;
119
@@ -XXX,XX +XXX,XX @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
120
mr->destructor = memory_region_destructor_ram;
121
mr->ram_block = qemu_ram_alloc_from_fd(size, mr,
122
share ? RAM_SHARED : 0,
123
- fd, &err);
124
+ fd, offset, &err);
125
if (err) {
126
mr->size = int128_zero();
127
object_unparent(OBJECT(mr));
128
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
129
index XXXXXXX..XXXXXXX 100644
130
--- a/softmmu/physmem.c
131
+++ b/softmmu/physmem.c
132
@@ -XXX,XX +XXX,XX @@ static void *file_ram_alloc(RAMBlock *block,
133
ram_addr_t memory,
134
int fd,
135
bool truncate,
136
+ off_t offset,
137
Error **errp)
138
{
139
void *area;
140
@@ -XXX,XX +XXX,XX @@ static void *file_ram_alloc(RAMBlock *block,
31
}
141
}
32
142
33
- bs = qmp_get_root_bs(backup->device, errp);
143
area = qemu_ram_mmap(fd, memory, block->mr->align,
34
+ bs = bdrv_lookup_bs(backup->device, backup->device, errp);
144
- block->flags & RAM_SHARED, block->flags & RAM_PMEM);
35
if (!bs) {
145
+ block->flags & RAM_SHARED, block->flags & RAM_PMEM,
146
+ offset);
147
if (area == MAP_FAILED) {
148
error_setg_errno(errp, errno,
149
"unable to map backing store for guest RAM");
150
@@ -XXX,XX +XXX,XX @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
151
#ifdef CONFIG_POSIX
152
RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
153
uint32_t ram_flags, int fd,
154
- Error **errp)
155
+ off_t offset, Error **errp)
156
{
157
RAMBlock *new_block;
158
Error *local_err = NULL;
159
@@ -XXX,XX +XXX,XX @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
160
new_block->used_length = size;
161
new_block->max_length = size;
162
new_block->flags = ram_flags;
163
- new_block->host = file_ram_alloc(new_block, size, fd, !file_size, errp);
164
+ new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
165
+ errp);
166
if (!new_block->host) {
167
g_free(new_block);
168
return NULL;
169
@@ -XXX,XX +XXX,XX @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
36
return NULL;
170
return NULL;
37
}
171
}
38
172
39
+ if (!bs->drv) {
173
- block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, errp);
40
+ error_setg(errp, "Device has no medium");
174
+ block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, errp);
41
+ return NULL;
175
if (!block) {
42
+ }
176
if (created) {
43
+
177
unlink(mem_path);
44
aio_context = bdrv_get_aio_context(bs);
178
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
45
aio_context_acquire(aio_context);
179
index XXXXXXX..XXXXXXX 100644
46
180
--- a/util/mmap-alloc.c
47
diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056
181
+++ b/util/mmap-alloc.c
48
index XXXXXXX..XXXXXXX 100755
182
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
49
--- a/tests/qemu-iotests/056
183
size_t size,
50
+++ b/tests/qemu-iotests/056
184
size_t align,
51
@@ -XXX,XX +XXX,XX @@ class BackupTest(iotests.QMPTestCase):
185
bool shared,
52
res = self.vm.qmp('query-block-jobs')
186
- bool is_pmem)
53
self.assert_qmp(res, 'return[0]/status', 'concluded')
187
+ bool is_pmem,
54
# Leave zombie job un-dismissed, observe a failure:
188
+ off_t map_offset)
55
- res = self.qmp_backup_and_wait(serror='Need a root block node',
189
{
56
+ res = self.qmp_backup_and_wait(serror="Node 'drive0' is busy: block device is in use by block job: backup",
190
int flags;
57
device='drive0', format=iotests.imgfmt,
191
int map_sync_flags = 0;
58
sync='full', target=self.dest_img,
192
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
59
auto_dismiss=False)
193
offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
194
195
ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
196
- flags | map_sync_flags, fd, 0);
197
+ flags | map_sync_flags, fd, map_offset);
198
199
if (ptr == MAP_FAILED && map_sync_flags) {
200
if (errno == ENOTSUP) {
201
@@ -XXX,XX +XXX,XX @@ void *qemu_ram_mmap(int fd,
202
* we will remove these flags to handle compatibility.
203
*/
204
ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
205
- flags, fd, 0);
206
+ flags, fd, map_offset);
207
}
208
209
if (ptr == MAP_FAILED) {
210
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
211
index XXXXXXX..XXXXXXX 100644
212
--- a/util/oslib-posix.c
213
+++ b/util/oslib-posix.c
214
@@ -XXX,XX +XXX,XX @@ void *qemu_memalign(size_t alignment, size_t size)
215
void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
216
{
217
size_t align = QEMU_VMALLOC_ALIGN;
218
- void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
219
+ void *ptr = qemu_ram_mmap(-1, size, align, shared, false, 0);
220
221
if (ptr == MAP_FAILED) {
222
return NULL;
60
--
223
--
61
2.21.0
224
2.29.2
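
The effect of threading an offset down to mmap(), as the patch above does for qemu_ram_mmap(), can be shown with a small stand-alone program fragment. This is a sketch under assumptions: map_fd_at_offset() and demo_offset_mapping() are hypothetical helpers, a temporary file stands in for the memfd shared between QEMU and the remote process, and none of QEMU's alignment or guard-page handling is reproduced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `size` bytes of `fd` starting at `offset` as shared, read/write
 * memory. `offset` must be a multiple of the page size.
 */
static void *map_fd_at_offset(int fd, size_t size, off_t offset)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
}

/*
 * Write a marker into the second page of a temporary file, map only
 * that page, and check the marker is visible at the start of the
 * mapping. Returns 0 on success, -1 on failure.
 */
static int demo_offset_mapping(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();
    void *ptr;
    int fd, ok;

    if (tmp == NULL) {
        return -1;
    }
    fd = fileno(tmp);
    if (page <= 0 || ftruncate(fd, 2 * (off_t)page) < 0 ||
        pwrite(fd, "ram", 3, (off_t)page) != 3) {
        fclose(tmp);
        return -1;
    }

    ptr = map_fd_at_offset(fd, (size_t)page, (off_t)page);
    if (ptr == MAP_FAILED) {
        fclose(tmp);
        return -1;
    }

    ok = memcmp(ptr, "ram", 3) == 0;
    munmap(ptr, (size_t)page);
    fclose(tmp);
    return ok ? 0 : -1;
}
```

Existing callers in the patch pass 0 and keep their old behaviour; only a caller that needs a window further into the file supplies a non-zero offset.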
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Background: decryption will be done in threads, to take benefit of it,
3
Add configuration options to enable or disable multiprocess QEMU code
4
we should move it out of the lock first.
5
4
6
But let's go further: it turns out, that only
5
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
7
qcow2_get_cluster_offset() needs locking, so reduce locking to it.
6
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
7
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Message-id: 6cc37253e35418ebd7b675a31a3df6e3c7a12dc1.1611938319.git.jag.raman@oracle.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
configure | 10 ++++++++++
13
meson.build | 4 +++-
14
Kconfig.host | 4 ++++
15
hw/Kconfig | 1 +
16
hw/remote/Kconfig | 3 +++
17
5 files changed, 21 insertions(+), 1 deletion(-)
18
create mode 100644 hw/remote/Kconfig
8
19
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
20
diff --git a/configure b/configure
10
Message-id: 20190506142741.41731-7-vsementsov@virtuozzo.com
21
index XXXXXXX..XXXXXXX 100755
11
Reviewed-by: Alberto Garcia <berto@igalia.com>
22
--- a/configure
12
Signed-off-by: Max Reitz <mreitz@redhat.com>
23
+++ b/configure
13
---
24
@@ -XXX,XX +XXX,XX @@ skip_meson=no
14
block/qcow2.c | 12 ++----------
25
gettext="auto"
15
1 file changed, 2 insertions(+), 10 deletions(-)
26
fuse="auto"
27
fuse_lseek="auto"
28
+multiprocess="no"
29
30
malloc_trim="auto"
31
32
@@ -XXX,XX +XXX,XX @@ Linux)
33
linux="yes"
34
linux_user="yes"
35
vhost_user=${default_feature:-yes}
36
+ multiprocess=${default_feature:-yes}
37
;;
38
esac
39
40
@@ -XXX,XX +XXX,XX @@ for opt do
41
;;
42
--disable-fuse-lseek) fuse_lseek="disabled"
43
;;
44
+ --enable-multiprocess) multiprocess="yes"
45
+ ;;
46
+ --disable-multiprocess) multiprocess="no"
47
+ ;;
48
*)
49
echo "ERROR: unknown option $opt"
50
echo "Try '$0 --help' for more information"
51
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available
52
libdaxctl libdaxctl support
53
fuse FUSE block device export
54
fuse-lseek SEEK_HOLE/SEEK_DATA support for FUSE exports
55
+ multiprocess Multiprocess QEMU support
56
57
NOTE: The object files are built at the place where configure is launched
58
EOF
59
@@ -XXX,XX +XXX,XX @@ fi
60
if test "$have_mlockall" = "yes" ; then
61
echo "HAVE_MLOCKALL=y" >> $config_host_mak
62
fi
63
+if test "$multiprocess" = "yes" ; then
64
+ echo "CONFIG_MULTIPROCESS_ALLOWED=y" >> $config_host_mak
65
+fi
66
if test "$fuzzing" = "yes" ; then
67
# If LIB_FUZZING_ENGINE is set, assume we are running on OSS-Fuzz, and the
68
# needed CFLAGS have already been provided
69
diff --git a/meson.build b/meson.build
70
index XXXXXXX..XXXXXXX 100644
71
--- a/meson.build
72
+++ b/meson.build
73
@@ -XXX,XX +XXX,XX @@ host_kconfig = \
74
('CONFIG_VHOST_KERNEL' in config_host ? ['CONFIG_VHOST_KERNEL=y'] : []) + \
75
(have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
76
('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
77
- ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : [])
78
+ ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
79
+ ('CONFIG_MULTIPROCESS_ALLOWED' in config_host ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
80
81
ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
82
83
@@ -XXX,XX +XXX,XX @@ summary_info += {'libpmem support': config_host.has_key('CONFIG_LIBPMEM')}
84
summary_info += {'libdaxctl support': config_host.has_key('CONFIG_LIBDAXCTL')}
85
summary_info += {'libudev': libudev.found()}
86
summary_info += {'FUSE lseek': fuse_lseek.found()}
87
+summary_info += {'Multiprocess QEMU': config_host.has_key('CONFIG_MULTIPROCESS_ALLOWED')}
88
summary(summary_info, bool_yn: true, section: 'Dependencies')
89
90
if not supported_cpus.contains(cpu)
91
diff --git a/Kconfig.host b/Kconfig.host
92
index XXXXXXX..XXXXXXX 100644
93
--- a/Kconfig.host
94
+++ b/Kconfig.host
95
@@ -XXX,XX +XXX,XX @@ config VIRTFS
96
97
config PVRDMA
98
bool
99
+
100
+config MULTIPROCESS_ALLOWED
101
+ bool
102
+ imply MULTIPROCESS
103
diff --git a/hw/Kconfig b/hw/Kconfig
104
index XXXXXXX..XXXXXXX 100644
105
--- a/hw/Kconfig
106
+++ b/hw/Kconfig
107
@@ -XXX,XX +XXX,XX @@ source pci-host/Kconfig
108
source pcmcia/Kconfig
109
source pci/Kconfig
110
source rdma/Kconfig
111
+source remote/Kconfig
112
source rtc/Kconfig
113
source scsi/Kconfig
114
source sd/Kconfig
115
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
116
new file mode 100644
117
index XXXXXXX..XXXXXXX
118
--- /dev/null
119
+++ b/hw/remote/Kconfig
120
@@ -XXX,XX +XXX,XX @@
121
+config MULTIPROCESS
122
+ bool
123
+ depends on PCI && KVM
124
--
125
2.29.2
16
126
17
diff --git a/block/qcow2.c b/block/qcow2.c
18
index XXXXXXX..XXXXXXX 100644
19
--- a/block/qcow2.c
20
+++ b/block/qcow2.c
21
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
22
23
qemu_iovec_init(&hd_qiov, qiov->niov);
24
25
- qemu_co_mutex_lock(&s->lock);
26
-
27
while (bytes != 0) {
28
29
/* prepare next request */
30
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
31
QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
32
}
33
34
+ qemu_co_mutex_lock(&s->lock);
35
ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
36
+ qemu_co_mutex_unlock(&s->lock);
37
if (ret < 0) {
38
goto fail;
39
}
40
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
41
42
if (bs->backing) {
43
BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
44
- qemu_co_mutex_unlock(&s->lock);
45
ret = bdrv_co_preadv(bs->backing, offset, cur_bytes,
46
&hd_qiov, 0);
47
- qemu_co_mutex_lock(&s->lock);
48
if (ret < 0) {
49
goto fail;
50
}
51
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
52
break;
53
54
case QCOW2_CLUSTER_COMPRESSED:
55
- qemu_co_mutex_unlock(&s->lock);
56
ret = qcow2_co_preadv_compressed(bs, cluster_offset,
57
offset, cur_bytes,
58
&hd_qiov);
59
- qemu_co_mutex_lock(&s->lock);
60
if (ret < 0) {
61
goto fail;
62
}
63
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
64
}
65
66
BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
67
- qemu_co_mutex_unlock(&s->lock);
68
ret = bdrv_co_preadv(s->data_file,
69
cluster_offset + offset_in_cluster,
70
cur_bytes, &hd_qiov, 0);
71
- qemu_co_mutex_lock(&s->lock);
72
if (ret < 0) {
73
goto fail;
74
}
75
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
76
ret = 0;
77
78
fail:
79
- qemu_co_mutex_unlock(&s->lock);
80
-
81
qemu_iovec_destroy(&hd_qiov);
82
qemu_vfree(cluster_data);
83
84
--
85
2.21.0
1
From: Anton Nefedov <anton.nefedov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
If COW areas of the newly allocated clusters are zeroes on the backing
3
PCI host bridge is setup for the remote device process. It is
4
image, efficient bdrv_write_zeroes(flags=BDRV_REQ_NO_FALLBACK) can be
4
implemented using remote-pcihost object. It is an extension of the PCI
5
used on the whole cluster instead of writing explicit zero buffers later
5
host bridge setup by QEMU.
6
in perform_cow().
6
Remote-pcihost configures a PCI bus which could be used by the remote
7
PCI device to latch on to.
7
8
8
iotest 060:
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
write to the discarded cluster does not trigger COW anymore.
10
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
10
Use a backing image instead.
11
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 0871ba857abb2eafacde07e7fe66a3f12415bfb2.1611938319.git.jag.raman@oracle.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
16
MAINTAINERS | 2 +
17
include/hw/pci-host/remote.h | 29 ++++++++++++++
18
hw/pci-host/remote.c | 75 ++++++++++++++++++++++++++++++++++++
19
hw/pci-host/Kconfig | 3 ++
20
hw/pci-host/meson.build | 1 +
21
hw/remote/Kconfig | 1 +
22
6 files changed, 111 insertions(+)
23
create mode 100644 include/hw/pci-host/remote.h
24
create mode 100644 hw/pci-host/remote.c
11
25
12
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
26
diff --git a/MAINTAINERS b/MAINTAINERS
13
Message-id: 20190516142749.81019-2-anton.nefedov@virtuozzo.com
14
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
15
Reviewed-by: Alberto Garcia <berto@igalia.com>
16
Signed-off-by: Max Reitz <mreitz@redhat.com>
17
---
18
qapi/block-core.json | 4 +-
19
block/qcow2.h | 6 +++
20
block/qcow2-cluster.c | 2 +-
21
block/qcow2.c | 85 ++++++++++++++++++++++++++++++++++++++
22
block/trace-events | 1 +
23
tests/qemu-iotests/060 | 7 +++-
24
tests/qemu-iotests/060.out | 5 ++-
25
7 files changed, 106 insertions(+), 4 deletions(-)
26
27
diff --git a/qapi/block-core.json b/qapi/block-core.json
28
index XXXXXXX..XXXXXXX 100644
27
index XXXXXXX..XXXXXXX 100644
29
--- a/qapi/block-core.json
28
--- a/MAINTAINERS
30
+++ b/qapi/block-core.json
29
+++ b/MAINTAINERS
30
@@ -XXX,XX +XXX,XX @@ M: John G Johnson <john.g.johnson@oracle.com>
31
S: Maintained
32
F: docs/devel/multi-process.rst
33
F: docs/system/multi-process.rst
34
+F: hw/pci-host/remote.c
35
+F: include/hw/pci-host/remote.h
36
37
Build and test automation
38
-------------------------
39
diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
40
new file mode 100644
41
index XXXXXXX..XXXXXXX
42
--- /dev/null
43
+++ b/include/hw/pci-host/remote.h
31
@@ -XXX,XX +XXX,XX @@
44
@@ -XXX,XX +XXX,XX @@
32
#
45
+/*
33
# @cor_write: a write due to copy-on-read (since 2.11)
46
+ * PCI Host for remote device
34
#
47
+ *
35
+# @cluster_alloc_space: an allocation of file space for a cluster (since 4.1)
48
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
36
+#
49
+ *
37
# Since: 2.9
50
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
38
##
51
+ * See the COPYING file in the top-level directory.
39
{ 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
52
+ *
53
+ */
54
+
55
+#ifndef REMOTE_PCIHOST_H
56
+#define REMOTE_PCIHOST_H
57
+
58
+#include "exec/memory.h"
59
+#include "hw/pci/pcie_host.h"
60
+
61
+#define TYPE_REMOTE_PCIHOST "remote-pcihost"
62
+OBJECT_DECLARE_SIMPLE_TYPE(RemotePCIHost, REMOTE_PCIHOST)
63
+
64
+struct RemotePCIHost {
65
+ /*< private >*/
66
+ PCIExpressHost parent_obj;
67
+ /*< public >*/
68
+
69
+ MemoryRegion *mr_pci_mem;
70
+ MemoryRegion *mr_sys_io;
71
+};
72
+
73
+#endif
74
diff --git a/hw/pci-host/remote.c b/hw/pci-host/remote.c
75
new file mode 100644
76
index XXXXXXX..XXXXXXX
77
--- /dev/null
78
+++ b/hw/pci-host/remote.c
40
@@ -XXX,XX +XXX,XX @@
79
@@ -XXX,XX +XXX,XX @@
41
'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
80
+/*
42
'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
81
+ * Remote PCI host device
43
'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
82
+ *
44
- 'cor_write'] }
83
+ * Unlike PCI host devices that model physical hardware, the purpose
45
+ 'cor_write', 'cluster_alloc_space'] }
84
+ * of this PCI host is to host multi-process QEMU devices.
46
85
+ *
47
##
86
+ * Multi-process QEMU extends the PCI host of a QEMU machine into a
48
# @BlkdebugInjectErrorOptions:
87
+ * remote process. Any PCI device attached to the remote process is
49
diff --git a/block/qcow2.h b/block/qcow2.h
88
+ * visible in the QEMU guest. This allows existing QEMU device models
50
index XXXXXXX..XXXXXXX 100644
89
+ * to be reused in the remote process.
51
--- a/block/qcow2.h
90
+ *
52
+++ b/block/qcow2.h
91
+ * This PCI host is purely a container for PCI devices. It's fake in the
53
@@ -XXX,XX +XXX,XX @@ typedef struct QCowL2Meta
92
+ * sense that the guest never sees this PCI host and has no way of
54
*/
93
+ * accessing it. Its job is just to provide the environment that QEMU
55
Qcow2COWRegion cow_end;
94
+ * PCI device models need when running in a remote process.
56
95
+ *
57
+ /*
96
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
58
+ * Indicates that COW regions are already handled and do not require
97
+ *
59
+ * any more processing.
98
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
60
+ */
99
+ * See the COPYING file in the top-level directory.
61
+ bool skip_cow;
100
+ *
101
+ */
62
+
102
+
63
/**
103
+#include "qemu/osdep.h"
64
* The I/O vector with the data from the actual guest write request.
104
+#include "qemu-common.h"
65
* If non-NULL, this is meant to be merged together with the data
66
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
67
index XXXXXXX..XXXXXXX 100644
68
--- a/block/qcow2-cluster.c
69
+++ b/block/qcow2-cluster.c
70
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
71
assert(start->offset + start->nb_bytes <= end->offset);
72
assert(!m->data_qiov || m->data_qiov->size == data_bytes);
73
74
- if (start->nb_bytes == 0 && end->nb_bytes == 0) {
75
+ if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->skip_cow) {
76
return 0;
77
}
78
79
diff --git a/block/qcow2.c b/block/qcow2.c
80
index XXXXXXX..XXXXXXX 100644
81
--- a/block/qcow2.c
82
+++ b/block/qcow2.c
83
@@ -XXX,XX +XXX,XX @@ static bool merge_cow(uint64_t offset, unsigned bytes,
84
continue;
85
}
86
87
+ /* If COW regions are handled already, skip this too */
88
+ if (m->skip_cow) {
89
+ continue;
90
+ }
91
+
105
+
92
/* The data (middle) region must be immediately after the
106
+#include "hw/pci/pci.h"
93
* start region */
107
+#include "hw/pci/pci_host.h"
94
if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
108
+#include "hw/pci/pcie_host.h"
95
@@ -XXX,XX +XXX,XX @@ static bool merge_cow(uint64_t offset, unsigned bytes,
109
+#include "hw/qdev-properties.h"
96
return false;
110
+#include "hw/pci-host/remote.h"
97
}
111
+#include "exec/memory.h"
98
112
+
99
+static bool is_unallocated(BlockDriverState *bs, int64_t offset, int64_t bytes)
113
+static const char *remote_pcihost_root_bus_path(PCIHostState *host_bridge,
114
+ PCIBus *rootbus)
100
+{
115
+{
101
+ int64_t nr;
116
+ return "0000:00";
102
+ return !bytes ||
103
+ (!bdrv_is_allocated_above(bs, NULL, offset, bytes, &nr) && nr == bytes);
104
+}
117
+}
105
+
118
+
106
+static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
119
+static void remote_pcihost_realize(DeviceState *dev, Error **errp)
107
+{
120
+{
108
+ /*
121
+ PCIHostState *pci = PCI_HOST_BRIDGE(dev);
109
+ * This check is designed for optimization shortcut so it must be
122
+ RemotePCIHost *s = REMOTE_PCIHOST(dev);
110
+ * efficient.
123
+
111
+ * Instead of is_zero(), use is_unallocated() as it is faster (but not
124
+ pci->bus = pci_root_bus_new(DEVICE(s), "remote-pci",
112
+ * as accurate and can result in false negatives).
125
+ s->mr_pci_mem, s->mr_sys_io,
113
+ */
126
+ 0, TYPE_PCIE_BUS);
114
+ return is_unallocated(bs, m->offset + m->cow_start.offset,
115
+ m->cow_start.nb_bytes) &&
116
+ is_unallocated(bs, m->offset + m->cow_end.offset,
117
+ m->cow_end.nb_bytes);
118
+}
127
+}
119
+
128
+
120
+static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
129
+static void remote_pcihost_class_init(ObjectClass *klass, void *data)
121
+{
130
+{
122
+ BDRVQcow2State *s = bs->opaque;
131
+ DeviceClass *dc = DEVICE_CLASS(klass);
123
+ QCowL2Meta *m;
132
+ PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
124
+
133
+
125
+ if (!(s->data_file->bs->supported_zero_flags & BDRV_REQ_NO_FALLBACK)) {
134
+ hc->root_bus_path = remote_pcihost_root_bus_path;
126
+ return 0;
135
+ dc->realize = remote_pcihost_realize;
127
+ }
128
+
136
+
129
+ if (bs->encrypted) {
137
+ dc->user_creatable = false;
130
+ return 0;
138
+ set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
131
+ }
139
+ dc->fw_name = "pci";
132
+
133
+ for (m = l2meta; m != NULL; m = m->next) {
134
+ int ret;
135
+
136
+ if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
137
+ continue;
138
+ }
139
+
140
+ if (!is_zero_cow(bs, m)) {
141
+ continue;
142
+ }
143
+
144
+ /*
145
+ * instead of writing zero COW buffers,
146
+ * efficiently zero out the whole clusters
147
+ */
148
+
149
+ ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
150
+ m->nb_clusters * s->cluster_size,
151
+ true);
152
+ if (ret < 0) {
153
+ return ret;
154
+ }
155
+
156
+ BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
157
+ ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
158
+ m->nb_clusters * s->cluster_size,
159
+ BDRV_REQ_NO_FALLBACK);
160
+ if (ret < 0) {
161
+ if (ret != -ENOTSUP && ret != -EAGAIN) {
162
+ return ret;
163
+ }
164
+ continue;
165
+ }
166
+
167
+ trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
168
+ m->skip_cow = true;
169
+ }
170
+ return 0;
171
+}
140
+}
172
+
141
+
173
static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
142
+static const TypeInfo remote_pcihost_info = {
174
uint64_t bytes, QEMUIOVector *qiov,
143
+ .name = TYPE_REMOTE_PCIHOST,
175
int flags)
144
+ .parent = TYPE_PCIE_HOST_BRIDGE,
176
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
145
+ .instance_size = sizeof(RemotePCIHost),
177
qemu_iovec_add(&hd_qiov, cluster_data, cur_bytes);
146
+ .class_init = remote_pcihost_class_init,
178
}
147
+};
179
180
+ /* Try to efficiently initialize the physical space with zeroes */
181
+ ret = handle_alloc_space(bs, l2meta);
182
+ if (ret < 0) {
183
+ goto out_unlocked;
184
+ }
185
+
148
+
186
/* If we need to do COW, check if it's possible to merge the
149
+static void remote_pcihost_register(void)
187
* writing of the guest data together with that of the COW regions.
150
+{
188
* If it's not possible (or not necessary) then write the
151
+ type_register_static(&remote_pcihost_info);
189
diff --git a/block/trace-events b/block/trace-events
152
+}
153
+
154
+type_init(remote_pcihost_register)
155
diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
190
index XXXXXXX..XXXXXXX 100644
156
index XXXXXXX..XXXXXXX 100644
191
--- a/block/trace-events
157
--- a/hw/pci-host/Kconfig
192
+++ b/block/trace-events
158
+++ b/hw/pci-host/Kconfig
193
@@ -XXX,XX +XXX,XX @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
159
@@ -XXX,XX +XXX,XX @@ config PCI_POWERNV
194
qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
160
select PCI_EXPRESS
195
qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
161
select MSI_NONBROKEN
196
qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
162
select PCIE_PORT
197
+qcow2_skip_cow(void *co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
198
199
# qcow2-cluster.c
200
qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
201
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
202
index XXXXXXX..XXXXXXX 100755
203
--- a/tests/qemu-iotests/060
204
+++ b/tests/qemu-iotests/060
205
@@ -XXX,XX +XXX,XX @@ $QEMU_IO -c "$OPEN_RO" -c "read -P 1 0 512" | _filter_qemu_io
206
echo
207
echo "=== Testing overlap while COW is in flight ==="
208
echo
209
+BACKING_IMG=$TEST_IMG.base
210
+TEST_IMG=$BACKING_IMG _make_test_img 1G
211
+
163
+
212
+$QEMU_IO -c 'write 0k 64k' "$BACKING_IMG" | _filter_qemu_io
164
+config REMOTE_PCIHOST
213
+
165
+ bool
214
# compat=0.10 is required in order to make the following discard actually
166
diff --git a/hw/pci-host/meson.build b/hw/pci-host/meson.build
215
# unallocate the sector rather than make it a zero sector - we want COW, after
216
# all.
217
-IMGOPTS='compat=0.10' _make_test_img 1G
218
+IMGOPTS='compat=0.10' _make_test_img -b "$BACKING_IMG" 1G
219
# Write two clusters, the second one enforces creation of an L2 table after
220
# the first data cluster.
221
$QEMU_IO -c 'write 0k 64k' -c 'write 512M 64k' "$TEST_IMG" | _filter_qemu_io
222
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
223
index XXXXXXX..XXXXXXX 100644
167
index XXXXXXX..XXXXXXX 100644
224
--- a/tests/qemu-iotests/060.out
168
--- a/hw/pci-host/meson.build
225
+++ b/tests/qemu-iotests/060.out
169
+++ b/hw/pci-host/meson.build
226
@@ -XXX,XX +XXX,XX @@ read 512/512 bytes at offset 0
170
@@ -XXX,XX +XXX,XX @@ pci_ss.add(when: 'CONFIG_PCI_EXPRESS_XILINX', if_true: files('xilinx-pcie.c'))
227
171
pci_ss.add(when: 'CONFIG_PCI_I440FX', if_true: files('i440fx.c'))
228
=== Testing overlap while COW is in flight ===
172
pci_ss.add(when: 'CONFIG_PCI_SABRE', if_true: files('sabre.c'))
229
173
pci_ss.add(when: 'CONFIG_XEN_IGD_PASSTHROUGH', if_true: files('xen_igd_pt.c'))
230
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
174
+pci_ss.add(when: 'CONFIG_REMOTE_PCIHOST', if_true: files('remote.c'))
231
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1073741824
175
232
+wrote 65536/65536 bytes at offset 0
176
# PPC devices
233
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
177
pci_ss.add(when: 'CONFIG_PREP_PCI', if_true: files('prep.c'))
234
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 backing_file=TEST_DIR/t.IMGFMT.base
178
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
235
wrote 65536/65536 bytes at offset 0
179
index XXXXXXX..XXXXXXX 100644
236
64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
180
--- a/hw/remote/Kconfig
237
wrote 65536/65536 bytes at offset 536870912
181
+++ b/hw/remote/Kconfig
182
@@ -XXX,XX +XXX,XX @@
183
config MULTIPROCESS
184
bool
185
depends on PCI && KVM
186
+ select REMOTE_PCIHOST
238
--
187
--
239
2.21.0
188
2.29.2
240
189
241
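The zero-COW shortcut added by the qcow2 patch above (handle_alloc_space() together with the skip_cow flag) can be illustrated with a minimal standalone sketch. Everything named Toy*/toy_* is hypothetical and only models the control flow; it is not the QEMU code. The assumption is that a fast "zero the whole cluster" callback either succeeds, reports that it is unsupported, or fails hard.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for QCowL2Meta. */
typedef struct ToyMeta {
    size_t cow_start_bytes;    /* bytes to copy before the guest data */
    size_t cow_end_bytes;      /* bytes to copy after the guest data */
    bool backing_allocated;    /* true if the COW source holds real data */
    bool skip_cow;             /* set once the cluster was zeroed instead */
    struct ToyMeta *next;
} ToyMeta;

/* Hypothetical callback: zero a whole cluster, no slow fallback allowed. */
typedef int (*ToyZeroFn)(ToyMeta *m);

static int toy_zero_ok(ToyMeta *m) { (void)m; return 0; }
static int toy_zero_unsupported(ToyMeta *m) { (void)m; return -ENOTSUP; }

/* Mirrors the loop structure of handle_alloc_space() in the patch. */
static int toy_handle_alloc_space(ToyMeta *list, ToyZeroFn zero_fn)
{
    for (ToyMeta *m = list; m != NULL; m = m->next) {
        int ret;

        if (m->cow_start_bytes == 0 && m->cow_end_bytes == 0) {
            continue;              /* nothing to copy anyway */
        }
        if (m->backing_allocated) {
            continue;              /* real data must be copied */
        }

        ret = zero_fn(m);
        if (ret < 0) {
            if (ret != -ENOTSUP && ret != -EAGAIN) {
                return ret;        /* hard error */
            }
            continue;              /* fall back to regular COW */
        }
        m->skip_cow = true;
    }
    return 0;
}

/* Bytes a later COW pass would still have to copy for this request. */
static size_t toy_cow_bytes(const ToyMeta *m)
{
    return m->skip_cow ? 0 : m->cow_start_bytes + m->cow_end_bytes;
}
```

In this model, a request whose COW areas are unallocated ends up with skip_cow set, so the later copy step has nothing left to do; any other request still goes through the regular COW path.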
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Do encryption/decryption in threads, like it is already done for
3
x-remote-machine object sets up various subsystems of the remote
4
compression. This improves asynchronous encrypted io.
4
device process. Instantiate PCI host bridge object and initialize RAM, IO &
5
PCI memory regions.
5
6
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
Reviewed-by: Max Reitz <mreitz@redhat.com>
9
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
Message-id: 20190506142741.41731-9-vsementsov@virtuozzo.com
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
Message-id: c537f38d17f90453ca610c6b70cf3480274e0ba1.1611938319.git.jag.raman@oracle.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
13
---
12
block/qcow2.h | 8 ++++++
14
MAINTAINERS | 2 ++
13
block/qcow2-cluster.c | 7 ++---
15
include/hw/pci-host/remote.h | 1 +
14
block/qcow2-threads.c | 65 +++++++++++++++++++++++++++++++++++++++++--
16
include/hw/remote/machine.h | 27 ++++++++++++++
15
block/qcow2.c | 22 +++++----------
17
hw/remote/machine.c | 70 ++++++++++++++++++++++++++++++++++++
16
4 files changed, 81 insertions(+), 21 deletions(-)
18
hw/meson.build | 1 +
19
hw/remote/meson.build | 5 +++
20
6 files changed, 106 insertions(+)
21
create mode 100644 include/hw/remote/machine.h
22
create mode 100644 hw/remote/machine.c
23
create mode 100644 hw/remote/meson.build
17
24
18
diff --git a/block/qcow2.h b/block/qcow2.h
25
diff --git a/MAINTAINERS b/MAINTAINERS
19
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
20
--- a/block/qcow2.h
27
--- a/MAINTAINERS
21
+++ b/block/qcow2.h
28
+++ b/MAINTAINERS
22
@@ -XXX,XX +XXX,XX @@ typedef struct Qcow2BitmapHeaderExt {
29
@@ -XXX,XX +XXX,XX @@ F: docs/devel/multi-process.rst
23
uint64_t bitmap_directory_offset;
30
F: docs/system/multi-process.rst
24
} QEMU_PACKED Qcow2BitmapHeaderExt;
31
F: hw/pci-host/remote.c
25
32
F: include/hw/pci-host/remote.h
26
+#define QCOW2_MAX_THREADS 4
33
+F: hw/remote/machine.c
27
+
34
+F: include/hw/remote/machine.h
28
typedef struct BDRVQcow2State {
35
29
int cluster_bits;
36
Build and test automation
30
int cluster_size;
37
-------------------------
31
@@ -XXX,XX +XXX,XX @@ qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
38
diff --git a/include/hw/pci-host/remote.h b/include/hw/pci-host/remote.h
32
ssize_t coroutine_fn
39
index XXXXXXX..XXXXXXX 100644
33
qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
40
--- a/include/hw/pci-host/remote.h
34
const void *src, size_t src_size);
41
+++ b/include/hw/pci-host/remote.h
35
+int coroutine_fn
42
@@ -XXX,XX +XXX,XX @@ struct RemotePCIHost {
36
+qcow2_co_encrypt(BlockDriverState *bs, uint64_t file_cluster_offset,
43
37
+ uint64_t offset, void *buf, size_t len);
44
MemoryRegion *mr_pci_mem;
38
+int coroutine_fn
45
MemoryRegion *mr_sys_io;
39
+qcow2_co_decrypt(BlockDriverState *bs, uint64_t file_cluster_offset,
46
+ MemoryRegion *mr_sys_mem;
40
+ uint64_t offset, void *buf, size_t len);
47
};
41
48
42
#endif
49
#endif
43
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
50
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
44
index XXXXXXX..XXXXXXX 100644
51
new file mode 100644
45
--- a/block/qcow2-cluster.c
52
index XXXXXXX..XXXXXXX
46
+++ b/block/qcow2-cluster.c
53
--- /dev/null
47
@@ -XXX,XX +XXX,XX @@ static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
54
+++ b/include/hw/remote/machine.h
48
{
49
if (bytes && bs->encrypted) {
50
BDRVQcow2State *s = bs->opaque;
51
- int64_t offset = (s->crypt_physical_offset ?
52
- (cluster_offset + offset_in_cluster) :
53
- (src_cluster_offset + offset_in_cluster));
54
assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
55
assert((bytes & ~BDRV_SECTOR_MASK) == 0);
56
assert(s->crypto);
57
- if (qcrypto_block_encrypt(s->crypto, offset, buffer, bytes, NULL) < 0) {
58
+ if (qcow2_co_encrypt(bs, cluster_offset,
59
+ src_cluster_offset + offset_in_cluster,
60
+ buffer, bytes) < 0) {
61
return false;
62
}
63
}
64
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
65
index XXXXXXX..XXXXXXX 100644
66
--- a/block/qcow2-threads.c
67
+++ b/block/qcow2-threads.c
68
@@ -XXX,XX +XXX,XX @@
55
@@ -XXX,XX +XXX,XX @@
69
70
#include "qcow2.h"
71
#include "block/thread-pool.h"
72
-
73
-#define QCOW2_MAX_THREADS 4
74
+#include "crypto.h"
75
76
static int coroutine_fn
77
qcow2_co_process(BlockDriverState *bs, ThreadPoolFunc *func, void *arg)
78
@@ -XXX,XX +XXX,XX @@ qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
79
return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
80
qcow2_decompress);
81
}
82
+
83
+
84
+/*
56
+/*
85
+ * Cryptography
57
+ * Remote machine configuration
58
+ *
59
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
60
+ *
61
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
62
+ * See the COPYING file in the top-level directory.
63
+ *
86
+ */
64
+ */
87
+
65
+
66
+#ifndef REMOTE_MACHINE_H
67
+#define REMOTE_MACHINE_H
68
+
69
+#include "qom/object.h"
70
+#include "hw/boards.h"
71
+#include "hw/pci-host/remote.h"
72
+
73
+struct RemoteMachineState {
74
+ MachineState parent_obj;
75
+
76
+ RemotePCIHost *host;
77
+};
78
+
79
+#define TYPE_REMOTE_MACHINE "x-remote-machine"
80
+OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
81
+
82
+#endif
83
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
84
new file mode 100644
85
index XXXXXXX..XXXXXXX
86
--- /dev/null
87
+++ b/hw/remote/machine.c
88
@@ -XXX,XX +XXX,XX @@
88
+/*
89
+/*
89
+ * Qcow2EncDecFunc: common prototype of qcrypto_block_encrypt() and
90
+ * Machine for remote device
90
+ * qcrypto_block_decrypt() functions.
91
+ *
92
+ * This machine type is used by the remote device process in multi-process
93
+ * QEMU. QEMU device models depend on parent busses, interrupt controllers,
94
+ * memory regions, etc. The remote machine type offers this environment so
95
+ * that QEMU device models can be used as remote devices.
96
+ *
97
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
98
+ *
99
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
100
+ * See the COPYING file in the top-level directory.
101
+ *
91
+ */
102
+ */
92
+typedef int (*Qcow2EncDecFunc)(QCryptoBlock *block, uint64_t offset,
93
+ uint8_t *buf, size_t len, Error **errp);
94
+
103
+
95
+typedef struct Qcow2EncDecData {
104
+#include "qemu/osdep.h"
96
+ QCryptoBlock *block;
105
+#include "qemu-common.h"
97
+ uint64_t offset;
98
+ uint8_t *buf;
99
+ size_t len;
100
+
106
+
101
+ Qcow2EncDecFunc func;
107
+#include "hw/remote/machine.h"
102
+} Qcow2EncDecData;
108
+#include "exec/address-spaces.h"
109
+#include "exec/memory.h"
110
+#include "qapi/error.h"
103
+
111
+
104
+static int qcow2_encdec_pool_func(void *opaque)
112
+static void remote_machine_init(MachineState *machine)
105
+{
113
+{
106
+ Qcow2EncDecData *data = opaque;
114
+ MemoryRegion *system_memory, *system_io, *pci_memory;
115
+ RemoteMachineState *s = REMOTE_MACHINE(machine);
116
+ RemotePCIHost *rem_host;
107
+
117
+
108
+ return data->func(data->block, data->offset, data->buf, data->len, NULL);
118
+ system_memory = get_system_memory();
119
+ system_io = get_system_io();
120
+
121
+ pci_memory = g_new(MemoryRegion, 1);
122
+ memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
123
+
124
+ rem_host = REMOTE_PCIHOST(qdev_new(TYPE_REMOTE_PCIHOST));
125
+
126
+ rem_host->mr_pci_mem = pci_memory;
127
+ rem_host->mr_sys_mem = system_memory;
128
+ rem_host->mr_sys_io = system_io;
129
+
130
+ s->host = rem_host;
131
+
132
+ object_property_add_child(OBJECT(s), "remote-pcihost", OBJECT(rem_host));
133
+ memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);
134
+
135
+ qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
109
+}
136
+}
110
+
137
+
111
+static int coroutine_fn
138
+static void remote_machine_class_init(ObjectClass *oc, void *data)
112
+qcow2_co_encdec(BlockDriverState *bs, uint64_t file_cluster_offset,
113
+ uint64_t offset, void *buf, size_t len, Qcow2EncDecFunc func)
114
+{
139
+{
115
+ BDRVQcow2State *s = bs->opaque;
140
+ MachineClass *mc = MACHINE_CLASS(oc);
116
+ Qcow2EncDecData arg = {
117
+ .block = s->crypto,
118
+ .offset = s->crypt_physical_offset ?
119
+ file_cluster_offset + offset_into_cluster(s, offset) :
120
+ offset,
121
+ .buf = buf,
122
+ .len = len,
123
+ .func = func,
124
+ };
125
+
141
+
126
+ return qcow2_co_process(bs, qcow2_encdec_pool_func, &arg);
142
+ mc->init = remote_machine_init;
143
+ mc->desc = "Experimental remote machine";
127
+}
144
+}
128
+
145
+
129
+int coroutine_fn
146
+static const TypeInfo remote_machine = {
130
+qcow2_co_encrypt(BlockDriverState *bs, uint64_t file_cluster_offset,
147
+ .name = TYPE_REMOTE_MACHINE,
131
+ uint64_t offset, void *buf, size_t len)
148
+ .parent = TYPE_MACHINE,
149
+ .instance_size = sizeof(RemoteMachineState),
150
+ .class_init = remote_machine_class_init,
151
+};
152
+
153
+static void remote_machine_register_types(void)
132
+{
154
+{
133
+ return qcow2_co_encdec(bs, file_cluster_offset, offset, buf, len,
155
+ type_register_static(&remote_machine);
134
+ qcrypto_block_encrypt);
135
+}
156
+}
136
+
157
+
137
+int coroutine_fn
158
+type_init(remote_machine_register_types);
138
+qcow2_co_decrypt(BlockDriverState *bs, uint64_t file_cluster_offset,
159
diff --git a/hw/meson.build b/hw/meson.build
139
+ uint64_t offset, void *buf, size_t len)
140
+{
141
+ return qcow2_co_encdec(bs, file_cluster_offset, offset, buf, len,
142
+ qcrypto_block_decrypt);
143
+}
144
diff --git a/block/qcow2.c b/block/qcow2.c
145
index XXXXXXX..XXXXXXX 100644
160
index XXXXXXX..XXXXXXX 100644
146
--- a/block/qcow2.c
161
--- a/hw/meson.build
147
+++ b/block/qcow2.c
162
+++ b/hw/meson.build
148
@@ -XXX,XX +XXX,XX @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
163
@@ -XXX,XX +XXX,XX @@ subdir('moxie')
149
}
164
subdir('nios2')
150
s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
165
subdir('openrisc')
151
qcow2_crypto_hdr_read_func,
166
subdir('ppc')
152
- bs, cflags, 1, errp);
167
+subdir('remote')
153
+ bs, cflags, QCOW2_MAX_THREADS, errp);
168
subdir('riscv')
154
if (!s->crypto) {
169
subdir('rx')
155
return -EINVAL;
170
subdir('s390x')
156
}
171
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
157
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
172
new file mode 100644
158
cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
173
index XXXXXXX..XXXXXXX
159
}
174
--- /dev/null
160
s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
175
+++ b/hw/remote/meson.build
161
- NULL, NULL, cflags, 1, errp);
176
@@ -XXX,XX +XXX,XX @@
162
+ NULL, NULL, cflags,
177
+remote_ss = ss.source_set()
163
+ QCOW2_MAX_THREADS, errp);
178
+
164
if (!s->crypto) {
179
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
165
ret = -EINVAL;
180
+
166
goto fail;
181
+softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
167
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
168
assert(s->crypto);
169
assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
170
assert((cur_bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
171
- if (qcrypto_block_decrypt(s->crypto,
172
- (s->crypt_physical_offset ?
173
- cluster_offset + offset_in_cluster :
174
- offset),
175
- cluster_data,
176
- cur_bytes,
177
- NULL) < 0) {
178
+ if (qcow2_co_decrypt(bs, cluster_offset, offset,
179
+ cluster_data, cur_bytes) < 0) {
180
ret = -EIO;
181
goto fail;
182
}
183
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
184
QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
185
qemu_iovec_to_buf(&hd_qiov, 0, cluster_data, hd_qiov.size);
186
187
- if (qcrypto_block_encrypt(s->crypto,
188
- (s->crypt_physical_offset ?
189
- cluster_offset + offset_in_cluster :
190
- offset),
191
- cluster_data,
192
- cur_bytes, NULL) < 0) {
193
+ if (qcow2_co_encrypt(bs, cluster_offset, offset,
194
+ cluster_data, cur_bytes) < 0) {
195
ret = -EIO;
196
goto out_unlocked;
197
}
198
--
182
--
199
2.21.0
183
2.29.2
200
184
201
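The offset selection that the encryption patch above centralizes in qcow2_co_encdec() can be sketched in isolation. This is a simplified model under stated assumptions: ToyCryptState, toy_crypt_offset(), toy_encdec() and the XOR "cipher" toy_xor() are hypothetical names, and the real code runs the function in a thread pool rather than calling it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common prototype for an encrypt or decrypt routine. */
typedef int (*ToyEncDecFunc)(uint64_t offset, uint8_t *buf, size_t len);

/* Hypothetical subset of the driver state that matters here. */
typedef struct ToyCryptState {
    bool crypt_physical_offset;  /* derive the IV from the host offset? */
    uint64_t cluster_size;       /* must be a power of two */
} ToyCryptState;

/* Pick the offset that is fed to the cipher, as the patch does. */
static uint64_t toy_crypt_offset(const ToyCryptState *s,
                                 uint64_t file_cluster_offset,
                                 uint64_t guest_offset)
{
    uint64_t in_cluster = guest_offset & (s->cluster_size - 1);

    return s->crypt_physical_offset ? file_cluster_offset + in_cluster
                                    : guest_offset;
}

/* One helper serves both directions; only the function pointer differs. */
static int toy_encdec(const ToyCryptState *s, uint64_t file_cluster_offset,
                      uint64_t guest_offset, uint8_t *buf, size_t len,
                      ToyEncDecFunc func)
{
    return func(toy_crypt_offset(s, file_cluster_offset, guest_offset),
                buf, len);
}

/* Toy symmetric "cipher": XOR with an offset-derived byte stream. */
static int toy_xor(uint64_t offset, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= (uint8_t)(offset + i);
    }
    return 0;
}
```

Callers then pass both the host cluster offset and the guest offset and no longer repeat the conditional at every call site, which is the simplification visible in the qcow2.c hunks.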
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
Move generic part out of qcow2_co_do_compress, to reuse it for
3
Adds qio_channel_writev_full_all() to transmit both data and FDs.
4
encryption and rename things that would be shared with encryption path.
4
Refactors existing code to use this helper.
5
5
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Reviewed-by: Max Reitz <mreitz@redhat.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
Message-id: 20190506142741.41731-6-vsementsov@virtuozzo.com
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
10
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
11
Message-id: 480fbf1fe4152495d60596c9b665124549b426a5.1611938319.git.jag.raman@oracle.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
13
---
12
block/qcow2.h | 4 ++--
14
include/io/channel.h | 25 +++++++++++++++++++++++++
13
block/qcow2-threads.c | 47 ++++++++++++++++++++++++++++---------------
15
io/channel.c | 15 ++++++++++++++-
14
block/qcow2.c | 2 +-
16
2 files changed, 39 insertions(+), 1 deletion(-)
15
3 files changed, 34 insertions(+), 19 deletions(-)
16
17
17
diff --git a/block/qcow2.h b/block/qcow2.h
18
diff --git a/include/io/channel.h b/include/io/channel.h
18
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
19
--- a/block/qcow2.h
20
--- a/include/io/channel.h
20
+++ b/block/qcow2.h
21
+++ b/include/io/channel.h
21
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVQcow2State {
22
@@ -XXX,XX +XXX,XX @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
22
char *image_backing_format;
23
IOHandler *io_write,
23
char *image_data_file;
24
void *opaque);
24
25
25
- CoQueue compress_wait_queue;
26
+/**
26
- int nb_compress_threads;
27
+ * qio_channel_writev_full_all:
27
+ CoQueue thread_task_queue;
28
+ * @ioc: the channel object
28
+ int nb_threads;
29
+ * @iov: the array of memory regions to write data from
29
30
+ * @niov: the length of the @iov array
30
BdrvChild *data_file;
31
+ * @fds: an array of file handles to send
31
} BDRVQcow2State;
32
+ * @nfds: number of file handles in @fds
32
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
33
+ * @errp: pointer to a NULL-initialized error object
34
+ *
35
+ *
36
+ * Behaves like qio_channel_writev_full but will attempt
37
+ * to send all data passed (file handles and memory regions).
38
+ * The function will wait for all requested data
39
+ * to be written, yielding from the current coroutine
40
+ * if required.
41
+ *
42
+ * Returns: 0 if all bytes were written, or -1 on error
43
+ */
44
+
45
+int qio_channel_writev_full_all(QIOChannel *ioc,
46
+ const struct iovec *iov,
47
+ size_t niov,
48
+ int *fds, size_t nfds,
49
+ Error **errp);
50
+
51
#endif /* QIO_CHANNEL_H */
52
diff --git a/io/channel.c b/io/channel.c
33
index XXXXXXX..XXXXXXX 100644
53
index XXXXXXX..XXXXXXX 100644
34
--- a/block/qcow2-threads.c
54
--- a/io/channel.c
35
+++ b/block/qcow2-threads.c
55
+++ b/io/channel.c
36
@@ -XXX,XX +XXX,XX @@
56
@@ -XXX,XX +XXX,XX @@ int qio_channel_writev_all(QIOChannel *ioc,
37
#include "qcow2.h"
57
const struct iovec *iov,
38
#include "block/thread-pool.h"
58
size_t niov,
39
59
Error **errp)
40
-#define MAX_COMPRESS_THREADS 4
41
+#define QCOW2_MAX_THREADS 4
42
+
43
+static int coroutine_fn
44
+qcow2_co_process(BlockDriverState *bs, ThreadPoolFunc *func, void *arg)
45
+{
60
+{
46
+ int ret;
61
+ return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
47
+ BDRVQcow2State *s = bs->opaque;
48
+ ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
49
+
50
+ qemu_co_mutex_lock(&s->lock);
51
+ while (s->nb_threads >= QCOW2_MAX_THREADS) {
52
+ qemu_co_queue_wait(&s->thread_task_queue, &s->lock);
53
+ }
54
+ s->nb_threads++;
55
+ qemu_co_mutex_unlock(&s->lock);
56
+
57
+ ret = thread_pool_submit_co(pool, func, arg);
58
+
59
+ qemu_co_mutex_lock(&s->lock);
60
+ s->nb_threads--;
61
+ qemu_co_queue_next(&s->thread_task_queue);
62
+ qemu_co_mutex_unlock(&s->lock);
63
+
64
+ return ret;
65
+}
62
+}
66
+
63
+
64
+int qio_channel_writev_full_all(QIOChannel *ioc,
65
+ const struct iovec *iov,
66
+ size_t niov,
67
+ int *fds, size_t nfds,
68
+ Error **errp)
69
{
70
int ret = -1;
71
struct iovec *local_iov = g_new(struct iovec, niov);
72
@@ -XXX,XX +XXX,XX @@ int qio_channel_writev_all(QIOChannel *ioc,
73
74
while (nlocal_iov > 0) {
75
ssize_t len;
76
- len = qio_channel_writev(ioc, local_iov, nlocal_iov, errp);
77
+ len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
78
+ errp);
79
if (len == QIO_CHANNEL_ERR_BLOCK) {
80
if (qemu_in_coroutine()) {
81
qio_channel_yield(ioc, G_IO_OUT);
82
@@ -XXX,XX +XXX,XX @@ int qio_channel_writev_all(QIOChannel *ioc,
83
}
84
85
iov_discard_front(&local_iov, &nlocal_iov, len);
67
+
86
+
68
+/*
87
+ fds = NULL;
69
+ * Compression
88
+ nfds = 0;
70
+ */
71
72
typedef ssize_t (*Qcow2CompressFunc)(void *dest, size_t dest_size,
73
const void *src, size_t src_size);
74
@@ -XXX,XX +XXX,XX @@ static ssize_t coroutine_fn
75
qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
76
const void *src, size_t src_size, Qcow2CompressFunc func)
77
{
78
- BDRVQcow2State *s = bs->opaque;
79
- ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
80
Qcow2CompressData arg = {
81
.dest = dest,
82
.dest_size = dest_size,
83
@@ -XXX,XX +XXX,XX @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
84
.func = func,
85
};
86
87
- qemu_co_mutex_lock(&s->lock);
88
- while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
89
- qemu_co_queue_wait(&s->compress_wait_queue, &s->lock);
90
- }
91
- s->nb_compress_threads++;
92
- qemu_co_mutex_unlock(&s->lock);
93
-
94
- thread_pool_submit_co(pool, qcow2_compress_pool_func, &arg);
95
-
96
- qemu_co_mutex_lock(&s->lock);
97
- s->nb_compress_threads--;
98
- qemu_co_queue_next(&s->compress_wait_queue);
99
- qemu_co_mutex_unlock(&s->lock);
100
+ qcow2_co_process(bs, qcow2_compress_pool_func, &arg);
101
102
return arg.ret;
103
}
104
diff --git a/block/qcow2.c b/block/qcow2.c
105
index XXXXXXX..XXXXXXX 100644
106
--- a/block/qcow2.c
107
+++ b/block/qcow2.c
108
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
109
}
89
}
110
#endif
90
111
91
ret = 0;
112
- qemu_co_queue_init(&s->compress_wait_queue);
113
+ qemu_co_queue_init(&s->thread_task_queue);
114
115
return ret;
116
117
--
92
--
118
2.21.0
93
2.29.2
119
94
120
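The write loop introduced for qio_channel_writev_full_all() in the io/channel.c patch above passes the file descriptors only with the first partial write and clears them afterwards. A minimal sketch of that pattern follows. ToyChannel, toy_write_full() and toy_write_full_all() are hypothetical, and a single flat buffer is used instead of an iovec array to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory channel that only accepts short writes. */
typedef struct ToyChannel {
    char data[64];
    size_t data_len;
    size_t max_per_call;   /* simulates partial writes */
    size_t fds_seen;       /* number of descriptors "sent" so far */
    size_t calls;
} ToyChannel;

/* Writes at most max_per_call bytes; returns bytes written or -1. */
static long toy_write_full(ToyChannel *ch, const char *buf, size_t len,
                           const int *fds, size_t nfds)
{
    size_t n = len < ch->max_per_call ? len : ch->max_per_call;

    if (n == 0 || ch->data_len + n > sizeof(ch->data)) {
        return -1;
    }
    memcpy(ch->data + ch->data_len, buf, n);
    ch->data_len += n;
    ch->fds_seen += fds ? nfds : 0;
    ch->calls++;
    return (long)n;
}

/* Keeps writing until everything is sent; fds go out exactly once. */
static int toy_write_full_all(ToyChannel *ch, const char *buf, size_t len,
                              const int *fds, size_t nfds)
{
    while (len > 0) {
        long n = toy_write_full(ch, buf, len, fds, nfds);

        if (n < 0) {
            return -1;
        }
        buf += n;
        len -= (size_t)n;

        /* Descriptors were handed over with the first chunk. */
        fds = NULL;
        nfds = 0;
    }
    return 0;
}
```

Resetting fds and nfds inside the loop is what keeps a retry after a short write from sending the same descriptors a second time.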
1
From: Alberto Garcia <berto@igalia.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
bdrv_unref_child() does the following things:
3
Adds qio_channel_readv_full_all_eof() and qio_channel_readv_full_all()
4
4
to read both data and FDs. Refactors existing code to use these helpers.
5
- Updates the child->bs->inherits_from pointer.
5
6
- Calls bdrv_detach_child() to remove the BdrvChild from bs->children.
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
- Calls bdrv_unref() to unref the child BlockDriverState.
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
When bdrv_unref_child() was introduced in commit 33a604075c it was not
9
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
10
used in bdrv_close() because the drivers that had additional children
10
Message-id: b059c4cc0fb741e794d644c144cc21372cad877d.1611938319.git.jag.raman@oracle.com
11
(like quorum or blkverify) had already called bdrv_unref() on their
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
children during their own close functions.
13
14
This was changed later (in 0bd6e91a7e for quorum, in 3e586be0b2 for
15
blkverify) so there's no reason not to use bdrv_unref_child() in
16
bdrv_close() anymore.
17
18
After this there's also no need to remove bs->backing and bs->file
19
separately from the rest of the children, so bdrv_close() can be
20
simplified.
21
22
Now bdrv_close() unrefs all children (before this patch it was only
23
bs->file and bs->backing). As a result, none of the callers of
24
bdrv_attach_child() should remove their reference to child_bs (because
25
this function effectively steals that reference). This patch updates a
26
couple of tests that were doing their own bdrv_unref().
27
28
Signed-off-by: Alberto Garcia <berto@igalia.com>
29
Message-id: 6d1d5feaa53aa1ab127adb73d605dc4503e3abd5.1557754872.git.berto@igalia.com
30
[mreitz: s/where/were/]
31
Signed-off-by: Max Reitz <mreitz@redhat.com>
32
---
12
---
33
block.c | 16 +++-------------
13
include/io/channel.h | 53 +++++++++++++++++++++++
34
tests/test-bdrv-drain.c | 6 ------
14
io/channel.c | 101 ++++++++++++++++++++++++++++++++++---------
35
tests/test-bdrv-graph-mod.c | 1 -
15
2 files changed, 134 insertions(+), 20 deletions(-)
36
3 files changed, 3 insertions(+), 20 deletions(-)
16
37
17
diff --git a/include/io/channel.h b/include/io/channel.h
38
diff --git a/block.c b/block.c
39
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
40
--- a/block.c
19
--- a/include/io/channel.h
41
+++ b/block.c
20
+++ b/include/io/channel.h
42
@@ -XXX,XX +XXX,XX @@ static void bdrv_close(BlockDriverState *bs)
21
@@ -XXX,XX +XXX,XX @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
43
bs->drv = NULL;
22
IOHandler *io_write,
23
void *opaque);
24
25
+/**
26
+ * qio_channel_readv_full_all_eof:
27
+ * @ioc: the channel object
28
+ * @iov: the array of memory regions to read data to
29
+ * @niov: the length of the @iov array
30
+ * @fds: an array of file handles to read
31
+ * @nfds: number of file handles in @fds
32
+ * @errp: pointer to a NULL-initialized error object
33
+ *
34
+ *
35
+ * Performs same function as qio_channel_readv_all_eof.
36
+ * Additionally, attempts to read file descriptors shared
37
+ * over the channel. The function will wait for all
38
+ * requested data to be read, yielding from the current
39
+ * coroutine if required. data refers to both file
40
+ * descriptors and the iovs.
41
+ *
42
+ * Returns: 1 if all bytes were read, 0 if end-of-file
43
+ * occurs without data, or -1 on error
44
+ */
45
+
46
+int qio_channel_readv_full_all_eof(QIOChannel *ioc,
47
+ const struct iovec *iov,
48
+ size_t niov,
49
+ int **fds, size_t *nfds,
50
+ Error **errp);
51
+
52
+/**
53
+ * qio_channel_readv_full_all:
54
+ * @ioc: the channel object
55
+ * @iov: the array of memory regions to read data to
56
+ * @niov: the length of the @iov array
57
+ * @fds: an array of file handles to read
58
+ * @nfds: number of file handles in @fds
59
+ * @errp: pointer to a NULL-initialized error object
60
+ *
61
+ *
62
+ * Performs same function as qio_channel_readv_all.
63
+ * Additionally, attempts to read file descriptors shared
64
+ * over the channel. The function will wait for all
65
+ * requested data to be read, yielding from the current
66
+ * coroutine if required. data refers to both file
67
+ * descriptors and the iovs.
68
+ *
69
+ * Returns: 0 if all bytes were read, or -1 on error
70
+ */
71
+
72
+int qio_channel_readv_full_all(QIOChannel *ioc,
73
+ const struct iovec *iov,
74
+ size_t niov,
75
+ int **fds, size_t *nfds,
76
+ Error **errp);
77
+
78
/**
79
* qio_channel_writev_full_all:
80
* @ioc: the channel object
81
diff --git a/io/channel.c b/io/channel.c
82
index XXXXXXX..XXXXXXX 100644
83
--- a/io/channel.c
84
+++ b/io/channel.c
85
@@ -XXX,XX +XXX,XX @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
86
const struct iovec *iov,
87
size_t niov,
88
Error **errp)
89
+{
90
+ return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
91
+}
92
+
93
+int qio_channel_readv_all(QIOChannel *ioc,
94
+ const struct iovec *iov,
95
+ size_t niov,
96
+ Error **errp)
97
+{
98
+ return qio_channel_readv_full_all(ioc, iov, niov, NULL, NULL, errp);
99
+}
100
+
101
+int qio_channel_readv_full_all_eof(QIOChannel *ioc,
102
+ const struct iovec *iov,
103
+ size_t niov,
104
+ int **fds, size_t *nfds,
105
+ Error **errp)
106
{
107
int ret = -1;
108
struct iovec *local_iov = g_new(struct iovec, niov);
109
struct iovec *local_iov_head = local_iov;
110
unsigned int nlocal_iov = niov;
111
+ int **local_fds = fds;
112
+ size_t *local_nfds = nfds;
113
bool partial = false;
114
115
+ if (nfds) {
116
+ *nfds = 0;
117
+ }
118
+
119
+ if (fds) {
120
+ *fds = NULL;
121
+ }
122
+
123
nlocal_iov = iov_copy(local_iov, nlocal_iov,
124
iov, niov,
125
0, iov_size(iov, niov));
126
127
- while (nlocal_iov > 0) {
128
+ while ((nlocal_iov > 0) || local_fds) {
129
ssize_t len;
130
- len = qio_channel_readv(ioc, local_iov, nlocal_iov, errp);
131
+ len = qio_channel_readv_full(ioc, local_iov, nlocal_iov, local_fds,
132
+ local_nfds, errp);
133
if (len == QIO_CHANNEL_ERR_BLOCK) {
134
if (qemu_in_coroutine()) {
135
qio_channel_yield(ioc, G_IO_IN);
136
@@ -XXX,XX +XXX,XX @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
137
qio_channel_wait(ioc, G_IO_IN);
138
}
139
continue;
140
- } else if (len < 0) {
141
- goto cleanup;
142
- } else if (len == 0) {
143
- if (partial) {
144
- error_setg(errp,
145
- "Unexpected end-of-file before all bytes were read");
146
- } else {
147
+ }
148
+
149
+ if (len == 0) {
150
+ if (local_nfds && *local_nfds) {
151
+ /*
152
+ * Got some FDs, but no data yet. This isn't an EOF
153
+ * scenario (yet), so carry on to try to read data
154
+ * on next loop iteration
155
+ */
156
+ goto next_iter;
157
+ } else if (!partial) {
158
+ /* No fds and no data - EOF before any data read */
159
ret = 0;
160
+ goto cleanup;
161
+ } else {
162
+ len = -1;
163
+ error_setg(errp,
164
+ "Unexpected end-of-file before all data were read");
165
+ /* Fallthrough into len < 0 handling */
166
+ }
167
+ }
168
+
169
+ if (len < 0) {
170
+ /* Close any FDs we previously received */
171
+ if (nfds && fds) {
172
+ size_t i;
173
+ for (i = 0; i < (*nfds); i++) {
174
+ close((*fds)[i]);
175
+ }
176
+ g_free(*fds);
177
+ *fds = NULL;
178
+ *nfds = 0;
179
}
180
goto cleanup;
181
}
182
183
+ if (nlocal_iov) {
184
+ iov_discard_front(&local_iov, &nlocal_iov, len);
185
+ }
186
+
187
+next_iter:
188
partial = true;
189
- iov_discard_front(&local_iov, &nlocal_iov, len);
190
+ local_fds = NULL;
191
+ local_nfds = NULL;
44
}
192
}
45
193
46
- bdrv_set_backing_hd(bs, NULL, &error_abort);
194
ret = 1;
47
-
195
@@ -XXX,XX +XXX,XX @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
48
- if (bs->file != NULL) {
196
return ret;
49
- bdrv_unref_child(bs, bs->file);
197
}
50
- bs->file = NULL;
198
51
- }
199
-int qio_channel_readv_all(QIOChannel *ioc,
52
-
200
- const struct iovec *iov,
53
QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
201
- size_t niov,
54
- /* TODO Remove bdrv_unref() from drivers' close function and use
202
- Error **errp)
55
- * bdrv_unref_child() here */
203
+int qio_channel_readv_full_all(QIOChannel *ioc,
56
- if (child->bs->inherits_from == bs) {
204
+ const struct iovec *iov,
57
- child->bs->inherits_from = NULL;
205
+ size_t niov,
58
- }
206
+ int **fds, size_t *nfds,
59
- bdrv_detach_child(child);
207
+ Error **errp)
60
+ bdrv_unref_child(bs, child);
208
{
209
- int ret = qio_channel_readv_all_eof(ioc, iov, niov, errp);
210
+ int ret = qio_channel_readv_full_all_eof(ioc, iov, niov, fds, nfds, errp);
211
212
if (ret == 0) {
213
- ret = -1;
214
- error_setg(errp,
215
- "Unexpected end-of-file before all bytes were read");
216
- } else if (ret == 1) {
217
- ret = 0;
218
+ error_setg(errp,
219
+ "Unexpected end-of-file before all data were read.");
220
+ return -1;
61
}
221
}
62
222
+ if (ret == 1) {
63
+ bs->backing = NULL;
223
+ return 0;
64
+ bs->file = NULL;
224
+ }
65
g_free(bs->opaque);
225
+
66
bs->opaque = NULL;
226
return ret;
67
atomic_set(&bs->copy_on_read, 0);
68
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
69
index XXXXXXX..XXXXXXX 100644
70
--- a/tests/test-bdrv-drain.c
71
+++ b/tests/test-bdrv-drain.c
72
@@ -XXX,XX +XXX,XX @@ static void test_detach_indirect(bool by_parent_cb)
73
bdrv_unref(parent_b);
74
blk_unref(blk);
75
76
- /* XXX Once bdrv_close() unref's children instead of just detaching them,
77
- * this won't be necessary any more. */
78
- bdrv_unref(a);
79
- bdrv_unref(a);
80
- bdrv_unref(c);
81
-
82
g_assert_cmpint(a->refcnt, ==, 1);
83
g_assert_cmpint(b->refcnt, ==, 1);
84
g_assert_cmpint(c->refcnt, ==, 1);
85
diff --git a/tests/test-bdrv-graph-mod.c b/tests/test-bdrv-graph-mod.c
86
index XXXXXXX..XXXXXXX 100644
87
--- a/tests/test-bdrv-graph-mod.c
88
+++ b/tests/test-bdrv-graph-mod.c
89
@@ -XXX,XX +XXX,XX @@ static void test_update_perm_tree(void)
90
g_assert_nonnull(local_err);
91
error_free(local_err);
92
93
- bdrv_unref(bs);
94
blk_unref(root);
95
}
227
}
96
228
97
--
229
--
98
2.21.0
230
2.29.2
99
231
100
diff view generated by jsdifflib
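The return-value convention used by the two helpers above (1/0/-1 for the
_eof variant, 0/-1 for the plain variant) can be illustrated with a minimal
sketch over a plain POSIX file descriptor. This is not QEMU code:
read_all_eof() and read_all() are hypothetical names, FD passing and
coroutine yielding are left out, and only the EOF classification is modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU API). Reads exactly len bytes.
 * Returns 1 if all bytes were read, 0 if end-of-file occurs before
 * any data was read, or -1 on error or on EOF after a partial read.
 */
static int read_all_eof(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, retry */
            }
            return -1;              /* real I/O error */
        }
        if (n == 0) {
            /* EOF: clean only if nothing was consumed yet */
            return done == 0 ? 0 : -1;
        }
        done += (size_t)n;
    }
    return 1;
}

/*
 * Hypothetical wrapper (not a QEMU API). Treats a clean EOF as an
 * error as well. Returns 0 if all bytes were read, or -1 otherwise.
 */
static int read_all(int fd, void *buf, size_t len)
{
    return read_all_eof(fd, buf, len) == 1 ? 0 : -1;
}
```

A caller that can tolerate the peer closing the connection between messages
would use the _eof form and treat 0 as a normal shutdown. A caller that
always expects a complete message would use the plain form.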
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
Move compression-on-threads to separate file. Encryption will be in it
3
Defines MPQemuMsg, which is the message that is sent to the remote
4
too.
4
process. This message is sent over QIOChannel and is used to
5
command the remote process to perform various tasks.
6
Define transmission functions used by proxy and by remote.
5
7
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
9
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Reviewed-by: Max Reitz <mreitz@redhat.com>
10
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
Message-id: 20190506142741.41731-3-vsementsov@virtuozzo.com
11
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
12
Message-id: 56ca8bcf95195b2b195b08f6b9565b6d7410bce5.1611938319.git.jag.raman@oracle.com
13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
14
---
12
block/Makefile.objs | 2 +-
15
MAINTAINERS | 2 +
13
block/qcow2.h | 7 ++
16
meson.build | 1 +
14
block/qcow2-threads.c | 201 ++++++++++++++++++++++++++++++++++++++++++
17
hw/remote/trace.h | 1 +
15
block/qcow2.c | 169 -----------------------------------
18
include/hw/remote/mpqemu-link.h | 63 ++++++++++
16
4 files changed, 209 insertions(+), 170 deletions(-)
19
include/sysemu/iothread.h | 6 +
17
create mode 100644 block/qcow2-threads.c
20
hw/remote/mpqemu-link.c | 205 ++++++++++++++++++++++++++++++++
21
iothread.c | 6 +
22
hw/remote/meson.build | 1 +
23
hw/remote/trace-events | 4 +
24
9 files changed, 289 insertions(+)
25
create mode 100644 hw/remote/trace.h
26
create mode 100644 include/hw/remote/mpqemu-link.h
27
create mode 100644 hw/remote/mpqemu-link.c
28
create mode 100644 hw/remote/trace-events
18
29
19
diff --git a/block/Makefile.objs b/block/Makefile.objs
30
diff --git a/MAINTAINERS b/MAINTAINERS
20
index XXXXXXX..XXXXXXX 100644
31
index XXXXXXX..XXXXXXX 100644
21
--- a/block/Makefile.objs
32
--- a/MAINTAINERS
22
+++ b/block/Makefile.objs
33
+++ b/MAINTAINERS
23
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_BOCHS) += bochs.o
34
@@ -XXX,XX +XXX,XX @@ F: hw/pci-host/remote.c
24
block-obj-$(CONFIG_VVFAT) += vvfat.o
35
F: include/hw/pci-host/remote.h
25
block-obj-$(CONFIG_DMG) += dmg.o
36
F: hw/remote/machine.c
26
37
F: include/hw/remote/machine.h
27
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-bitmap.o
38
+F: hw/remote/mpqemu-link.c
28
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-bitmap.o qcow2-threads.o
39
+F: include/hw/remote/mpqemu-link.h
29
block-obj-$(CONFIG_QED) += qed.o qed-l2-cache.o qed-table.o qed-cluster.o
40
30
block-obj-$(CONFIG_QED) += qed-check.o
41
Build and test automation
31
block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
42
-------------------------
32
diff --git a/block/qcow2.h b/block/qcow2.h
43
diff --git a/meson.build b/meson.build
33
index XXXXXXX..XXXXXXX 100644
44
index XXXXXXX..XXXXXXX 100644
34
--- a/block/qcow2.h
45
--- a/meson.build
35
+++ b/block/qcow2.h
46
+++ b/meson.build
36
@@ -XXX,XX +XXX,XX @@ void qcow2_remove_persistent_dirty_bitmap(BlockDriverState *bs,
47
@@ -XXX,XX +XXX,XX @@ if have_system
37
const char *name,
48
'net',
38
Error **errp);
49
'softmmu',
39
50
'ui',
40
+ssize_t coroutine_fn
51
+ 'hw/remote',
41
+qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
52
]
42
+ const void *src, size_t src_size);
53
endif
43
+ssize_t coroutine_fn
54
trace_events_subdirs += [
44
+qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
55
diff --git a/hw/remote/trace.h b/hw/remote/trace.h
45
+ const void *src, size_t src_size);
46
+
47
#endif
48
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
49
new file mode 100644
56
new file mode 100644
50
index XXXXXXX..XXXXXXX
57
index XXXXXXX..XXXXXXX
51
--- /dev/null
58
--- /dev/null
52
+++ b/block/qcow2-threads.c
59
+++ b/hw/remote/trace.h
60
@@ -0,0 +1 @@
61
+#include "trace/trace-hw_remote.h"
62
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
63
new file mode 100644
64
index XXXXXXX..XXXXXXX
65
--- /dev/null
66
+++ b/include/hw/remote/mpqemu-link.h
53
@@ -XXX,XX +XXX,XX @@
67
@@ -XXX,XX +XXX,XX @@
54
+/*
68
+/*
55
+ * Threaded data processing for Qcow2: compression, encryption
69
+ * Communication channel between QEMU and remote device process
56
+ *
70
+ *
57
+ * Copyright (c) 2004-2006 Fabrice Bellard
71
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
58
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
72
+ *
59
+ *
73
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
60
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
74
+ * See the COPYING file in the top-level directory.
61
+ * of this software and associated documentation files (the "Software"), to deal
75
+ *
62
+ * in the Software without restriction, including without limitation the rights
76
+ */
63
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
77
+
64
+ * copies of the Software, and to permit persons to whom the Software is
78
+#ifndef MPQEMU_LINK_H
65
+ * furnished to do so, subject to the following conditions:
79
+#define MPQEMU_LINK_H
66
+ *
80
+
67
+ * The above copyright notice and this permission notice shall be included in
81
+#include "qom/object.h"
68
+ * all copies or substantial portions of the Software.
82
+#include "qemu/thread.h"
69
+ *
83
+#include "io/channel.h"
70
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
84
+
71
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
85
+#define REMOTE_MAX_FDS 8
72
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
86
+
73
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
87
+#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data.u64)
74
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
88
+
75
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
89
+/**
76
+ * THE SOFTWARE.
90
+ * MPQemuCmd:
91
+ *
92
+ * MPQemuCmd enum type to specify the command to be executed on the remote
93
+ * device.
94
+ *
95
+ * This uses a private protocol between QEMU and the remote process. vfio-user
96
+ * protocol would supersede this in the future.
97
+ *
98
+ */
99
+typedef enum {
100
+ MPQEMU_CMD_MAX,
101
+} MPQemuCmd;
102
+
103
+/**
104
+ * MPQemuMsg:
105
+ * @cmd: The remote command
106
+ * @size: Size of the data to be shared
107
+ * @data: Structured data
108
+ * @fds: File descriptors to be shared with remote device
109
+ *
110
+ * MPQemuMsg Format of the message sent to the remote device from QEMU.
111
+ *
112
+ */
113
+typedef struct {
114
+ int cmd;
115
+ size_t size;
116
+
117
+ union {
118
+ uint64_t u64;
119
+ } data;
120
+
121
+ int fds[REMOTE_MAX_FDS];
122
+ int num_fds;
123
+} MPQemuMsg;
124
+
125
+bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
126
+bool mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
127
+
128
+bool mpqemu_msg_valid(MPQemuMsg *msg);
129
+
130
+#endif
131
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
132
index XXXXXXX..XXXXXXX 100644
133
--- a/include/sysemu/iothread.h
134
+++ b/include/sysemu/iothread.h
135
@@ -XXX,XX +XXX,XX @@ IOThread *iothread_create(const char *id, Error **errp);
136
void iothread_stop(IOThread *iothread);
137
void iothread_destroy(IOThread *iothread);
138
139
+/*
140
+ * Returns true if executing within IOThread context,
141
+ * false otherwise.
142
+ */
143
+bool qemu_in_iothread(void);
144
+
145
#endif /* IOTHREAD_H */
146
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
147
new file mode 100644
148
index XXXXXXX..XXXXXXX
149
--- /dev/null
150
+++ b/hw/remote/mpqemu-link.c
151
@@ -XXX,XX +XXX,XX @@
152
+/*
153
+ * Communication channel between QEMU and remote device process
154
+ *
155
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
156
+ *
157
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
158
+ * See the COPYING file in the top-level directory.
159
+ *
77
+ */
160
+ */
78
+
161
+
79
+#include "qemu/osdep.h"
162
+#include "qemu/osdep.h"
80
+
163
+#include "qemu-common.h"
81
+#define ZLIB_CONST
164
+
82
+#include <zlib.h>
165
+#include "qemu/module.h"
83
+
166
+#include "hw/remote/mpqemu-link.h"
84
+#include "qcow2.h"
167
+#include "qapi/error.h"
85
+#include "block/thread-pool.h"
168
+#include "qemu/iov.h"
86
+
169
+#include "qemu/error-report.h"
87
+#define MAX_COMPRESS_THREADS 4
170
+#include "qemu/main-loop.h"
88
+
171
+#include "io/channel.h"
89
+typedef ssize_t (*Qcow2CompressFunc)(void *dest, size_t dest_size,
172
+#include "sysemu/iothread.h"
90
+ const void *src, size_t src_size);
173
+#include "trace.h"
91
+typedef struct Qcow2CompressData {
174
+
92
+ void *dest;
175
+/*
93
+ size_t dest_size;
176
+ * Send message over the ioc QIOChannel.
94
+ const void *src;
177
+ * This function is safe to call from:
95
+ size_t src_size;
178
+ * - main loop in co-routine context. Will block the main loop if not in
96
+ ssize_t ret;
179
+ * co-routine context;
97
+
180
+ * - vCPU thread with no co-routine context and if the channel is not part
98
+ Qcow2CompressFunc func;
181
+ * of the main loop handling;
99
+} Qcow2CompressData;
182
+ * - IOThread within co-routine context, outside of co-routine context
100
+
183
+ * will block IOThread;
101
+/*
184
+ * Returns true if no errors were encountered, false otherwise.
102
+ * qcow2_compress()
185
+ */
103
+ *
186
+bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
104
+ * @dest - destination buffer, @dest_size bytes
187
+{
105
+ * @src - source buffer, @src_size bytes
188
+ ERRP_GUARD();
106
+ *
189
+ bool iolock = qemu_mutex_iothread_locked();
107
+ * Returns: compressed size on success
190
+ bool iothread = qemu_in_iothread();
108
+ * -ENOMEM destination buffer is not enough to store compressed data
191
+ struct iovec send[2] = {0};
109
+ * -EIO on any other error
192
+ int *fds = NULL;
110
+ */
193
+ size_t nfds = 0;
111
+static ssize_t qcow2_compress(void *dest, size_t dest_size,
194
+ bool ret = false;
112
+ const void *src, size_t src_size)
195
+
113
+{
196
+ send[0].iov_base = msg;
114
+ ssize_t ret;
197
+ send[0].iov_len = MPQEMU_MSG_HDR_SIZE;
115
+ z_stream strm;
198
+
116
+
199
+ send[1].iov_base = (void *)&msg->data;
117
+ /* best compression, small window, no zlib header */
200
+ send[1].iov_len = msg->size;
118
+ memset(&strm, 0, sizeof(strm));
201
+
119
+ ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
202
+ if (msg->num_fds) {
120
+ -12, 9, Z_DEFAULT_STRATEGY);
203
+ nfds = msg->num_fds;
121
+ if (ret != Z_OK) {
204
+ fds = msg->fds;
122
+ return -EIO;
123
+ }
205
+ }
124
+
206
+
125
+ /*
207
+ /*
126
+ * strm.next_in is not const in old zlib versions, such as those used on
208
+ * Don't use in IOThread out of co-routine context as
127
+ * OpenBSD/NetBSD, so cast the const away
209
+ * it will block IOThread.
128
+ */
210
+ */
129
+ strm.avail_in = src_size;
211
+ assert(qemu_in_coroutine() || !iothread);
130
+ strm.next_in = (void *) src;
212
+
131
+ strm.avail_out = dest_size;
213
+ /*
132
+ strm.next_out = dest;
214
+ * Skip unlocking/locking iothread lock when the IOThread is running
133
+
215
+ * in co-routine context. Co-routine context is asserted above
134
+ ret = deflate(&strm, Z_FINISH);
216
+ * for IOThread case.
135
+ if (ret == Z_STREAM_END) {
217
+ * Also skip lock handling while in a co-routine in the main context.
136
+ ret = dest_size - strm.avail_out;
218
+ */
219
+ if (iolock && !iothread && !qemu_in_coroutine()) {
220
+ qemu_mutex_unlock_iothread();
221
+ }
222
+
223
+ if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
224
+ fds, nfds, errp)) {
225
+ ret = true;
137
+ } else {
226
+ } else {
138
+ ret = (ret == Z_OK ? -ENOMEM : -EIO);
227
+ trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
139
+ }
228
+ }
140
+
229
+
141
+ deflateEnd(&strm);
230
+ if (iolock && !iothread && !qemu_in_coroutine()) {
231
+ /* See above comment why skip locking here. */
232
+ qemu_mutex_lock_iothread();
233
+ }
142
+
234
+
143
+ return ret;
235
+ return ret;
144
+}
236
+}
145
+
237
+
146
+/*
238
+/*
147
+ * qcow2_decompress()
239
+ * Read message from the ioc QIOChannel.
148
+ *
240
+ * This function is safe to call from:
149
+ * Decompress some data (not more than @src_size bytes) to produce exactly
241
+ * - From main loop in co-routine context. Will block the main loop if not in
150
+ * @dest_size bytes.
242
+ * co-routine context;
151
+ *
243
+ * - From vCPU thread with no co-routine context and if the channel is not part
152
+ * @dest - destination buffer, @dest_size bytes
244
+ * of the main loop handling;
153
+ * @src - source buffer, @src_size bytes
245
+ * - From IOThread within co-routine context, outside of co-routine context
154
+ *
246
+ * will block IOThread;
155
+ * Returns: 0 on success
247
+ */
156
+ * -1 on fail
248
+static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
157
+ */
249
+ size_t *nfds, Error **errp)
158
+static ssize_t qcow2_decompress(void *dest, size_t dest_size,
250
+{
159
+ const void *src, size_t src_size)
251
+ ERRP_GUARD();
160
+{
252
+ struct iovec iov = { .iov_base = buf, .iov_len = len };
161
+ int ret = 0;
253
+ bool iolock = qemu_mutex_iothread_locked();
162
+ z_stream strm;
254
+ bool iothread = qemu_in_iothread();
163
+
255
+ int ret = -1;
164
+ memset(&strm, 0, sizeof(strm));
256
+
165
+ strm.avail_in = src_size;
257
+ /*
166
+ strm.next_in = (void *) src;
258
+ * Don't use in IOThread out of co-routine context as
167
+ strm.avail_out = dest_size;
259
+ * it will block IOThread.
168
+ strm.next_out = dest;
260
+ */
169
+
261
+ assert(qemu_in_coroutine() || !iothread);
170
+ ret = inflateInit2(&strm, -12);
262
+
171
+ if (ret != Z_OK) {
263
+ if (iolock && !iothread && !qemu_in_coroutine()) {
172
+ return -1;
264
+ qemu_mutex_unlock_iothread();
173
+ }
265
+ }
174
+
266
+
175
+ ret = inflate(&strm, Z_FINISH);
267
+ ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, errp);
176
+ if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR) || strm.avail_out != 0) {
268
+
177
+ /*
269
+ if (iolock && !iothread && !qemu_in_coroutine()) {
178
+ * We approve Z_BUF_ERROR because we need @dest buffer to be filled, but
270
+ qemu_mutex_lock_iothread();
179
+ * @src buffer may be processed partly (because in qcow2 we know size of
271
+ }
180
+ * compressed data with precision of one sector)
272
+
181
+ */
273
+ return (ret <= 0) ? ret : iov.iov_len;
182
+ ret = -1;
274
+}
183
+ }
275
+
184
+
276
+bool mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
185
+ inflateEnd(&strm);
277
+{
278
+ ERRP_GUARD();
279
+ g_autofree int *fds = NULL;
280
+ size_t nfds = 0;
281
+ ssize_t len;
282
+ bool ret = false;
283
+
284
+ len = mpqemu_read(ioc, msg, MPQEMU_MSG_HDR_SIZE, &fds, &nfds, errp);
285
+ if (len <= 0) {
286
+ goto fail;
287
+ } else if (len != MPQEMU_MSG_HDR_SIZE) {
288
+ error_setg(errp, "Message header corrupted");
289
+ goto fail;
290
+ }
291
+
292
+ if (msg->size > sizeof(msg->data)) {
293
+ error_setg(errp, "Invalid size for message");
294
+ goto fail;
295
+ }
296
+
297
+ if (!msg->size) {
298
+ goto copy_fds;
299
+ }
300
+
301
+ len = mpqemu_read(ioc, &msg->data, msg->size, NULL, NULL, errp);
302
+ if (len <= 0) {
303
+ goto fail;
304
+ }
305
+ if (len != msg->size) {
306
+ error_setg(errp, "Unable to read full message");
307
+ goto fail;
308
+ }
309
+
310
+copy_fds:
311
+ msg->num_fds = nfds;
312
+ if (nfds > G_N_ELEMENTS(msg->fds)) {
313
+ error_setg(errp,
314
+ "Overflow error: received %zu fds, more than max of %d fds",
315
+ nfds, REMOTE_MAX_FDS);
316
+ goto fail;
317
+ }
318
+ if (nfds) {
319
+ memcpy(msg->fds, fds, nfds * sizeof(int));
320
+ }
321
+
322
+ ret = true;
323
+
324
+fail:
325
+ if (*errp) {
326
+ trace_mpqemu_recv_io_error(msg->cmd, msg->size, nfds);
327
+ }
328
+ while (*errp && nfds) {
329
+ close(fds[nfds - 1]);
330
+ nfds--;
331
+ }
186
+
332
+
187
+ return ret;
333
+ return ret;
188
+}
334
+}
189
+
335
+
190
+static int qcow2_compress_pool_func(void *opaque)
336
+bool mpqemu_msg_valid(MPQemuMsg *msg)
191
+{
337
+{
192
+ Qcow2CompressData *data = opaque;
338
+ if (msg->cmd >= MPQEMU_CMD_MAX || msg->cmd < 0) {
193
+
339
+ return false;
194
+ data->ret = data->func(data->dest, data->dest_size,
340
+ }
195
+ data->src, data->src_size);
341
+
196
+
342
+ /* Verify FDs. */
197
+ return 0;
343
+ if (msg->num_fds >= REMOTE_MAX_FDS) {
198
+}
344
+ return false;
199
+
345
+ }
200
+static void qcow2_compress_complete(void *opaque, int ret)
346
+
201
+{
347
+ if (msg->num_fds > 0) {
202
+ qemu_coroutine_enter(opaque);
348
+ for (int i = 0; i < msg->num_fds; i++) {
203
+}
349
+ if (fcntl(msg->fds[i], F_GETFL) == -1) {
204
+
350
+ return false;
205
+static ssize_t coroutine_fn
351
+ }
206
+qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
352
+ }
207
+ const void *src, size_t src_size, Qcow2CompressFunc func)
353
+ }
208
+{
354
+
209
+ BDRVQcow2State *s = bs->opaque;
355
+ return true;
210
+ BlockAIOCB *acb;
356
+}
211
+ ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
357
diff --git a/iothread.c b/iothread.c
212
+ Qcow2CompressData arg = {
358
index XXXXXXX..XXXXXXX 100644
213
+ .dest = dest,
359
--- a/iothread.c
214
+ .dest_size = dest_size,
360
+++ b/iothread.c
215
+ .src = src,
361
@@ -XXX,XX +XXX,XX @@ IOThread *iothread_by_id(const char *id)
216
+ .src_size = src_size,
362
{
217
+ .func = func,
363
return IOTHREAD(object_resolve_path_type(id, TYPE_IOTHREAD, NULL));
218
+ };
364
}
219
+
365
+
220
+ while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
366
+bool qemu_in_iothread(void)
221
+ qemu_co_queue_wait(&s->compress_wait_queue, NULL);
367
+{
222
+ }
368
+ return qemu_get_current_aio_context() == qemu_get_aio_context() ?
223
+
369
+ false : true;
224
+ s->nb_compress_threads++;
370
+}
225
+ acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, &arg,
371
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
226
+ qcow2_compress_complete,
372
index XXXXXXX..XXXXXXX 100644
227
+ qemu_coroutine_self());
373
--- a/hw/remote/meson.build
228
+
374
+++ b/hw/remote/meson.build
229
+ if (!acb) {
230
+ s->nb_compress_threads--;
231
+ return -EINVAL;
232
+ }
233
+ qemu_coroutine_yield();
234
+ s->nb_compress_threads--;
235
+ qemu_co_queue_next(&s->compress_wait_queue);
236
+
237
+ return arg.ret;
238
+}
239
+
240
+ssize_t coroutine_fn
241
+qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
242
+ const void *src, size_t src_size)
243
+{
244
+ return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
245
+ qcow2_compress);
246
+}
247
+
248
+ssize_t coroutine_fn
249
+qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
250
+ const void *src, size_t src_size)
251
+{
252
+ return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
253
+ qcow2_decompress);
254
+}
255
diff --git a/block/qcow2.c b/block/qcow2.c
256
index XXXXXXX..XXXXXXX 100644
257
--- a/block/qcow2.c
258
+++ b/block/qcow2.c
259
@@ -XXX,XX +XXX,XX @@
375
@@ -XXX,XX +XXX,XX @@
260
376
remote_ss = ss.source_set()
261
#include "qemu/osdep.h"
377
262
378
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
263
-#define ZLIB_CONST
379
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
264
-#include <zlib.h>
380
265
-
381
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
266
#include "block/qdict.h"
382
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
267
#include "sysemu/block-backend.h"
383
new file mode 100644
268
#include "qemu/module.h"
384
index XXXXXXX..XXXXXXX
385
--- /dev/null
386
+++ b/hw/remote/trace-events
269
@@ -XXX,XX +XXX,XX @@
387
@@ -XXX,XX +XXX,XX @@
270
#include "qapi/qobject-input-visitor.h"
388
+# multi-process trace events
271
#include "qapi/qapi-visit-block-core.h"
389
+
272
#include "crypto.h"
390
+mpqemu_send_io_error(int cmd, int size, int nfds) "failed to send command %d size %d, %d file descriptors to remote process"
273
-#include "block/thread-pool.h"
391
+mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive command %d size %d, %d file descriptors from remote process"
274
275
/*
276
Differences with QCOW:
277
@@ -XXX,XX +XXX,XX @@ fail:
278
return ret;
279
}
280
281
-/*
282
- * qcow2_compress()
283
- *
284
- * @dest - destination buffer, @dest_size bytes
285
- * @src - source buffer, @src_size bytes
286
- *
287
- * Returns: compressed size on success
288
- * -ENOMEM destination buffer is not enough to store compressed data
289
- * -EIO on any other error
290
- */
291
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
292
- const void *src, size_t src_size)
293
-{
294
- ssize_t ret;
295
- z_stream strm;
296
-
297
- /* best compression, small window, no zlib header */
298
- memset(&strm, 0, sizeof(strm));
299
- ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
300
- -12, 9, Z_DEFAULT_STRATEGY);
301
- if (ret != Z_OK) {
302
- return -EIO;
303
- }
304
-
305
- /* strm.next_in is not const in old zlib versions, such as those used on
306
- * OpenBSD/NetBSD, so cast the const away */
307
- strm.avail_in = src_size;
308
- strm.next_in = (void *) src;
309
- strm.avail_out = dest_size;
310
- strm.next_out = dest;
311
-
312
- ret = deflate(&strm, Z_FINISH);
313
- if (ret == Z_STREAM_END) {
314
- ret = dest_size - strm.avail_out;
315
- } else {
316
- ret = (ret == Z_OK ? -ENOMEM : -EIO);
317
- }
318
-
319
- deflateEnd(&strm);
320
-
321
- return ret;
322
-}
323
-
324
-/*
325
- * qcow2_decompress()
326
- *
327
- * Decompress some data (not more than @src_size bytes) to produce exactly
328
- * @dest_size bytes.
329
- *
330
- * @dest - destination buffer, @dest_size bytes
331
- * @src - source buffer, @src_size bytes
332
- *
333
- * Returns: 0 on success
334
- * -1 on fail
335
- */
336
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
337
- const void *src, size_t src_size)
338
-{
339
- int ret = 0;
340
- z_stream strm;
341
-
342
- memset(&strm, 0, sizeof(strm));
343
- strm.avail_in = src_size;
344
- strm.next_in = (void *) src;
345
- strm.avail_out = dest_size;
346
- strm.next_out = dest;
347
-
348
- ret = inflateInit2(&strm, -12);
349
- if (ret != Z_OK) {
350
- return -1;
351
- }
352
-
353
- ret = inflate(&strm, Z_FINISH);
354
- if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR) || strm.avail_out != 0) {
355
- /* We approve Z_BUF_ERROR because we need @dest buffer to be filled, but
356
- * @src buffer may be processed partly (because in qcow2 we know size of
357
- * compressed data with precision of one sector) */
358
- ret = -1;
359
- }
360
-
361
- inflateEnd(&strm);
362
-
363
- return ret;
364
-}
365
-
366
-#define MAX_COMPRESS_THREADS 4
367
-
368
-typedef ssize_t (*Qcow2CompressFunc)(void *dest, size_t dest_size,
369
- const void *src, size_t src_size);
370
-typedef struct Qcow2CompressData {
371
- void *dest;
372
- size_t dest_size;
373
- const void *src;
374
- size_t src_size;
375
- ssize_t ret;
376
-
377
- Qcow2CompressFunc func;
378
-} Qcow2CompressData;
379
-
380
-static int qcow2_compress_pool_func(void *opaque)
381
-{
382
- Qcow2CompressData *data = opaque;
383
-
384
- data->ret = data->func(data->dest, data->dest_size,
385
- data->src, data->src_size);
386
-
387
- return 0;
388
-}
389
-
390
-static void qcow2_compress_complete(void *opaque, int ret)
391
-{
392
- qemu_coroutine_enter(opaque);
393
-}
394
-
395
-static ssize_t coroutine_fn
396
-qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
397
- const void *src, size_t src_size, Qcow2CompressFunc func)
398
-{
399
- BDRVQcow2State *s = bs->opaque;
400
- BlockAIOCB *acb;
401
- ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
402
- Qcow2CompressData arg = {
403
- .dest = dest,
404
- .dest_size = dest_size,
405
- .src = src,
406
- .src_size = src_size,
407
- .func = func,
408
- };
409
-
410
- while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
411
- qemu_co_queue_wait(&s->compress_wait_queue, NULL);
412
- }
413
-
414
- s->nb_compress_threads++;
415
- acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, &arg,
416
- qcow2_compress_complete,
417
- qemu_coroutine_self());
418
-
419
- if (!acb) {
420
- s->nb_compress_threads--;
421
- return -EINVAL;
422
- }
423
- qemu_coroutine_yield();
424
- s->nb_compress_threads--;
425
- qemu_co_queue_next(&s->compress_wait_queue);
426
-
427
- return arg.ret;
428
-}
429
-
430
-static ssize_t coroutine_fn
431
-qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
432
- const void *src, size_t src_size)
433
-{
434
- return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
435
- qcow2_compress);
436
-}
437
-
438
-static ssize_t coroutine_fn
439
-qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
440
- const void *src, size_t src_size)
441
-{
442
- return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
443
- qcow2_decompress);
444
-}
445
-
446
/* XXX: put compressed sectors first, then all the cluster aligned
447
tables to avoid losing bytes in alignment */
448
static coroutine_fn int
449
--
392
--
450
2.21.0
393
2.29.2
451
394
452
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Encryption will be done in threads; to benefit from that, we should
3
Initializes the message handler function in the remote process. It is
4
move it out of the lock first.
4
called whenever there's an event pending on QIOChannel that registers
5
this function.
5
6
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Reviewed-by: Max Reitz <mreitz@redhat.com>
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
Message-id: 20190506142741.41731-8-vsementsov@virtuozzo.com
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
Message-id: 99d38d8b93753a6409ac2340e858858cda59ab1b.1611938319.git.jag.raman@oracle.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
13
---
12
block/qcow2.c | 35 +++++++++++++++++++++--------------
14
MAINTAINERS | 1 +
13
1 file changed, 21 insertions(+), 14 deletions(-)
15
include/hw/remote/machine.h | 9 ++++++
16
hw/remote/message.c | 57 +++++++++++++++++++++++++++++++++++++
17
hw/remote/meson.build | 1 +
18
4 files changed, 68 insertions(+)
19
create mode 100644 hw/remote/message.c
14
20
15
diff --git a/block/qcow2.c b/block/qcow2.c
21
diff --git a/MAINTAINERS b/MAINTAINERS
16
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
17
--- a/block/qcow2.c
23
--- a/MAINTAINERS
18
+++ b/block/qcow2.c
24
+++ b/MAINTAINERS
19
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
25
@@ -XXX,XX +XXX,XX @@ F: hw/remote/machine.c
20
ret = qcow2_alloc_cluster_offset(bs, offset, &cur_bytes,
26
F: include/hw/remote/machine.h
21
&cluster_offset, &l2meta);
27
F: hw/remote/mpqemu-link.c
22
if (ret < 0) {
28
F: include/hw/remote/mpqemu-link.h
23
- goto fail;
29
+F: hw/remote/message.c
24
+ goto out_locked;
30
25
}
31
Build and test automation
26
32
-------------------------
27
assert((cluster_offset & 511) == 0);
33
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
28
34
index XXXXXXX..XXXXXXX 100644
29
+ ret = qcow2_pre_write_overlap_check(bs, 0,
35
--- a/include/hw/remote/machine.h
30
+ cluster_offset + offset_in_cluster,
36
+++ b/include/hw/remote/machine.h
31
+ cur_bytes, true);
37
@@ -XXX,XX +XXX,XX @@
32
+ if (ret < 0) {
38
#include "qom/object.h"
33
+ goto out_locked;
39
#include "hw/boards.h"
40
#include "hw/pci-host/remote.h"
41
+#include "io/channel.h"
42
43
struct RemoteMachineState {
44
MachineState parent_obj;
45
@@ -XXX,XX +XXX,XX @@ struct RemoteMachineState {
46
RemotePCIHost *host;
47
};
48
49
+/* Carries the device and ioc into the coroutine. */
50
+typedef struct RemoteCommDev {
51
+ PCIDevice *dev;
52
+ QIOChannel *ioc;
53
+} RemoteCommDev;
54
+
55
#define TYPE_REMOTE_MACHINE "x-remote-machine"
56
OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
57
58
+void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
59
+
60
#endif
61
diff --git a/hw/remote/message.c b/hw/remote/message.c
62
new file mode 100644
63
index XXXXXXX..XXXXXXX
64
--- /dev/null
65
+++ b/hw/remote/message.c
66
@@ -XXX,XX +XXX,XX @@
67
+/*
68
+ * Copyright © 2020, 2021 Oracle and/or its affiliates.
69
+ *
70
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
71
+ *
72
+ * See the COPYING file in the top-level directory.
73
+ *
74
+ */
75
+
76
+#include "qemu/osdep.h"
77
+#include "qemu-common.h"
78
+
79
+#include "hw/remote/machine.h"
80
+#include "io/channel.h"
81
+#include "hw/remote/mpqemu-link.h"
82
+#include "qapi/error.h"
83
+#include "sysemu/runstate.h"
84
+
85
+void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
86
+{
87
+ g_autofree RemoteCommDev *com = (RemoteCommDev *)data;
88
+ PCIDevice *pci_dev = NULL;
89
+ Error *local_err = NULL;
90
+
91
+ assert(com->ioc);
92
+
93
+ pci_dev = com->dev;
94
+ for (; !local_err;) {
95
+ MPQemuMsg msg = {0};
96
+
97
+ if (!mpqemu_msg_recv(&msg, com->ioc, &local_err)) {
98
+ break;
34
+ }
99
+ }
35
+
100
+
36
+ qemu_co_mutex_unlock(&s->lock);
101
+ if (!mpqemu_msg_valid(&msg)) {
102
+ error_setg(&local_err, "Received invalid message from proxy "
103
+ "in remote process pid="FMT_pid"",
104
+ getpid());
105
+ break;
106
+ }
37
+
107
+
38
qemu_iovec_reset(&hd_qiov);
108
+ switch (msg.cmd) {
39
qemu_iovec_concat(&hd_qiov, qiov, bytes_done, cur_bytes);
109
+ default:
40
110
+ error_setg(&local_err,
41
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
111
+ "Unknown command (%d) received for device %s"
42
* s->cluster_size);
112
+ " (pid="FMT_pid")",
43
if (cluster_data == NULL) {
113
+ msg.cmd, DEVICE(pci_dev)->id, getpid());
44
ret = -ENOMEM;
114
+ }
45
- goto fail;
115
+ }
46
+ goto out_unlocked;
47
}
48
}
49
50
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
51
cluster_data,
52
cur_bytes, NULL) < 0) {
53
ret = -EIO;
54
- goto fail;
55
+ goto out_unlocked;
56
}
57
58
qemu_iovec_reset(&hd_qiov);
59
qemu_iovec_add(&hd_qiov, cluster_data, cur_bytes);
60
}
61
62
- ret = qcow2_pre_write_overlap_check(bs, 0,
63
- cluster_offset + offset_in_cluster, cur_bytes, true);
64
- if (ret < 0) {
65
- goto fail;
66
- }
67
-
68
/* If we need to do COW, check if it's possible to merge the
69
* writing of the guest data together with that of the COW regions.
70
* If it's not possible (or not necessary) then write the
71
* guest data now. */
72
if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
73
- qemu_co_mutex_unlock(&s->lock);
74
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
75
trace_qcow2_writev_data(qemu_coroutine_self(),
76
cluster_offset + offset_in_cluster);
77
ret = bdrv_co_pwritev(s->data_file,
78
cluster_offset + offset_in_cluster,
79
cur_bytes, &hd_qiov, 0);
80
- qemu_co_mutex_lock(&s->lock);
81
if (ret < 0) {
82
- goto fail;
83
+ goto out_unlocked;
84
}
85
}
86
87
+ qemu_co_mutex_lock(&s->lock);
88
+
116
+
89
ret = qcow2_handle_l2meta(bs, &l2meta, true);
117
+ if (local_err) {
90
if (ret) {
118
+ error_report_err(local_err);
91
- goto fail;
119
+ qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_ERROR);
92
+ goto out_locked;
120
+ } else {
93
}
121
+ qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
94
122
+ }
95
bytes -= cur_bytes;
123
+}
96
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
124
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
97
trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
125
index XXXXXXX..XXXXXXX 100644
98
}
126
--- a/hw/remote/meson.build
99
ret = 0;
127
+++ b/hw/remote/meson.build
100
+ goto out_locked;
128
@@ -XXX,XX +XXX,XX @@ remote_ss = ss.source_set()
101
129
102
-fail:
130
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
103
+out_unlocked:
131
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
104
+ qemu_co_mutex_lock(&s->lock);
132
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
105
+
133
106
+out_locked:
134
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
107
qcow2_handle_l2meta(bs, &l2meta, false);
108
109
qemu_co_mutex_unlock(&s->lock);
110
--
135
--
111
2.21.0
136
2.29.2
112
137
113
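The error paths in the qcow2_co_pwritev() change above jump to out_unlocked or out_locked depending on whether s->lock is held at that point. A minimal standalone sketch of that two-label exit pattern follows. Everything in it is hypothetical: FakeLock, do_write() and the fail_* flags are illustrative stand-ins, not QEMU APIs; the real code uses qemu_co_mutex_lock() and qemu_co_mutex_unlock() on s->lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CoMutex: it only tracks whether it is held. */
typedef struct FakeLock {
    bool held;
} FakeLock;

static void fake_lock(FakeLock *l)
{
    assert(!l->held);
    l->held = true;
}

static void fake_unlock(FakeLock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * One iteration of a write loop shaped like the patched function:
 *   - allocation/metadata work runs with the lock held;
 *   - the slow data path runs with the lock dropped;
 *   - each error jumps to the label that matches the current lock state.
 * fail_alloc and fail_io are illustrative knobs that force an error path.
 */
static int do_write(FakeLock *lock, bool fail_alloc, bool fail_io,
                    int *cleanups)
{
    int ret;

    fake_lock(lock);

    if (fail_alloc) {           /* failure while the lock is held */
        ret = -1;
        goto out_locked;
    }

    fake_unlock(lock);          /* drop the lock around the slow part */

    if (fail_io) {              /* failure while the lock is not held */
        ret = -2;
        goto out_unlocked;
    }

    fake_lock(lock);            /* re-take the lock for the metadata update */
    ret = 0;
    goto out_locked;

out_unlocked:
    fake_lock(lock);            /* normalize: cleanup expects the lock held */

out_locked:
    (*cleanups)++;              /* cleanup runs exactly once, under the lock */
    fake_unlock(lock);
    return ret;
}
```

Whichever path is taken, the cleanup step runs once with the lock held and the function returns with the lock released, which is the invariant the two labels are meant to preserve.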
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Drop dependence on AioContext lock.
3
Associate the file descriptor for a PCIDevice in remote process with
4
4
DeviceState object.
5
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
5
6
Reviewed-by: Alberto Garcia <berto@igalia.com>
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Reviewed-by: Max Reitz <mreitz@redhat.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
Message-id: 20190506142741.41731-5-vsementsov@virtuozzo.com
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
10
Message-id: f405a2ed5d7518b87bea7c59cfdf334d67e5ee51.1611938319.git.jag.raman@oracle.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
12
---
12
block/qcow2-threads.c | 10 +++++++---
13
MAINTAINERS | 1 +
13
1 file changed, 7 insertions(+), 3 deletions(-)
14
hw/remote/remote-obj.c | 203 +++++++++++++++++++++++++++++++++++++++++
14
15
hw/remote/meson.build | 1 +
15
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
16
3 files changed, 205 insertions(+)
17
create mode 100644 hw/remote/remote-obj.c
18
19
diff --git a/MAINTAINERS b/MAINTAINERS
16
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
17
--- a/block/qcow2-threads.c
21
--- a/MAINTAINERS
18
+++ b/block/qcow2-threads.c
22
+++ b/MAINTAINERS
19
@@ -XXX,XX +XXX,XX @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
23
@@ -XXX,XX +XXX,XX @@ F: include/hw/remote/machine.h
20
.func = func,
24
F: hw/remote/mpqemu-link.c
21
};
25
F: include/hw/remote/mpqemu-link.h
22
26
F: hw/remote/message.c
23
+ qemu_co_mutex_lock(&s->lock);
27
+F: hw/remote/remote-obj.c
24
while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
28
25
- qemu_co_queue_wait(&s->compress_wait_queue, NULL);
29
Build and test automation
26
+ qemu_co_queue_wait(&s->compress_wait_queue, &s->lock);
30
-------------------------
27
}
31
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
28
-
32
new file mode 100644
29
s->nb_compress_threads++;
33
index XXXXXXX..XXXXXXX
30
+ qemu_co_mutex_unlock(&s->lock);
34
--- /dev/null
31
+
35
+++ b/hw/remote/remote-obj.c
32
thread_pool_submit_co(pool, qcow2_compress_pool_func, &arg);
36
@@ -XXX,XX +XXX,XX @@
33
- s->nb_compress_threads--;
37
+/*
34
38
+ * Copyright © 2020, 2021 Oracle and/or its affiliates.
35
+ qemu_co_mutex_lock(&s->lock);
39
+ *
36
+ s->nb_compress_threads--;
40
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
37
qemu_co_queue_next(&s->compress_wait_queue);
41
+ *
38
+ qemu_co_mutex_unlock(&s->lock);
42
+ * See the COPYING file in the top-level directory.
39
43
+ *
40
return arg.ret;
44
+ */
41
}
45
+
46
+#include "qemu/osdep.h"
47
+#include "qemu-common.h"
48
+
49
+#include "qemu/error-report.h"
50
+#include "qemu/notify.h"
51
+#include "qom/object_interfaces.h"
52
+#include "hw/qdev-core.h"
53
+#include "io/channel.h"
54
+#include "hw/qdev-core.h"
55
+#include "hw/remote/machine.h"
56
+#include "io/channel-util.h"
57
+#include "qapi/error.h"
58
+#include "sysemu/sysemu.h"
59
+#include "hw/pci/pci.h"
60
+#include "qemu/sockets.h"
61
+#include "monitor/monitor.h"
62
+
63
+#define TYPE_REMOTE_OBJECT "x-remote-object"
64
+OBJECT_DECLARE_TYPE(RemoteObject, RemoteObjectClass, REMOTE_OBJECT)
65
+
66
+struct RemoteObjectClass {
67
+ ObjectClass parent_class;
68
+
69
+ unsigned int nr_devs;
70
+ unsigned int max_devs;
71
+};
72
+
73
+struct RemoteObject {
74
+ /* private */
75
+ Object parent;
76
+
77
+ Notifier machine_done;
78
+
79
+ int32_t fd;
80
+ char *devid;
81
+
82
+ QIOChannel *ioc;
83
+
84
+ DeviceState *dev;
85
+ DeviceListener listener;
86
+};
87
+
88
+static void remote_object_set_fd(Object *obj, const char *str, Error **errp)
89
+{
90
+ RemoteObject *o = REMOTE_OBJECT(obj);
91
+ int fd = -1;
92
+
93
+ fd = monitor_fd_param(monitor_cur(), str, errp);
94
+ if (fd == -1) {
95
+ error_prepend(errp, "Could not parse remote object fd %s:", str);
96
+ return;
97
+ }
98
+
99
+ if (!fd_is_socket(fd)) {
100
+ error_setg(errp, "File descriptor '%s' is not a socket", str);
101
+ close(fd);
102
+ return;
103
+ }
104
+
105
+ o->fd = fd;
106
+}
107
+
108
+static void remote_object_set_devid(Object *obj, const char *str, Error **errp)
109
+{
110
+ RemoteObject *o = REMOTE_OBJECT(obj);
111
+
112
+ g_free(o->devid);
113
+
114
+ o->devid = g_strdup(str);
115
+}
116
+
117
+static void remote_object_unrealize_listener(DeviceListener *listener,
118
+ DeviceState *dev)
119
+{
120
+ RemoteObject *o = container_of(listener, RemoteObject, listener);
121
+
122
+ if (o->dev == dev) {
123
+ object_unref(OBJECT(o));
124
+ }
125
+}
126
+
127
+static void remote_object_machine_done(Notifier *notifier, void *data)
128
+{
129
+ RemoteObject *o = container_of(notifier, RemoteObject, machine_done);
130
+ DeviceState *dev = NULL;
131
+ QIOChannel *ioc = NULL;
132
+ Coroutine *co = NULL;
133
+ RemoteCommDev *comdev = NULL;
134
+ Error *err = NULL;
135
+
136
+ dev = qdev_find_recursive(sysbus_get_default(), o->devid);
137
+ if (!dev || !object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
138
+ error_report("%s is not a PCI device", o->devid);
139
+ return;
140
+ }
141
+
142
+ ioc = qio_channel_new_fd(o->fd, &err);
143
+ if (!ioc) {
144
+ error_report_err(err);
145
+ return;
146
+ }
147
+ qio_channel_set_blocking(ioc, false, NULL);
148
+
149
+ o->dev = dev;
150
+
151
+ o->listener.unrealize = remote_object_unrealize_listener;
152
+ device_listener_register(&o->listener);
153
+
154
+ /* co-routine should free this. */
155
+ comdev = g_new0(RemoteCommDev, 1);
156
+ *comdev = (RemoteCommDev) {
157
+ .ioc = ioc,
158
+ .dev = PCI_DEVICE(dev),
159
+ };
160
+
161
+ co = qemu_coroutine_create(mpqemu_remote_msg_loop_co, comdev);
162
+ qemu_coroutine_enter(co);
163
+}
164
+
165
+static void remote_object_init(Object *obj)
166
+{
167
+ RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
168
+ RemoteObject *o = REMOTE_OBJECT(obj);
169
+
170
+ if (k->nr_devs >= k->max_devs) {
171
+ error_report("Reached maximum number of devices: %u", k->max_devs);
172
+ return;
173
+ }
174
+
175
+ o->ioc = NULL;
176
+ o->fd = -1;
177
+ o->devid = NULL;
178
+
179
+ k->nr_devs++;
180
+
181
+ o->machine_done.notify = remote_object_machine_done;
182
+ qemu_add_machine_init_done_notifier(&o->machine_done);
183
+}
184
+
185
+static void remote_object_finalize(Object *obj)
186
+{
187
+ RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
188
+ RemoteObject *o = REMOTE_OBJECT(obj);
189
+
190
+ device_listener_unregister(&o->listener);
191
+
192
+ if (o->ioc) {
193
+ qio_channel_shutdown(o->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
194
+ qio_channel_close(o->ioc, NULL);
195
+ }
196
+
197
+ object_unref(OBJECT(o->ioc));
198
+
199
+ k->nr_devs--;
200
+ g_free(o->devid);
201
+}
202
+
203
+static void remote_object_class_init(ObjectClass *klass, void *data)
204
+{
205
+ RemoteObjectClass *k = REMOTE_OBJECT_CLASS(klass);
206
+
207
+ /*
208
+ * Limit number of supported devices to 1. This is done to avoid devices
209
+ * from one VM accessing the RAM of another VM. This is done until we
210
+ * start using separate address spaces for individual devices.
211
+ */
212
+ k->max_devs = 1;
213
+ k->nr_devs = 0;
214
+
215
+ object_class_property_add_str(klass, "fd", NULL, remote_object_set_fd);
216
+ object_class_property_add_str(klass, "devid", NULL,
217
+ remote_object_set_devid);
218
+}
219
+
220
+static const TypeInfo remote_object_info = {
221
+ .name = TYPE_REMOTE_OBJECT,
222
+ .parent = TYPE_OBJECT,
223
+ .instance_size = sizeof(RemoteObject),
224
+ .instance_init = remote_object_init,
225
+ .instance_finalize = remote_object_finalize,
226
+ .class_size = sizeof(RemoteObjectClass),
227
+ .class_init = remote_object_class_init,
228
+ .interfaces = (InterfaceInfo[]) {
229
+ { TYPE_USER_CREATABLE },
230
+ { }
231
+ }
232
+};
233
+
234
+static void register_types(void)
235
+{
236
+ type_register_static(&remote_object_info);
237
+}
238
+
239
+type_init(register_types);
240
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
241
index XXXXXXX..XXXXXXX 100644
242
--- a/hw/remote/meson.build
243
+++ b/hw/remote/meson.build
244
@@ -XXX,XX +XXX,XX @@ remote_ss = ss.source_set()
245
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
246
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
247
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
248
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
249
250
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
42
--
251
--
43
2.21.0
252
2.29.2
44
253
45
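The class-level limit in remote_object_init() and remote_object_finalize() above (nr_devs checked against max_devs) is a plain slot-counting pattern. The sketch below shows it in isolation. SlotCounter, slot_acquire() and slot_release() are hypothetical names chosen for illustration, not QEMU APIs, and the sketch returns a bool so that a caller can pair slot_release() only with a successful slot_acquire().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter mirroring nr_devs/max_devs from the patch. */
typedef struct SlotCounter {
    unsigned int nr;    /* slots currently in use */
    unsigned int max;   /* upper bound on slots in use */
} SlotCounter;

/* Take one slot; returns false (and changes nothing) when the cap is hit. */
static bool slot_acquire(SlotCounter *c)
{
    if (c->nr >= c->max) {
        return false;
    }
    c->nr++;
    return true;
}

/* Give a slot back; must only follow a successful slot_acquire(). */
static void slot_release(SlotCounter *c)
{
    assert(c->nr > 0);
    c->nr--;
}
```

With max set to 1, as in the patch, a second acquire fails until the first holder releases its slot.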
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Simplify backup_incremental_init_copy_bitmap using the function
3
The SyncSysmemMsg message format is defined. It is used to send
4
bdrv_dirty_bitmap_next_dirty_area.
4
file descriptors of the RAM regions to remote device.
5
5
RAM on the remote device is configured with a set of file descriptors.
6
Note: use job->len instead of the bitmap size; it should not matter, but it is
6
Old RAM regions are deleted and new regions, each with an fd, are
7
less code.
7
added to the RAM.
8
8
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Reviewed-by: Max Reitz <mreitz@redhat.com>
10
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
11
Message-id: 20190429090842.57910-2-vsementsov@virtuozzo.com
11
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
12
Signed-off-by: Max Reitz <mreitz@redhat.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: 7d2d1831d812e85f681e7a8ab99e032cf4704689.1611938319.git.jag.raman@oracle.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
15
---
14
block/backup.c | 40 ++++++++++++----------------------------
16
MAINTAINERS | 2 +
15
1 file changed, 12 insertions(+), 28 deletions(-)
17
include/hw/remote/memory.h | 19 ++++++++++
16
18
include/hw/remote/mpqemu-link.h | 10 +++++
17
diff --git a/block/backup.c b/block/backup.c
19
hw/remote/memory.c | 65 +++++++++++++++++++++++++++++++++
18
index XXXXXXX..XXXXXXX 100644
20
hw/remote/mpqemu-link.c | 11 ++++++
19
--- a/block/backup.c
21
hw/remote/meson.build | 2 +
20
+++ b/block/backup.c
22
6 files changed, 109 insertions(+)
21
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
23
create mode 100644 include/hw/remote/memory.h
22
/* init copy_bitmap from sync_bitmap */
24
create mode 100644 hw/remote/memory.c
23
static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
25
24
{
26
diff --git a/MAINTAINERS b/MAINTAINERS
25
- BdrvDirtyBitmapIter *dbi;
27
index XXXXXXX..XXXXXXX 100644
26
- int64_t offset;
28
--- a/MAINTAINERS
27
- int64_t end = DIV_ROUND_UP(bdrv_dirty_bitmap_size(job->sync_bitmap),
29
+++ b/MAINTAINERS
28
- job->cluster_size);
30
@@ -XXX,XX +XXX,XX @@ F: hw/remote/mpqemu-link.c
29
-
31
F: include/hw/remote/mpqemu-link.h
30
- dbi = bdrv_dirty_iter_new(job->sync_bitmap);
32
F: hw/remote/message.c
31
- while ((offset = bdrv_dirty_iter_next(dbi)) != -1) {
33
F: hw/remote/remote-obj.c
32
- int64_t cluster = offset / job->cluster_size;
34
+F: include/hw/remote/memory.h
33
- int64_t next_cluster;
35
+F: hw/remote/memory.c
34
-
36
35
- offset += bdrv_dirty_bitmap_granularity(job->sync_bitmap);
37
Build and test automation
36
- if (offset >= bdrv_dirty_bitmap_size(job->sync_bitmap)) {
38
-------------------------
37
- hbitmap_set(job->copy_bitmap, cluster, end - cluster);
39
diff --git a/include/hw/remote/memory.h b/include/hw/remote/memory.h
38
- break;
40
new file mode 100644
39
- }
41
index XXXXXXX..XXXXXXX
40
+ uint64_t offset = 0;
42
--- /dev/null
41
+ uint64_t bytes = job->len;
43
+++ b/include/hw/remote/memory.h
42
44
@@ -XXX,XX +XXX,XX @@
43
- offset = bdrv_dirty_bitmap_next_zero(job->sync_bitmap, offset,
45
+/*
44
- UINT64_MAX);
46
+ * Memory manager for remote device
45
- if (offset == -1) {
47
+ *
46
- hbitmap_set(job->copy_bitmap, cluster, end - cluster);
48
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
47
- break;
49
+ *
48
- }
50
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
49
+ while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
51
+ * See the COPYING file in the top-level directory.
50
+ &offset, &bytes))
52
+ *
51
+ {
53
+ */
52
+ uint64_t cluster = offset / job->cluster_size;
54
+
53
+ uint64_t end_cluster = DIV_ROUND_UP(offset + bytes, job->cluster_size);
55
+#ifndef REMOTE_MEMORY_H
54
56
+#define REMOTE_MEMORY_H
55
- next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
57
+
56
- hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
58
+#include "exec/hwaddr.h"
57
- if (next_cluster >= end) {
59
+#include "hw/remote/mpqemu-link.h"
58
+ hbitmap_set(job->copy_bitmap, cluster, end_cluster - cluster);
60
+
59
+
61
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp);
60
+ offset = end_cluster * job->cluster_size;
62
+
61
+ if (offset >= job->len) {
63
+#endif
62
break;
64
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
65
index XXXXXXX..XXXXXXX 100644
66
--- a/include/hw/remote/mpqemu-link.h
67
+++ b/include/hw/remote/mpqemu-link.h
68
@@ -XXX,XX +XXX,XX @@
69
#include "qom/object.h"
70
#include "qemu/thread.h"
71
#include "io/channel.h"
72
+#include "exec/hwaddr.h"
73
74
#define REMOTE_MAX_FDS 8
75
76
@@ -XXX,XX +XXX,XX @@
77
*
78
*/
79
typedef enum {
80
+ MPQEMU_CMD_SYNC_SYSMEM,
81
MPQEMU_CMD_MAX,
82
} MPQemuCmd;
83
84
+typedef struct {
85
+ hwaddr gpas[REMOTE_MAX_FDS];
86
+ uint64_t sizes[REMOTE_MAX_FDS];
87
+ off_t offsets[REMOTE_MAX_FDS];
88
+} SyncSysmemMsg;
89
+
90
/**
91
* MPQemuMsg:
92
* @cmd: The remote command
93
@@ -XXX,XX +XXX,XX @@ typedef enum {
94
* MPQemuMsg Format of the message sent to the remote device from QEMU.
95
*
96
*/
97
+
98
typedef struct {
99
int cmd;
100
size_t size;
101
102
union {
103
uint64_t u64;
104
+ SyncSysmemMsg sync_sysmem;
105
} data;
106
107
int fds[REMOTE_MAX_FDS];
108
diff --git a/hw/remote/memory.c b/hw/remote/memory.c
109
new file mode 100644
110
index XXXXXXX..XXXXXXX
111
--- /dev/null
112
+++ b/hw/remote/memory.c
113
@@ -XXX,XX +XXX,XX @@
114
+/*
115
+ * Memory manager for remote device
116
+ *
117
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
118
+ *
119
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
120
+ * See the COPYING file in the top-level directory.
121
+ *
122
+ */
123
+
124
+#include "qemu/osdep.h"
125
+#include "qemu-common.h"
126
+
127
+#include "hw/remote/memory.h"
128
+#include "exec/address-spaces.h"
129
+#include "exec/ram_addr.h"
130
+#include "qapi/error.h"
131
+
132
+static void remote_sysmem_reset(void)
133
+{
134
+ MemoryRegion *sysmem, *subregion, *next;
135
+
136
+ sysmem = get_system_memory();
137
+
138
+ QTAILQ_FOREACH_SAFE(subregion, &sysmem->subregions, subregions_link, next) {
139
+ if (subregion->ram) {
140
+ memory_region_del_subregion(sysmem, subregion);
141
+ object_unparent(OBJECT(subregion));
142
+ }
143
+ }
144
+}
145
+
146
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
147
+{
148
+ ERRP_GUARD();
149
+ SyncSysmemMsg *sysmem_info = &msg->data.sync_sysmem;
150
+ MemoryRegion *sysmem, *subregion;
151
+ static unsigned int suffix;
152
+ int region;
153
+
154
+ sysmem = get_system_memory();
155
+
156
+ remote_sysmem_reset();
157
+
158
+ for (region = 0; region < msg->num_fds; region++) {
159
+ g_autofree char *name;
160
+ subregion = g_new(MemoryRegion, 1);
161
+ name = g_strdup_printf("remote-mem-%u", suffix++);
162
+ memory_region_init_ram_from_fd(subregion, NULL,
163
+ name, sysmem_info->sizes[region],
164
+ true, msg->fds[region],
165
+ sysmem_info->offsets[region],
166
+ errp);
167
+
168
+ if (*errp) {
169
+ g_free(subregion);
170
+ remote_sysmem_reset();
171
+ return;
172
+ }
173
+
174
+ memory_region_add_subregion(sysmem, sysmem_info->gpas[region],
175
+ subregion);
176
+
177
+ }
178
+}
179
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
180
index XXXXXXX..XXXXXXX 100644
181
--- a/hw/remote/mpqemu-link.c
182
+++ b/hw/remote/mpqemu-link.c
183
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
63
}
184
}
64
-
65
- bdrv_set_dirty_iter(dbi, next_cluster * job->cluster_size);
66
+ bytes = job->len - offset;
67
}
185
}
68
186
69
/* TODO job_progress_set_remaining() would make more sense */
187
+ /* Verify message specific fields. */
70
job_progress_update(&job->common.job,
188
+ switch (msg->cmd) {
71
job->len - hbitmap_count(job->copy_bitmap) * job->cluster_size);
189
+ case MPQEMU_CMD_SYNC_SYSMEM:
72
-
190
+ if (msg->num_fds == 0 || msg->size != sizeof(SyncSysmemMsg)) {
73
- bdrv_dirty_iter_free(dbi);
191
+ return false;
192
+ }
193
+ break;
194
+ default:
195
+ break;
196
+ }
197
+
198
return true;
74
}
199
}
75
200
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
76
static int coroutine_fn backup_run(Job *job, Error **errp)
201
index XXXXXXX..XXXXXXX 100644
202
--- a/hw/remote/meson.build
203
+++ b/hw/remote/meson.build
204
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
205
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
206
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
207
208
+specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
209
+
210
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
77
--
211
--
78
2.21.0
212
2.29.2
79
213
80
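The command-specific check added to mpqemu_msg_valid() above rejects a SYNC_SYSMEM message that carries no file descriptors or whose size does not match the payload struct. A reduced sketch of that check follows. The Demo* names are hypothetical, plain integer types stand in for hwaddr and off_t, and the generic command and fd-count checks at the top are assumptions, since that part of the real function is not shown in this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_FDS 8   /* mirrors REMOTE_MAX_FDS in the patch */

typedef enum {
    DEMO_CMD_SYNC_SYSMEM,
    DEMO_CMD_MAX,
} DemoCmd;

/* Simplified payload: plain integers stand in for hwaddr and off_t. */
typedef struct {
    uint64_t gpas[DEMO_MAX_FDS];
    uint64_t sizes[DEMO_MAX_FDS];
    int64_t offsets[DEMO_MAX_FDS];
} DemoSyncSysmemMsg;

/* Reduced message header: only the fields the validation looks at. */
typedef struct {
    int cmd;
    size_t size;     /* payload size claimed by the sender */
    int num_fds;     /* number of file descriptors attached */
} DemoMsg;

static bool demo_msg_valid(const DemoMsg *msg)
{
    /* Assumed generic checks: known command, fd count within bounds. */
    if (msg->cmd < 0 || msg->cmd >= DEMO_CMD_MAX) {
        return false;
    }
    if (msg->num_fds < 0 || msg->num_fds > DEMO_MAX_FDS) {
        return false;
    }

    /* Message-specific checks, as in the hunk above. */
    switch (msg->cmd) {
    case DEMO_CMD_SYNC_SYSMEM:
        if (msg->num_fds == 0 || msg->size != sizeof(DemoSyncSysmemMsg)) {
            return false;
        }
        break;
    default:
        break;
    }

    return true;
}
```

Checking the size against sizeof() of the payload before touching the union keeps a short or oversized message from being interpreted as a full set of region descriptors.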
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
Split out cluster_size calculation. Move copy-bitmap creation above
3
Defines a PCI Device proxy object as a child of TYPE_PCI_DEVICE.
4
block-job creation, as we are going to share it with the upcoming
5
backup-top filter, which should also be created before the actual block job
6
creation.
7
4
8
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
5
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
Message-id: 20190429090842.57910-6-vsementsov@virtuozzo.com
6
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
[mreitz: Dropped a paragraph from the commit message that was left over
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
11
from a previous version]
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Max Reitz <mreitz@redhat.com>
9
Message-id: b5186ebfedf8e557044d09a768846c59230ad3a7.1611938319.git.jag.raman@oracle.com
10
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
11
---
14
block/backup.c | 82 ++++++++++++++++++++++++++++++++------------------
12
MAINTAINERS | 2 +
15
1 file changed, 52 insertions(+), 30 deletions(-)
13
include/hw/remote/proxy.h | 33 +++++++++++++
14
hw/remote/proxy.c | 99 +++++++++++++++++++++++++++++++++++++++
15
hw/remote/meson.build | 1 +
16
4 files changed, 135 insertions(+)
17
create mode 100644 include/hw/remote/proxy.h
18
create mode 100644 hw/remote/proxy.c
16
19
17
diff --git a/block/backup.c b/block/backup.c
20
diff --git a/MAINTAINERS b/MAINTAINERS
18
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
19
--- a/block/backup.c
22
--- a/MAINTAINERS
20
+++ b/block/backup.c
23
+++ b/MAINTAINERS
21
@@ -XXX,XX +XXX,XX @@ static const BlockJobDriver backup_job_driver = {
24
@@ -XXX,XX +XXX,XX @@ F: hw/remote/message.c
22
.drain = backup_drain,
25
F: hw/remote/remote-obj.c
23
};
26
F: include/hw/remote/memory.h
24
27
F: hw/remote/memory.c
25
+static int64_t backup_calculate_cluster_size(BlockDriverState *target,
28
+F: hw/remote/proxy.c
26
+ Error **errp)
29
+F: include/hw/remote/proxy.h
27
+{
30
28
+ int ret;
31
Build and test automation
29
+ BlockDriverInfo bdi;
32
-------------------------
33
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
34
new file mode 100644
35
index XXXXXXX..XXXXXXX
36
--- /dev/null
37
+++ b/include/hw/remote/proxy.h
38
@@ -XXX,XX +XXX,XX @@
39
+/*
40
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
41
+ *
42
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
43
+ * See the COPYING file in the top-level directory.
44
+ *
45
+ */
46
+
47
+#ifndef PROXY_H
48
+#define PROXY_H
49
+
50
+#include "hw/pci/pci.h"
51
+#include "io/channel.h"
52
+
53
+#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
54
+OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
55
+
56
+struct PCIProxyDev {
57
+ PCIDevice parent_dev;
58
+ char *fd;
30
+
59
+
31
+ /*
60
+ /*
32
+ * If there is no backing file on the target, we cannot rely on COW if our
61
+ * Mutex used to protect the QIOChannel fd from
33
+ * backup cluster size is smaller than the target cluster size. Even for
62
+ * the concurrent access by the VCPUs since proxy
34
+ * targets with a backing file, try to avoid COW if possible.
63
+ * blocks while awaiting for the replies from the
64
+ * process remote.
35
+ */
65
+ */
36
+ ret = bdrv_get_info(target, &bdi);
66
+ QemuMutex io_mutex;
37
+ if (ret == -ENOTSUP && !target->backing) {
67
+ QIOChannel *ioc;
38
+ /* Cluster size is not defined */
68
+ Error *migration_blocker;
39
+ warn_report("The target block device doesn't provide "
69
+};
40
+ "information about the block size and it doesn't have a "
70
+
41
+ "backing file. The default block size of %u bytes is "
71
+#endif /* PROXY_H */
42
+ "used. If the actual block size of the target exceeds "
72
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
43
+ "this default, the backup may be unusable",
73
new file mode 100644
44
+ BACKUP_CLUSTER_SIZE_DEFAULT);
74
index XXXXXXX..XXXXXXX
45
+ return BACKUP_CLUSTER_SIZE_DEFAULT;
75
--- /dev/null
46
+ } else if (ret < 0 && !target->backing) {
76
+++ b/hw/remote/proxy.c
47
+ error_setg_errno(errp, -ret,
77
@@ -XXX,XX +XXX,XX @@
48
+ "Couldn't determine the cluster size of the target image, "
78
+/*
49
+ "which has no backing file");
79
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
50
+ error_append_hint(errp,
80
+ *
51
+ "Aborting, since this may create an unusable destination image\n");
81
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
52
+ return ret;
82
+ * See the COPYING file in the top-level directory.
53
+ } else if (ret < 0 && target->backing) {
83
+ *
54
+ /* Not fatal; just trudge on ahead. */
84
+ */
55
+ return BACKUP_CLUSTER_SIZE_DEFAULT;
85
+
86
+#include "qemu/osdep.h"
87
+#include "qemu-common.h"
88
+
89
+#include "hw/remote/proxy.h"
90
+#include "hw/pci/pci.h"
91
+#include "qapi/error.h"
92
+#include "io/channel-util.h"
93
+#include "hw/qdev-properties.h"
94
+#include "monitor/monitor.h"
95
+#include "migration/blocker.h"
96
+#include "qemu/sockets.h"
97
+
98
+static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
99
+{
100
+ ERRP_GUARD();
101
+ PCIProxyDev *dev = PCI_PROXY_DEV(device);
102
+ int fd;
103
+
104
+ if (!dev->fd) {
105
+ error_setg(errp, "fd parameter not specified for %s",
106
+ DEVICE(device)->id);
107
+ return;
56
+ }
108
+ }
57
+
109
+
58
+ return MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
110
+ fd = monitor_fd_param(monitor_cur(), dev->fd, errp);
111
+ if (fd == -1) {
112
+ error_prepend(errp, "proxy: unable to parse fd %s: ", dev->fd);
113
+ return;
114
+ }
115
+
116
+ if (!fd_is_socket(fd)) {
117
+ error_setg(errp, "proxy: fd %d is not a socket", fd);
118
+ close(fd);
119
+ return;
120
+ }
121
+
122
+ dev->ioc = qio_channel_new_fd(fd, errp);
123
+
124
+ error_setg(&dev->migration_blocker, "%s does not support migration",
125
+ TYPE_PCI_PROXY_DEV);
126
+ migrate_add_blocker(dev->migration_blocker, errp);
127
+
128
+ qemu_mutex_init(&dev->io_mutex);
129
+ qio_channel_set_blocking(dev->ioc, true, NULL);
59
+}
130
+}
60
+
131
+
61
BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
132
+static void pci_proxy_dev_exit(PCIDevice *pdev)
62
BlockDriverState *target, int64_t speed,
133
+{
63
MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
134
+ PCIProxyDev *dev = PCI_PROXY_DEV(pdev);
64
@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
135
+
65
JobTxn *txn, Error **errp)
136
+ if (dev->ioc) {
66
{
137
+ qio_channel_close(dev->ioc, NULL);
67
int64_t len;
68
- BlockDriverInfo bdi;
69
BackupBlockJob *job = NULL;
70
int ret;
71
+ int64_t cluster_size;
72
+ HBitmap *copy_bitmap = NULL;
73
74
assert(bs);
75
assert(target);
76
@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
77
goto error;
78
}
79
80
+ cluster_size = backup_calculate_cluster_size(target, errp);
81
+ if (cluster_size < 0) {
82
+ goto error;
83
+ }
138
+ }
84
+
139
+
85
+ copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
140
+ migrate_del_blocker(dev->migration_blocker);
86
+
141
+
87
/* job->len is fixed, so we can't allow resize */
142
+ error_free(dev->migration_blocker);
88
job = block_job_create(job_id, &backup_job_driver, txn, bs,
143
+}
89
BLK_PERM_CONSISTENT_READ,
144
+
90
@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
145
+static Property proxy_properties[] = {
91
146
+ DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
92
/* Detect image-fleecing (and similar) schemes */
147
+ DEFINE_PROP_END_OF_LIST(),
93
job->serialize_target_writes = bdrv_chain_contains(target, bs);
148
+};
94
-
149
+
95
- /* If there is no backing file on the target, we cannot rely on COW if our
150
+static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
96
- * backup cluster size is smaller than the target cluster size. Even for
151
+{
97
- * targets with a backing file, try to avoid COW if possible. */
152
+ DeviceClass *dc = DEVICE_CLASS(klass);
98
- ret = bdrv_get_info(target, &bdi);
153
+ PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
99
- if (ret == -ENOTSUP && !target->backing) {
154
+
100
- /* Cluster size is not defined */
155
+ k->realize = pci_proxy_dev_realize;
101
- warn_report("The target block device doesn't provide "
156
+ k->exit = pci_proxy_dev_exit;
102
- "information about the block size and it doesn't have a "
157
+ device_class_set_props(dc, proxy_properties);
103
- "backing file. The default block size of %u bytes is "
158
+}
104
- "used. If the actual block size of the target exceeds "
159
+
105
- "this default, the backup may be unusable",
160
+static const TypeInfo pci_proxy_dev_type_info = {
106
- BACKUP_CLUSTER_SIZE_DEFAULT);
161
+ .name = TYPE_PCI_PROXY_DEV,
107
- job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
162
+ .parent = TYPE_PCI_DEVICE,
108
- } else if (ret < 0 && !target->backing) {
163
+ .instance_size = sizeof(PCIProxyDev),
109
- error_setg_errno(errp, -ret,
164
+ .class_init = pci_proxy_dev_class_init,
110
- "Couldn't determine the cluster size of the target image, "
165
+ .interfaces = (InterfaceInfo[]) {
111
- "which has no backing file");
166
+ { INTERFACE_CONVENTIONAL_PCI_DEVICE },
112
- error_append_hint(errp,
167
+ { },
113
- "Aborting, since this may create an unusable destination image\n");
168
+ },
114
- goto error;
169
+};
115
- } else if (ret < 0 && target->backing) {
170
+
116
- /* Not fatal; just trudge on ahead. */
171
+static void pci_proxy_dev_register_types(void)
117
- job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
172
+{
118
- } else {
173
+ type_register_static(&pci_proxy_dev_type_info);
119
- job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
174
+}
120
- }
175
+
121
-
176
+type_init(pci_proxy_dev_register_types)
122
- job->copy_bitmap = hbitmap_alloc(len, ctz32(job->cluster_size));
177
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
123
+ job->cluster_size = cluster_size;
178
index XXXXXXX..XXXXXXX 100644
124
+ job->copy_bitmap = copy_bitmap;
179
--- a/hw/remote/meson.build
125
+ copy_bitmap = NULL;
180
+++ b/hw/remote/meson.build
126
job->use_copy_range = true;
181
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('machine.c'))
127
job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
182
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
128
blk_get_max_transfer(job->target));
183
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
129
@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
184
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
130
return &job->common;
185
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
131
186
132
error:
187
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
133
+ if (copy_bitmap) {
188
134
+ assert(!job || !job->copy_bitmap);
135
+ hbitmap_free(copy_bitmap);
136
+ }
137
if (sync_bitmap) {
138
bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
139
}
140
--
189
--
141
2.21.0
190
2.29.2
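The cluster-size selection factored out into backup_calculate_cluster_size() above can be read as a small decision table. The following is only a sketch under stated assumptions: toy_calculate_cluster_size() and TOY_CLUSTER_SIZE_DEFAULT are hypothetical names, the 64 KiB default is assumed rather than taken from this patch, and the result of bdrv_get_info() is passed in as plain arguments so the block builds without QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default; stands in for BACKUP_CLUSTER_SIZE_DEFAULT. */
#define TOY_CLUSTER_SIZE_DEFAULT ((int64_t)1 << 16)

/*
 * Hypothetical stand-in for backup_calculate_cluster_size().
 * info_ret:            what bdrv_get_info() would have returned
 * has_backing:         whether the target has a backing file
 * target_cluster_size: bdi.cluster_size, only meaningful if info_ret == 0
 * Returns the cluster size to use, or a negative errno value.
 */
static int64_t toy_calculate_cluster_size(int info_ret, bool has_backing,
                                          int64_t target_cluster_size)
{
    if (info_ret == -ENOTSUP && !has_backing) {
        /* Cluster size is not defined: warn (omitted here), use default. */
        return TOY_CLUSTER_SIZE_DEFAULT;
    } else if (info_ret < 0 && !has_backing) {
        /* Fatal: the destination image could end up unusable. */
        return info_ret;
    } else if (info_ret < 0 && has_backing) {
        /* Not fatal: COW through the backing file still works. */
        return TOY_CLUSTER_SIZE_DEFAULT;
    }

    assert(target_cluster_size >= 0);
    /* Never go below the default, to avoid COW on the target. */
    return target_cluster_size > TOY_CLUSTER_SIZE_DEFAULT
           ? target_cluster_size : TOY_CLUSTER_SIZE_DEFAULT;
}
```

A caller would treat any negative return value as an error, which is what the "cluster_size < 0" check in backup_job_create() does.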
142
191
143
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
Split allocation checking into a separate function and reduce nesting.
3
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
4
Treat a bdrv_is_allocated() failure as an allocated area, as copying more
4
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
5
than needed is not wrong (and we do it anyway) and seems better than
5
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
6
failing the whole job. Most probably we will fail on the next read anyway,
6
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
if there is a real problem with the source.
7
Message-id: d54edb4176361eed86b903e8f27058363b6c83b3.1611938319.git.jag.raman@oracle.com
8
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9
---
10
include/hw/remote/mpqemu-link.h | 4 ++++
11
hw/remote/mpqemu-link.c | 34 +++++++++++++++++++++++++++++++++
12
2 files changed, 38 insertions(+)
8
13
9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
14
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
10
Reviewed-by: Max Reitz <mreitz@redhat.com>
11
Message-id: 20190429090842.57910-4-vsementsov@virtuozzo.com
12
Signed-off-by: Max Reitz <mreitz@redhat.com>
13
---
14
block/backup.c | 60 +++++++++++++++++++-------------------------------
15
1 file changed, 23 insertions(+), 37 deletions(-)
16
17
diff --git a/block/backup.c b/block/backup.c
18
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
19
--- a/block/backup.c
16
--- a/include/hw/remote/mpqemu-link.h
20
+++ b/block/backup.c
17
+++ b/include/hw/remote/mpqemu-link.h
21
@@ -XXX,XX +XXX,XX @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
18
@@ -XXX,XX +XXX,XX @@
22
return false;
19
#include "qemu/thread.h"
20
#include "io/channel.h"
21
#include "exec/hwaddr.h"
22
+#include "io/channel-socket.h"
23
+#include "hw/remote/proxy.h"
24
25
#define REMOTE_MAX_FDS 8
26
27
@@ -XXX,XX +XXX,XX @@ typedef struct {
28
bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
29
bool mpqemu_msg_recv(MPQemuMsg *msg, QIOChannel *ioc, Error **errp);
30
31
+uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
32
+ Error **errp);
33
bool mpqemu_msg_valid(MPQemuMsg *msg);
34
35
#endif
36
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
37
index XXXXXXX..XXXXXXX 100644
38
--- a/hw/remote/mpqemu-link.c
39
+++ b/hw/remote/mpqemu-link.c
40
@@ -XXX,XX +XXX,XX @@ fail:
41
return ret;
23
}
42
}
24
43
25
+static bool bdrv_is_unallocated_range(BlockDriverState *bs,
44
+/*
26
+ int64_t offset, int64_t bytes)
45
+ * Send msg and wait for a reply with command code RET_MSG.
46
+ * Returns the message received of size u64 or UINT64_MAX
47
+ * on error.
48
+ * Called from VCPU thread in non-coroutine context.
49
+ * Used by the Proxy object to communicate to remote processes.
50
+ */
51
+uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
52
+ Error **errp)
27
+{
53
+{
28
+ int64_t end = offset + bytes;
54
+ ERRP_GUARD();
55
+ MPQemuMsg msg_reply = {0};
56
+ uint64_t ret = UINT64_MAX;
29
+
57
+
30
+ while (offset < end && !bdrv_is_allocated(bs, offset, bytes, &bytes)) {
58
+ assert(!qemu_in_coroutine());
31
+ if (bytes == 0) {
59
+
32
+ return true;
60
+ QEMU_LOCK_GUARD(&pdev->io_mutex);
33
+ }
61
+ if (!mpqemu_msg_send(msg, pdev->ioc, errp)) {
34
+ offset += bytes;
62
+ return ret;
35
+ bytes = end - offset;
36
+ }
63
+ }
37
+
64
+
38
+ return offset >= end;
65
+ if (!mpqemu_msg_recv(&msg_reply, pdev->ioc, errp)) {
66
+ return ret;
67
+ }
68
+
69
+ if (!mpqemu_msg_valid(&msg_reply)) {
70
+ error_setg(errp, "ERROR: Invalid reply received for command %d",
71
+ msg->cmd);
72
+ return ret;
73
+ }
74
+
75
+ return msg_reply.data.u64;
39
+}
76
+}
40
+
77
+
41
static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
78
bool mpqemu_msg_valid(MPQemuMsg *msg)
42
{
79
{
43
int ret;
80
if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
44
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
45
for (offset = 0; offset < s->len;
46
offset += s->cluster_size) {
47
bool error_is_read;
48
- int alloced = 0;
49
50
if (yield_and_check(s)) {
51
break;
52
}
53
54
- if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
55
- int i;
56
- int64_t n;
57
-
58
- /* Check to see if these blocks are already in the
59
- * backing file. */
60
-
61
- for (i = 0; i < s->cluster_size;) {
62
- /* bdrv_is_allocated() only returns true/false based
63
- * on the first set of sectors it comes across that
64
- * are are all in the same state.
65
- * For that reason we must verify each sector in the
66
- * backup cluster length. We end up copying more than
67
- * needed but at some point that is always the case. */
68
- alloced =
69
- bdrv_is_allocated(bs, offset + i,
70
- s->cluster_size - i, &n);
71
- i += n;
72
-
73
- if (alloced || n == 0) {
74
- break;
75
- }
76
- }
77
-
78
- /* If the above loop never found any sectors that are in
79
- * the topmost image, skip this backup. */
80
- if (alloced == 0) {
81
- continue;
82
- }
83
- }
84
- /* FULL sync mode we copy the whole drive. */
85
- if (alloced < 0) {
86
- ret = alloced;
87
- } else {
88
- ret = backup_do_cow(s, offset, s->cluster_size,
89
- &error_is_read, false);
90
+ if (s->sync_mode == MIRROR_SYNC_MODE_TOP &&
91
+ bdrv_is_unallocated_range(bs, offset, s->cluster_size))
92
+ {
93
+ continue;
94
}
95
+
96
+ ret = backup_do_cow(s, offset, s->cluster_size,
97
+ &error_is_read, false);
98
if (ret < 0) {
99
/* Depending on error action, fail now or retry cluster */
100
BlockErrorAction action =
101
--
81
--
102
2.21.0
82
2.29.2
103
83
104
1
From: Alberto Garcia <berto@igalia.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
A consequence of the previous patch is that bdrv_attach_child()
3
The Proxy Object sends the PCI config space accesses as messages
4
transfers the reference to child_bs from the caller to parent_bs,
4
to the remote process over the communication channel.
5
which will drop it on bdrv_close() or when someone calls
5
6
bdrv_unref_child().
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
7
7
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
8
But this only happens when bdrv_attach_child() succeeds. If it fails
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
then the caller is responsible for dropping the reference to child_bs.
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
10
Message-id: d3c94f4618813234655356c60e6f0d0362ff42d6.1611938319.git.jag.raman@oracle.com
11
This patch makes bdrv_attach_child() take the reference also when
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
there is an error, freeing the caller from having to do it.
13
14
A similar situation happens with bdrv_root_attach_child(), so the
15
changes in this patch affect both functions.
16
17
Signed-off-by: Alberto Garcia <berto@igalia.com>
18
Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com
19
[mreitz: Removed now superfluous BdrvChild * variable in
20
bdrv_open_child()]
21
Signed-off-by: Max Reitz <mreitz@redhat.com>
22
---
12
---
23
block.c | 30 ++++++++++++++++++------------
13
include/hw/remote/mpqemu-link.h | 10 ++++++
24
block/block-backend.c | 3 +--
14
hw/remote/message.c | 60 +++++++++++++++++++++++++++++++++
25
block/quorum.c | 1 -
15
hw/remote/mpqemu-link.c | 8 ++++-
26
blockjob.c | 2 +-
16
hw/remote/proxy.c | 55 ++++++++++++++++++++++++++++++
27
4 files changed, 20 insertions(+), 16 deletions(-)
17
4 files changed, 132 insertions(+), 1 deletion(-)
28
18
29
diff --git a/block.c b/block.c
19
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
30
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
31
--- a/block.c
21
--- a/include/hw/remote/mpqemu-link.h
32
+++ b/block.c
22
+++ b/include/hw/remote/mpqemu-link.h
33
@@ -XXX,XX +XXX,XX @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
23
@@ -XXX,XX +XXX,XX @@
24
*/
25
typedef enum {
26
MPQEMU_CMD_SYNC_SYSMEM,
27
+ MPQEMU_CMD_RET,
28
+ MPQEMU_CMD_PCI_CFGWRITE,
29
+ MPQEMU_CMD_PCI_CFGREAD,
30
MPQEMU_CMD_MAX,
31
} MPQemuCmd;
32
33
@@ -XXX,XX +XXX,XX @@ typedef struct {
34
off_t offsets[REMOTE_MAX_FDS];
35
} SyncSysmemMsg;
36
37
+typedef struct {
38
+ uint32_t addr;
39
+ uint32_t val;
40
+ int len;
41
+} PciConfDataMsg;
42
+
43
/**
44
* MPQemuMsg:
45
* @cmd: The remote command
46
@@ -XXX,XX +XXX,XX @@ typedef struct {
47
48
union {
49
uint64_t u64;
50
+ PciConfDataMsg pci_conf_data;
51
SyncSysmemMsg sync_sysmem;
52
} data;
53
54
diff --git a/hw/remote/message.c b/hw/remote/message.c
55
index XXXXXXX..XXXXXXX 100644
56
--- a/hw/remote/message.c
57
+++ b/hw/remote/message.c
58
@@ -XXX,XX +XXX,XX @@
59
#include "hw/remote/mpqemu-link.h"
60
#include "qapi/error.h"
61
#include "sysemu/runstate.h"
62
+#include "hw/pci/pci.h"
63
+
64
+static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
65
+ MPQemuMsg *msg, Error **errp);
66
+static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
67
+ MPQemuMsg *msg, Error **errp);
68
69
void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
70
{
71
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
72
}
73
74
switch (msg.cmd) {
75
+ case MPQEMU_CMD_PCI_CFGWRITE:
76
+ process_config_write(com->ioc, pci_dev, &msg, &local_err);
77
+ break;
78
+ case MPQEMU_CMD_PCI_CFGREAD:
79
+ process_config_read(com->ioc, pci_dev, &msg, &local_err);
80
+ break;
81
default:
82
error_setg(&local_err,
83
"Unknown command (%d) received for device %s"
84
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
85
qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
34
}
86
}
35
}
87
}
36
88
+
37
+/*
89
+static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
38
+ * This function steals the reference to child_bs from the caller.
90
+ MPQemuMsg *msg, Error **errp)
39
+ * That reference is later dropped by bdrv_root_unref_child().
91
+{
40
+ *
92
+ ERRP_GUARD();
41
+ * On failure NULL is returned, errp is set and the reference to
93
+ PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
42
+ * child_bs is also dropped.
94
+ MPQemuMsg ret = { 0 };
43
+ */
95
+
44
BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
96
+ if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
45
const char *child_name,
97
+ error_setg(errp, "Bad address for PCI config write, pid "FMT_pid".",
46
const BdrvChildRole *child_role,
98
+ getpid());
47
@@ -XXX,XX +XXX,XX @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
99
+ ret.data.u64 = UINT64_MAX;
48
ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, errp);
100
+ } else {
49
if (ret < 0) {
101
+ pci_default_write_config(dev, conf->addr, conf->val, conf->len);
50
bdrv_abort_perm_update(child_bs);
102
+ }
51
+ bdrv_unref(child_bs);
103
+
52
return NULL;
104
+ ret.cmd = MPQEMU_CMD_RET;
105
+ ret.size = sizeof(ret.data.u64);
106
+
107
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
108
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
109
+ getpid());
110
+ }
111
+}
112
+
113
+static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
114
+ MPQemuMsg *msg, Error **errp)
115
+{
116
+ ERRP_GUARD();
117
+ PciConfDataMsg *conf = (PciConfDataMsg *)&msg->data.pci_conf_data;
118
+ MPQemuMsg ret = { 0 };
119
+
120
+ if ((conf->addr + sizeof(conf->val)) > pci_config_size(dev)) {
121
+ error_setg(errp, "Bad address for PCI config read, pid "FMT_pid".",
122
+ getpid());
123
+ ret.data.u64 = UINT64_MAX;
124
+ } else {
125
+ ret.data.u64 = pci_default_read_config(dev, conf->addr, conf->len);
126
+ }
127
+
128
+ ret.cmd = MPQEMU_CMD_RET;
129
+ ret.size = sizeof(ret.data.u64);
130
+
131
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
132
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
133
+ getpid());
134
+ }
135
+}
136
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
137
index XXXXXXX..XXXXXXX 100644
138
--- a/hw/remote/mpqemu-link.c
139
+++ b/hw/remote/mpqemu-link.c
140
@@ -XXX,XX +XXX,XX @@ uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg, PCIProxyDev *pdev,
141
return ret;
53
}
142
}
54
143
55
@@ -XXX,XX +XXX,XX @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
144
- if (!mpqemu_msg_valid(&msg_reply)) {
56
return child;
145
+ if (!mpqemu_msg_valid(&msg_reply) || msg_reply.cmd != MPQEMU_CMD_RET) {
146
error_setg(errp, "ERROR: Invalid reply received for command %d",
147
msg->cmd);
148
return ret;
149
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
150
return false;
151
}
152
break;
153
+ case MPQEMU_CMD_PCI_CFGWRITE:
154
+ case MPQEMU_CMD_PCI_CFGREAD:
155
+ if (msg->size != sizeof(PciConfDataMsg)) {
156
+ return false;
157
+ }
158
+ break;
159
default:
160
break;
161
}
162
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
163
index XXXXXXX..XXXXXXX 100644
164
--- a/hw/remote/proxy.c
165
+++ b/hw/remote/proxy.c
166
@@ -XXX,XX +XXX,XX @@
167
#include "monitor/monitor.h"
168
#include "migration/blocker.h"
169
#include "qemu/sockets.h"
170
+#include "hw/remote/mpqemu-link.h"
171
+#include "qemu/error-report.h"
172
173
static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
174
{
175
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
176
error_free(dev->migration_blocker);
57
}
177
}
58
178
59
+/*
179
+static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
60
+ * This function transfers the reference to child_bs from the caller
180
+ int len, unsigned int op)
61
+ * to parent_bs. That reference is later dropped by parent_bs on
181
+{
62
+ * bdrv_close() or if someone calls bdrv_unref_child().
182
+ MPQemuMsg msg = { 0 };
63
+ *
183
+ uint64_t ret = -EINVAL;
64
+ * On failure NULL is returned, errp is set and the reference to
184
+ Error *local_err = NULL;
65
+ * child_bs is also dropped.
185
+
66
+ */
186
+ msg.cmd = op;
67
BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
187
+ msg.data.pci_conf_data.addr = addr;
68
BlockDriverState *child_bs,
188
+ msg.data.pci_conf_data.val = (op == MPQEMU_CMD_PCI_CFGWRITE) ? *val : 0;
69
const char *child_name,
189
+ msg.data.pci_conf_data.len = len;
70
@@ -XXX,XX +XXX,XX @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
190
+ msg.size = sizeof(PciConfDataMsg);
71
/* If backing_hd was already part of bs's backing chain, and
191
+
72
* inherits_from pointed recursively to bs then let's update it to
192
+ ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
73
* point directly to bs (else it will become NULL). */
193
+ if (local_err) {
74
- if (update_inherits_from) {
194
+ error_report_err(local_err);
75
+ if (bs->backing && update_inherits_from) {
195
+ }
76
backing_hd->inherits_from = bs;
196
+
77
}
197
+ if (ret == UINT64_MAX) {
78
- if (!bs->backing) {
198
+ error_report("Failed to perform PCI config %s operation",
79
- bdrv_unref(backing_hd);
199
+ (op == MPQEMU_CMD_PCI_CFGREAD) ? "READ" : "WRITE");
80
- }
200
+ }
81
201
+
82
out:
202
+ if (op == MPQEMU_CMD_PCI_CFGREAD) {
83
bdrv_refresh_limits(bs, NULL);
203
+ *val = (uint32_t)ret;
84
@@ -XXX,XX +XXX,XX @@ BdrvChild *bdrv_open_child(const char *filename,
204
+ }
85
const BdrvChildRole *child_role,
205
+}
86
bool allow_none, Error **errp)
206
+
87
{
207
+static uint32_t pci_proxy_read_config(PCIDevice *d, uint32_t addr, int len)
88
- BdrvChild *c;
208
+{
89
BlockDriverState *bs;
209
+ uint32_t val;
90
210
+
91
bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_role,
211
+ config_op_send(PCI_PROXY_DEV(d), addr, &val, len, MPQEMU_CMD_PCI_CFGREAD);
92
@@ -XXX,XX +XXX,XX @@ BdrvChild *bdrv_open_child(const char *filename,
212
+
93
return NULL;
213
+ return val;
94
}
214
+}
95
215
+
96
- c = bdrv_attach_child(parent, bs, bdref_key, child_role, errp);
216
+static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
97
- if (!c) {
217
+ int len)
98
- bdrv_unref(bs);
218
+{
99
- return NULL;
219
+ /*
100
- }
220
+ * Some of the functions access the copy of remote device's PCI config
101
-
221
+ * space which is cached in the proxy device. Therefore, keep
102
- return c;
222
+ * it up to date.
103
+ return bdrv_attach_child(parent, bs, bdref_key, child_role, errp);
223
+ */
224
+ pci_default_write_config(d, addr, val, len);
225
+
226
+ config_op_send(PCI_PROXY_DEV(d), addr, &val, len, MPQEMU_CMD_PCI_CFGWRITE);
227
+}
228
+
229
static Property proxy_properties[] = {
230
DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
231
DEFINE_PROP_END_OF_LIST(),
232
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
233
234
k->realize = pci_proxy_dev_realize;
235
k->exit = pci_proxy_dev_exit;
236
+ k->config_read = pci_proxy_read_config;
237
+ k->config_write = pci_proxy_write_config;
238
+
239
device_class_set_props(dc, proxy_properties);
104
}
240
}
105
241
106
/* TODO Future callers may need to specify parent/child_role in order for
107
diff --git a/block/block-backend.c b/block/block-backend.c
108
index XXXXXXX..XXXXXXX 100644
109
--- a/block/block-backend.c
110
+++ b/block/block-backend.c
111
@@ -XXX,XX +XXX,XX @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
112
blk->root = bdrv_root_attach_child(bs, "root", &child_root,
113
perm, BLK_PERM_ALL, blk, errp);
114
if (!blk->root) {
115
- bdrv_unref(bs);
116
blk_unref(blk);
117
return NULL;
118
}
119
@@ -XXX,XX +XXX,XX @@ void blk_remove_bs(BlockBackend *blk)
120
int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
121
{
122
ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
123
+ bdrv_ref(bs);
124
blk->root = bdrv_root_attach_child(bs, "root", &child_root,
125
blk->perm, blk->shared_perm, blk, errp);
126
if (blk->root == NULL) {
127
return -EPERM;
128
}
129
- bdrv_ref(bs);
130
131
notifier_list_notify(&blk->insert_bs_notifiers, blk);
132
if (tgm->throttle_state) {
133
diff --git a/block/quorum.c b/block/quorum.c
134
index XXXXXXX..XXXXXXX 100644
135
--- a/block/quorum.c
136
+++ b/block/quorum.c
137
@@ -XXX,XX +XXX,XX @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
138
child = bdrv_attach_child(bs, child_bs, indexstr, &child_format, errp);
139
if (child == NULL) {
140
s->next_child_index--;
141
- bdrv_unref(child_bs);
142
goto out;
143
}
144
s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
145
diff --git a/blockjob.c b/blockjob.c
146
index XXXXXXX..XXXXXXX 100644
147
--- a/blockjob.c
148
+++ b/blockjob.c
149
@@ -XXX,XX +XXX,XX @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
150
{
151
BdrvChild *c;
152
153
+ bdrv_ref(bs);
154
c = bdrv_root_attach_child(bs, name, &child_job, perm, shared_perm,
155
job, errp);
156
if (c == NULL) {
157
@@ -XXX,XX +XXX,XX @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
158
}
159
160
job->nodes = g_slist_prepend(job->nodes, c);
161
- bdrv_ref(bs);
162
bdrv_op_block_all(bs, job->blocker);
163
164
return 0;
165
--
242
--
166
2.21.0
243
2.29.2
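On the remote side, process_config_read() above rejects any access whose address plus sizeof(val) exceeds the config space size and answers with UINT64_MAX in that case. The following is only a sketch of that check against a plain byte array. ToyConfMsg, toy_conf_in_bounds() and toy_process_config_read() are hypothetical names. The 64-bit widening of the address and the rejection of lengths other than 1, 2 or 4 are assumptions of this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of PciConfDataMsg. */
typedef struct {
    uint32_t addr;
    uint32_t val;
    int len;
} ToyConfMsg;

/* Same comparison as in the patch; the address is widened to 64 bits
 * (an assumption of this sketch) so the addition cannot wrap. */
static bool toy_conf_in_bounds(const ToyConfMsg *conf, size_t config_size)
{
    assert(conf != NULL);
    return (uint64_t)conf->addr + sizeof(conf->val) <= config_size;
}

/* Returns the little-endian value read from config, or UINT64_MAX
 * on a rejected access. */
static uint64_t toy_process_config_read(const uint8_t *config,
                                        size_t config_size,
                                        const ToyConfMsg *conf)
{
    uint64_t val = 0;
    int i;

    assert(config != NULL && conf != NULL);
    if (!toy_conf_in_bounds(conf, config_size)) {
        return UINT64_MAX;
    }
    if (conf->len != 1 && conf->len != 2 && conf->len != 4) {
        return UINT64_MAX;      /* assumption: only 1, 2 or 4 bytes */
    }
    for (i = 0; i < conf->len; i++) {
        val |= (uint64_t)config[conf->addr + i] << (8 * i);
    }
    return val;
}
```

Note that the bound uses sizeof(val), that is 4 bytes, regardless of len, so a one-byte read of the last three bytes of config space is rejected as well. The sketch keeps that behavior to stay close to the patch.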
167
244
168
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
qcow2.h depends on block_int.h. Compilation currently works only
3
The proxy device object implements a handler for PCI BAR writes and reads.
4
because block_int.h is always included before qcow2.h. Still, it seems
4
The handler uses BAR_WRITE/BAR_READ messages to communicate with the
5
better to directly include block_int.h in qcow2.h.
5
remote process with the BAR address and value to be written/read.
6
6
The remote process implements a handler for BAR_WRITE/BAR_READ
7
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
7
messages.
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
8
9
Reviewed-by: Max Reitz <mreitz@redhat.com>
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Message-id: 20190506142741.41731-2-vsementsov@virtuozzo.com
10
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
11
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Message-id: a8b76714a9688be5552c4c92d089bc9e8a4707ff.1611938319.git.jag.raman@oracle.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
15
---
13
block/qcow2.h | 1 +
16
include/hw/remote/mpqemu-link.h | 10 ++++
14
block/qcow2-bitmap.c | 1 -
17
include/hw/remote/proxy.h | 9 ++++
15
block/qcow2-cache.c | 1 -
18
hw/remote/message.c | 83 +++++++++++++++++++++++++++++++++
16
block/qcow2-cluster.c | 1 -
19
hw/remote/mpqemu-link.c | 6 +++
17
block/qcow2-refcount.c | 1 -
20
hw/remote/proxy.c | 60 ++++++++++++++++++++++++
18
block/qcow2-snapshot.c | 1 -
21
5 files changed, 168 insertions(+)
19
block/qcow2.c | 1 -
22
20
7 files changed, 1 insertion(+), 6 deletions(-)
23
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
21
24
index XXXXXXX..XXXXXXX 100644
22
diff --git a/block/qcow2.h b/block/qcow2.h
25
--- a/include/hw/remote/mpqemu-link.h
23
index XXXXXXX..XXXXXXX 100644
26
+++ b/include/hw/remote/mpqemu-link.h
24
--- a/block/qcow2.h
27
@@ -XXX,XX +XXX,XX @@ typedef enum {
25
+++ b/block/qcow2.h
28
MPQEMU_CMD_RET,
29
MPQEMU_CMD_PCI_CFGWRITE,
30
MPQEMU_CMD_PCI_CFGREAD,
31
+ MPQEMU_CMD_BAR_WRITE,
32
+ MPQEMU_CMD_BAR_READ,
33
MPQEMU_CMD_MAX,
34
} MPQemuCmd;
35
36
@@ -XXX,XX +XXX,XX @@ typedef struct {
37
int len;
38
} PciConfDataMsg;
39
40
+typedef struct {
41
+ hwaddr addr;
42
+ uint64_t val;
43
+ unsigned size;
44
+ bool memory;
45
+} BarAccessMsg;
46
+
47
/**
48
* MPQemuMsg:
49
* @cmd: The remote command
50
@@ -XXX,XX +XXX,XX @@ typedef struct {
51
uint64_t u64;
52
PciConfDataMsg pci_conf_data;
53
SyncSysmemMsg sync_sysmem;
54
+ BarAccessMsg bar_access;
55
} data;
56
57
int fds[REMOTE_MAX_FDS];
58
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
59
index XXXXXXX..XXXXXXX 100644
60
--- a/include/hw/remote/proxy.h
61
+++ b/include/hw/remote/proxy.h
26
@@ -XXX,XX +XXX,XX @@
62
@@ -XXX,XX +XXX,XX @@
27
#include "crypto/block.h"
63
#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
28
#include "qemu/coroutine.h"
64
OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
29
#include "qemu/units.h"
65
30
+#include "block/block_int.h"
66
+typedef struct ProxyMemoryRegion {
31
67
+ PCIProxyDev *dev;
32
//#define DEBUG_ALLOC
68
+ MemoryRegion mr;
33
//#define DEBUG_ALLOC2
69
+ bool memory;
34
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
70
+ bool present;
35
index XXXXXXX..XXXXXXX 100644
71
+ uint8_t type;
36
--- a/block/qcow2-bitmap.c
72
+} ProxyMemoryRegion;
37
+++ b/block/qcow2-bitmap.c
73
+
74
struct PCIProxyDev {
75
PCIDevice parent_dev;
76
char *fd;
77
@@ -XXX,XX +XXX,XX @@ struct PCIProxyDev {
78
QemuMutex io_mutex;
79
QIOChannel *ioc;
80
Error *migration_blocker;
81
+ ProxyMemoryRegion region[PCI_NUM_REGIONS];
82
};
83
84
#endif /* PROXY_H */
85
diff --git a/hw/remote/message.c b/hw/remote/message.c
86
index XXXXXXX..XXXXXXX 100644
87
--- a/hw/remote/message.c
88
+++ b/hw/remote/message.c
38
@@ -XXX,XX +XXX,XX @@
89
@@ -XXX,XX +XXX,XX @@
39
#include "qapi/error.h"
90
#include "qapi/error.h"
40
#include "qemu/cutils.h"
91
#include "sysemu/runstate.h"
41
92
#include "hw/pci/pci.h"
42
-#include "block/block_int.h"
93
+#include "exec/memattrs.h"
43
#include "qcow2.h"
94
44
95
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
45
/* NOTICE: BME here means Bitmaps Extension and used as a namespace for
96
MPQemuMsg *msg, Error **errp);
46
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
97
static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
47
index XXXXXXX..XXXXXXX 100644
98
MPQemuMsg *msg, Error **errp);
48
--- a/block/qcow2-cache.c
99
+static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
49
+++ b/block/qcow2-cache.c
100
+static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
50
@@ -XXX,XX +XXX,XX @@
101
51
*/
102
void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
52
103
{
53
#include "qemu/osdep.h"
104
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
54
-#include "block/block_int.h"
105
case MPQEMU_CMD_PCI_CFGREAD:
55
#include "qemu-common.h"
106
process_config_read(com->ioc, pci_dev, &msg, &local_err);
56
#include "qcow2.h"
107
break;
57
#include "trace.h"
108
+ case MPQEMU_CMD_BAR_WRITE:
58
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
109
+ process_bar_write(com->ioc, &msg, &local_err);
59
index XXXXXXX..XXXXXXX 100644
110
+ break;
60
--- a/block/qcow2-cluster.c
111
+ case MPQEMU_CMD_BAR_READ:
61
+++ b/block/qcow2-cluster.c
112
+ process_bar_read(com->ioc, &msg, &local_err);
62
@@ -XXX,XX +XXX,XX @@
113
+ break;
63
114
default:
64
#include "qapi/error.h"
115
error_setg(&local_err,
65
#include "qemu-common.h"
116
"Unknown command (%d) received for device %s"
66
-#include "block/block_int.h"
117
@@ -XXX,XX +XXX,XX @@ static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
67
#include "qcow2.h"
118
getpid());
68
#include "qemu/bswap.h"
119
}
69
#include "trace.h"
120
}
70
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
121
+
71
index XXXXXXX..XXXXXXX 100644
122
+static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp)
72
--- a/block/qcow2-refcount.c
123
+{
73
+++ b/block/qcow2-refcount.c
124
+ ERRP_GUARD();
74
@@ -XXX,XX +XXX,XX @@
125
+ BarAccessMsg *bar_access = &msg->data.bar_access;
75
#include "qemu/osdep.h"
126
+ AddressSpace *as =
76
#include "qapi/error.h"
127
+ bar_access->memory ? &address_space_memory : &address_space_io;
77
#include "qemu-common.h"
128
+ MPQemuMsg ret = { 0 };
78
-#include "block/block_int.h"
129
+ MemTxResult res;
79
#include "qcow2.h"
130
+ uint64_t val;
80
#include "qemu/range.h"
131
+
81
#include "qemu/bswap.h"
132
+ if (!is_power_of_2(bar_access->size) ||
82
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
133
+ (bar_access->size > sizeof(uint64_t))) {
83
index XXXXXXX..XXXXXXX 100644
134
+ ret.data.u64 = UINT64_MAX;
84
--- a/block/qcow2-snapshot.c
135
+ goto fail;
85
+++ b/block/qcow2-snapshot.c
136
+ }
86
@@ -XXX,XX +XXX,XX @@
137
+
87
138
+ val = cpu_to_le64(bar_access->val);
88
#include "qemu/osdep.h"
139
+
89
#include "qapi/error.h"
140
+ res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
90
-#include "block/block_int.h"
141
+ (void *)&val, bar_access->size, true);
91
#include "qcow2.h"
142
+
92
#include "qemu/bswap.h"
143
+ if (res != MEMTX_OK) {
93
#include "qemu/error-report.h"
144
+ error_setg(errp, "Bad address %"PRIx64" for mem write, pid "FMT_pid".",
94
diff --git a/block/qcow2.c b/block/qcow2.c
145
+ bar_access->addr, getpid());
95
index XXXXXXX..XXXXXXX 100644
146
+ ret.data.u64 = -1;
96
--- a/block/qcow2.c
147
+ }
97
+++ b/block/qcow2.c
148
+
98
@@ -XXX,XX +XXX,XX @@
149
+fail:
99
#define ZLIB_CONST
150
+ ret.cmd = MPQEMU_CMD_RET;
100
#include <zlib.h>
151
+ ret.size = sizeof(ret.data.u64);
101
152
+
102
-#include "block/block_int.h"
153
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
103
#include "block/qdict.h"
154
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
104
#include "sysemu/block-backend.h"
155
+ getpid());
105
#include "qemu/module.h"
156
+ }
157
+}
158
+
159
+static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp)
160
+{
161
+ ERRP_GUARD();
162
+ BarAccessMsg *bar_access = &msg->data.bar_access;
163
+ MPQemuMsg ret = { 0 };
164
+ AddressSpace *as;
165
+ MemTxResult res;
166
+ uint64_t val = 0;
167
+
168
+ as = bar_access->memory ? &address_space_memory : &address_space_io;
169
+
170
+ if (!is_power_of_2(bar_access->size) ||
171
+ (bar_access->size > sizeof(uint64_t))) {
172
+ val = UINT64_MAX;
173
+ goto fail;
174
+ }
175
+
176
+ res = address_space_rw(as, bar_access->addr, MEMTXATTRS_UNSPECIFIED,
177
+ (void *)&val, bar_access->size, false);
178
+
179
+ if (res != MEMTX_OK) {
180
+ error_setg(errp, "Bad address %"PRIx64" for mem read, pid "FMT_pid".",
181
+ bar_access->addr, getpid());
182
+ val = UINT64_MAX;
183
+ }
184
+
185
+fail:
186
+ ret.cmd = MPQEMU_CMD_RET;
187
+ ret.data.u64 = le64_to_cpu(val);
188
+ ret.size = sizeof(ret.data.u64);
189
+
190
+ if (!mpqemu_msg_send(&ret, ioc, NULL)) {
191
+ error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
192
+ getpid());
193
+ }
194
+}
195
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
196
index XXXXXXX..XXXXXXX 100644
197
--- a/hw/remote/mpqemu-link.c
198
+++ b/hw/remote/mpqemu-link.c
199
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
200
return false;
201
}
202
break;
203
+ case MPQEMU_CMD_BAR_WRITE:
204
+ case MPQEMU_CMD_BAR_READ:
205
+ if ((msg->size != sizeof(BarAccessMsg)) || (msg->num_fds != 0)) {
206
+ return false;
207
+ }
208
+ break;
209
default:
210
break;
211
}
212
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
213
index XXXXXXX..XXXXXXX 100644
214
--- a/hw/remote/proxy.c
215
+++ b/hw/remote/proxy.c
216
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_register_types(void)
217
}
218
219
type_init(pci_proxy_dev_register_types)
220
+
221
+static void send_bar_access_msg(PCIProxyDev *pdev, MemoryRegion *mr,
222
+ bool write, hwaddr addr, uint64_t *val,
223
+ unsigned size, bool memory)
224
+{
225
+ MPQemuMsg msg = { 0 };
226
+ long ret = -EINVAL;
227
+ Error *local_err = NULL;
228
+
229
+ msg.size = sizeof(BarAccessMsg);
230
+ msg.data.bar_access.addr = mr->addr + addr;
231
+ msg.data.bar_access.size = size;
232
+ msg.data.bar_access.memory = memory;
233
+
234
+ if (write) {
235
+ msg.cmd = MPQEMU_CMD_BAR_WRITE;
236
+ msg.data.bar_access.val = *val;
237
+ } else {
238
+ msg.cmd = MPQEMU_CMD_BAR_READ;
239
+ }
240
+
241
+ ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
242
+ if (local_err) {
243
+ error_report_err(local_err);
244
+ }
245
+
246
+ if (!write) {
247
+ *val = ret;
248
+ }
249
+}
250
+
251
+static void proxy_bar_write(void *opaque, hwaddr addr, uint64_t val,
252
+ unsigned size)
253
+{
254
+ ProxyMemoryRegion *pmr = opaque;
255
+
256
+ send_bar_access_msg(pmr->dev, &pmr->mr, true, addr, &val, size,
257
+ pmr->memory);
258
+}
259
+
260
+static uint64_t proxy_bar_read(void *opaque, hwaddr addr, unsigned size)
261
+{
262
+ ProxyMemoryRegion *pmr = opaque;
263
+ uint64_t val;
264
+
265
+ send_bar_access_msg(pmr->dev, &pmr->mr, false, addr, &val, size,
266
+ pmr->memory);
267
+
268
+ return val;
269
+}
270
+
271
+const MemoryRegionOps proxy_mr_ops = {
272
+ .read = proxy_bar_read,
273
+ .write = proxy_bar_write,
274
+ .endianness = DEVICE_NATIVE_ENDIAN,
275
+ .impl = {
276
+ .min_access_size = 1,
277
+ .max_access_size = 8,
278
+ },
279
+};
106
--
280
--
107
2.21.0
281
2.29.2
108
282
109
diff view generated by jsdifflib
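
Note on the size check in process_bar_write() and process_bar_read() above: both handlers reject an access unless its size is a power of two no larger than sizeof(uint64_t). The following is a minimal standalone sketch of that predicate, not the QEMU code. The names sketch_is_pow2() and sketch_bar_access_size_ok() are hypothetical, and the power-of-two test is written out by hand instead of calling QEMU's is_power_of_2().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hand-written power-of-two test for this sketch; zero is rejected. */
static bool sketch_is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: accept only sizes that are a power of two and
 * fit in a uint64_t, i.e. 1, 2, 4 or 8 bytes.
 */
static bool sketch_bar_access_size_ok(unsigned size)
{
    return sketch_is_pow2(size) && size <= sizeof(uint64_t);
}
```

With this predicate, sizes 1, 2, 4 and 8 pass, while 0, 3 and 16 are refused, which matches the paths that reply with UINT64_MAX in the handlers.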
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Do full, top and incremental mode copying all in one place. This
3
Add a ProxyMemoryListener object, which is used to keep the view of RAM
4
unifies the code path and helps further improvements.
4
in sync between QEMU and the remote process.
5
A MemoryListener is registered for the system-memory AddressSpace. The
6
listener sends a SYNC_SYSMEM message to the remote process when the memory
7
listener commits the changes to memory. The remote process receives
8
the message and processes it in the handler for SYNC_SYSMEM message.
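
As a rough illustration of the commit step described above, the sketch below packs a list of RAM sections into a fixed-size message made of parallel arrays and refuses to send when there are more sections than fd slots. It is a simplified model under stated assumptions, not the patch's implementation: SketchSection, SketchSyncMsg, sketch_build_sync_msg() and the value of SKETCH_MAX_FDS are all hypothetical stand-ins for MemoryRegionSection, MPQemuMsg/SyncSysmemMsg, proxy_memory_listener_commit() and REMOTE_MAX_FDS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_FDS 8 /* assumption: stand-in for REMOTE_MAX_FDS */

/* Hypothetical, simplified view of one RAM section tracked by the listener. */
typedef struct {
    uint64_t gpa;    /* guest physical address of the section */
    uint64_t size;   /* length in bytes */
    uint64_t offset; /* offset of the section within its backing file */
    int fd;          /* file descriptor backing the RAM */
} SketchSection;

/* Hypothetical, simplified SYNC_SYSMEM payload: parallel arrays plus fds. */
typedef struct {
    int num_fds;
    int fds[SKETCH_MAX_FDS];
    uint64_t gpas[SKETCH_MAX_FDS];
    uint64_t sizes[SKETCH_MAX_FDS];
    uint64_t offsets[SKETCH_MAX_FDS];
} SketchSyncMsg;

/* Fill msg from the sections; returns false if they do not fit. */
static bool sketch_build_sync_msg(SketchSyncMsg *msg,
                                  const SketchSection *secs, int n)
{
    memset(msg, 0, sizeof(*msg));

    if (n < 0 || n > SKETCH_MAX_FDS) {
        return false;
    }

    msg->num_fds = n;
    for (int i = 0; i < n; i++) {
        msg->gpas[i] = secs[i].gpa;
        msg->sizes[i] = secs[i].size;
        msg->offsets[i] = secs[i].offset;
        msg->fds[i] = secs[i].fd;
    }
    return true;
}
```

The point of the parallel arrays is that entry i of gpas, sizes, offsets and fds together describes one section, so the receiver can map each fd at the right guest address.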
5
9
6
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
10
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
7
Reviewed-by: Max Reitz <mreitz@redhat.com>
11
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Message-id: 20190429090842.57910-5-vsementsov@virtuozzo.com
12
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
9
Signed-off-by: Max Reitz <mreitz@redhat.com>
13
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
14
Message-id: 04fe4e6a9ca90d4f11ab6f59be7652f5b086a071.1611938319.git.jag.raman@oracle.com
15
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
10
---
16
---
11
block/backup.c | 43 ++++++++++---------------------------------
17
MAINTAINERS | 2 +
12
1 file changed, 10 insertions(+), 33 deletions(-)
18
include/hw/remote/proxy-memory-listener.h | 28 +++
19
include/hw/remote/proxy.h | 2 +
20
hw/remote/message.c | 4 +
21
hw/remote/proxy-memory-listener.c | 227 ++++++++++++++++++++++
22
hw/remote/proxy.c | 6 +
23
hw/remote/meson.build | 1 +
24
7 files changed, 270 insertions(+)
25
create mode 100644 include/hw/remote/proxy-memory-listener.h
26
create mode 100644 hw/remote/proxy-memory-listener.c
13
27
14
diff --git a/block/backup.c b/block/backup.c
28
diff --git a/MAINTAINERS b/MAINTAINERS
15
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
16
--- a/block/backup.c
30
--- a/MAINTAINERS
17
+++ b/block/backup.c
31
+++ b/MAINTAINERS
18
@@ -XXX,XX +XXX,XX @@ static bool bdrv_is_unallocated_range(BlockDriverState *bs,
32
@@ -XXX,XX +XXX,XX @@ F: include/hw/remote/memory.h
19
return offset >= end;
33
F: hw/remote/memory.c
34
F: hw/remote/proxy.c
35
F: include/hw/remote/proxy.h
36
+F: hw/remote/proxy-memory-listener.c
37
+F: include/hw/remote/proxy-memory-listener.h
38
39
Build and test automation
40
-------------------------
41
diff --git a/include/hw/remote/proxy-memory-listener.h b/include/hw/remote/proxy-memory-listener.h
42
new file mode 100644
43
index XXXXXXX..XXXXXXX
44
--- /dev/null
45
+++ b/include/hw/remote/proxy-memory-listener.h
46
@@ -XXX,XX +XXX,XX @@
47
+/*
48
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
49
+ *
50
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
51
+ * See the COPYING file in the top-level directory.
52
+ *
53
+ */
54
+
55
+#ifndef PROXY_MEMORY_LISTENER_H
56
+#define PROXY_MEMORY_LISTENER_H
57
+
58
+#include "exec/memory.h"
59
+#include "io/channel.h"
60
+
61
+typedef struct ProxyMemoryListener {
62
+ MemoryListener listener;
63
+
64
+ int n_mr_sections;
65
+ MemoryRegionSection *mr_sections;
66
+
67
+ QIOChannel *ioc;
68
+} ProxyMemoryListener;
69
+
70
+void proxy_memory_listener_configure(ProxyMemoryListener *proxy_listener,
71
+ QIOChannel *ioc);
72
+void proxy_memory_listener_deconfigure(ProxyMemoryListener *proxy_listener);
73
+
74
+#endif
75
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
76
index XXXXXXX..XXXXXXX 100644
77
--- a/include/hw/remote/proxy.h
78
+++ b/include/hw/remote/proxy.h
79
@@ -XXX,XX +XXX,XX @@
80
81
#include "hw/pci/pci.h"
82
#include "io/channel.h"
83
+#include "hw/remote/proxy-memory-listener.h"
84
85
#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
86
OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
87
@@ -XXX,XX +XXX,XX @@ struct PCIProxyDev {
88
QemuMutex io_mutex;
89
QIOChannel *ioc;
90
Error *migration_blocker;
91
+ ProxyMemoryListener proxy_listener;
92
ProxyMemoryRegion region[PCI_NUM_REGIONS];
93
};
94
95
diff --git a/hw/remote/message.c b/hw/remote/message.c
96
index XXXXXXX..XXXXXXX 100644
97
--- a/hw/remote/message.c
98
+++ b/hw/remote/message.c
99
@@ -XXX,XX +XXX,XX @@
100
#include "sysemu/runstate.h"
101
#include "hw/pci/pci.h"
102
#include "exec/memattrs.h"
103
+#include "hw/remote/memory.h"
104
105
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
106
MPQemuMsg *msg, Error **errp);
107
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
108
case MPQEMU_CMD_BAR_READ:
109
process_bar_read(com->ioc, &msg, &local_err);
110
break;
111
+ case MPQEMU_CMD_SYNC_SYSMEM:
112
+ remote_sysmem_reconfig(&msg, &local_err);
113
+ break;
114
default:
115
error_setg(&local_err,
116
"Unknown command (%d) received for device %s"
117
diff --git a/hw/remote/proxy-memory-listener.c b/hw/remote/proxy-memory-listener.c
118
new file mode 100644
119
index XXXXXXX..XXXXXXX
120
--- /dev/null
121
+++ b/hw/remote/proxy-memory-listener.c
122
@@ -XXX,XX +XXX,XX @@
123
+/*
124
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
125
+ *
126
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
127
+ * See the COPYING file in the top-level directory.
128
+ *
129
+ */
130
+
131
+#include "qemu/osdep.h"
132
+#include "qemu-common.h"
133
+
134
+#include "qemu/compiler.h"
135
+#include "qemu/int128.h"
136
+#include "qemu/range.h"
137
+#include "exec/memory.h"
138
+#include "exec/cpu-common.h"
139
+#include "cpu.h"
140
+#include "exec/ram_addr.h"
141
+#include "exec/address-spaces.h"
142
+#include "qapi/error.h"
143
+#include "hw/remote/mpqemu-link.h"
144
+#include "hw/remote/proxy-memory-listener.h"
145
+
146
+/*
147
+ * TODO: get_fd_from_hostaddr(), proxy_mrs_can_merge() and
148
+ * proxy_memory_listener_commit() defined below perform tasks similar to the
149
+ * functions defined in vhost-user.c. These functions are good candidates
150
+ * for refactoring.
151
+ *
152
+ */
153
+
154
+static void proxy_memory_listener_reset(MemoryListener *listener)
155
+{
156
+ ProxyMemoryListener *proxy_listener = container_of(listener,
157
+ ProxyMemoryListener,
158
+ listener);
159
+ int mrs;
160
+
161
+ for (mrs = 0; mrs < proxy_listener->n_mr_sections; mrs++) {
162
+ memory_region_unref(proxy_listener->mr_sections[mrs].mr);
163
+ }
164
+
165
+ g_free(proxy_listener->mr_sections);
166
+ proxy_listener->mr_sections = NULL;
167
+ proxy_listener->n_mr_sections = 0;
168
+}
169
+
170
+static int get_fd_from_hostaddr(uint64_t host, ram_addr_t *offset)
171
+{
172
+ MemoryRegion *mr;
173
+ ram_addr_t off;
174
+
175
+ /**
176
+ * Assumes that the host address is a valid address as it's
177
+ * coming from the MemoryListener system. In the case host
178
+ * address is not valid, the following call would return
179
+ * the default subregion of "system_memory" region, and
180
+ * not NULL. So it's not possible to check for NULL here.
181
+ */
182
+ mr = memory_region_from_host((void *)(uintptr_t)host, &off);
183
+
184
+ if (offset) {
185
+ *offset = off;
186
+ }
187
+
188
+ return memory_region_get_fd(mr);
189
+}
190
+
191
+static bool proxy_mrs_can_merge(uint64_t host, uint64_t prev_host, size_t size)
192
+{
193
+ if (((prev_host + size) != host)) {
194
+ return false;
195
+ }
196
+
197
+ if (get_fd_from_hostaddr(host, NULL) !=
198
+ get_fd_from_hostaddr(prev_host, NULL)) {
199
+ return false;
200
+ }
201
+
202
+ return true;
203
+}
204
+
205
+static bool try_merge(ProxyMemoryListener *proxy_listener,
206
+ MemoryRegionSection *section)
207
+{
208
+ uint64_t mrs_size, mrs_gpa, mrs_page;
209
+ MemoryRegionSection *prev_sec;
210
+ bool merged = false;
211
+ uintptr_t mrs_host;
212
+ RAMBlock *mrs_rb;
213
+
214
+ if (!proxy_listener->n_mr_sections) {
215
+ return false;
216
+ }
217
+
218
+ mrs_rb = section->mr->ram_block;
219
+ mrs_page = (uint64_t)qemu_ram_pagesize(mrs_rb);
220
+ mrs_size = int128_get64(section->size);
221
+ mrs_gpa = section->offset_within_address_space;
222
+ mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
223
+ section->offset_within_region;
224
+
225
+ if (get_fd_from_hostaddr(mrs_host, NULL) < 0) {
226
+ return true;
227
+ }
228
+
229
+ mrs_host = mrs_host & ~(mrs_page - 1);
230
+ mrs_gpa = mrs_gpa & ~(mrs_page - 1);
231
+ mrs_size = ROUND_UP(mrs_size, mrs_page);
232
+
233
+ prev_sec = proxy_listener->mr_sections +
234
+ (proxy_listener->n_mr_sections - 1);
235
+ uint64_t prev_gpa_start = prev_sec->offset_within_address_space;
236
+ uint64_t prev_size = int128_get64(prev_sec->size);
237
+ uint64_t prev_gpa_end = range_get_last(prev_gpa_start, prev_size);
238
+ uint64_t prev_host_start =
239
+ (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr) +
240
+ prev_sec->offset_within_region;
241
+ uint64_t prev_host_end = range_get_last(prev_host_start, prev_size);
242
+
243
+ if (mrs_gpa <= (prev_gpa_end + 1)) {
244
+ g_assert(mrs_gpa > prev_gpa_start);
245
+
246
+ if ((section->mr == prev_sec->mr) &&
247
+ proxy_mrs_can_merge(mrs_host, prev_host_start,
248
+ (mrs_gpa - prev_gpa_start))) {
249
+ uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
250
+ merged = true;
251
+ prev_sec->offset_within_address_space =
252
+ MIN(prev_gpa_start, mrs_gpa);
253
+ prev_sec->offset_within_region =
254
+ MIN(prev_host_start, mrs_host) -
255
+ (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
256
+ prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
257
+ mrs_host));
258
+ }
259
+ }
260
+
261
+ return merged;
262
+}
263
+
264
+static void proxy_memory_listener_region_addnop(MemoryListener *listener,
265
+ MemoryRegionSection *section)
266
+{
267
+ ProxyMemoryListener *proxy_listener = container_of(listener,
268
+ ProxyMemoryListener,
269
+ listener);
270
+
271
+ if (!memory_region_is_ram(section->mr) ||
272
+ memory_region_is_rom(section->mr)) {
273
+ return;
274
+ }
275
+
276
+ if (try_merge(proxy_listener, section)) {
277
+ return;
278
+ }
279
+
280
+ ++proxy_listener->n_mr_sections;
281
+ proxy_listener->mr_sections = g_renew(MemoryRegionSection,
282
+ proxy_listener->mr_sections,
283
+ proxy_listener->n_mr_sections);
284
+ proxy_listener->mr_sections[proxy_listener->n_mr_sections - 1] = *section;
285
+ proxy_listener->mr_sections[proxy_listener->n_mr_sections - 1].fv = NULL;
286
+ memory_region_ref(section->mr);
287
+}
288
+
289
+static void proxy_memory_listener_commit(MemoryListener *listener)
290
+{
291
+ ProxyMemoryListener *proxy_listener = container_of(listener,
292
+ ProxyMemoryListener,
293
+ listener);
294
+ MPQemuMsg msg;
295
+ MemoryRegionSection *section;
296
+ ram_addr_t offset;
297
+ uintptr_t host_addr;
298
+ int region;
299
+ Error *local_err = NULL;
300
+
301
+ memset(&msg, 0, sizeof(MPQemuMsg));
302
+
303
+ msg.cmd = MPQEMU_CMD_SYNC_SYSMEM;
304
+ msg.num_fds = proxy_listener->n_mr_sections;
305
+ msg.size = sizeof(SyncSysmemMsg);
306
+ if (msg.num_fds > REMOTE_MAX_FDS) {
307
+ error_report("Number of fds is more than %d", REMOTE_MAX_FDS);
308
+ return;
309
+ }
310
+
311
+ for (region = 0; region < proxy_listener->n_mr_sections; region++) {
312
+ section = &proxy_listener->mr_sections[region];
313
+ msg.data.sync_sysmem.gpas[region] =
314
+ section->offset_within_address_space;
315
+ msg.data.sync_sysmem.sizes[region] = int128_get64(section->size);
316
+ host_addr = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
317
+ section->offset_within_region;
318
+ msg.fds[region] = get_fd_from_hostaddr(host_addr, &offset);
319
+ msg.data.sync_sysmem.offsets[region] = offset;
320
+ }
321
+ if (!mpqemu_msg_send(&msg, proxy_listener->ioc, &local_err)) {
322
+ error_report_err(local_err);
323
+ }
324
+}
325
+
326
+void proxy_memory_listener_deconfigure(ProxyMemoryListener *proxy_listener)
327
+{
328
+ memory_listener_unregister(&proxy_listener->listener);
329
+
330
+ proxy_memory_listener_reset(&proxy_listener->listener);
331
+}
332
+
333
+void proxy_memory_listener_configure(ProxyMemoryListener *proxy_listener,
334
+ QIOChannel *ioc)
335
+{
336
+ proxy_listener->n_mr_sections = 0;
337
+ proxy_listener->mr_sections = NULL;
338
+
339
+ proxy_listener->ioc = ioc;
340
+
341
+ proxy_listener->listener.begin = proxy_memory_listener_reset;
342
+ proxy_listener->listener.commit = proxy_memory_listener_commit;
343
+ proxy_listener->listener.region_add = proxy_memory_listener_region_addnop;
344
+ proxy_listener->listener.region_nop = proxy_memory_listener_region_addnop;
345
+ proxy_listener->listener.priority = 10;
346
+
347
+ memory_listener_register(&proxy_listener->listener,
348
+ &address_space_memory);
349
+}
350
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
351
index XXXXXXX..XXXXXXX 100644
352
--- a/hw/remote/proxy.c
353
+++ b/hw/remote/proxy.c
354
@@ -XXX,XX +XXX,XX @@
355
#include "qemu/sockets.h"
356
#include "hw/remote/mpqemu-link.h"
357
#include "qemu/error-report.h"
358
+#include "hw/remote/proxy-memory-listener.h"
359
+#include "qom/object.h"
360
361
static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
362
{
363
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
364
365
qemu_mutex_init(&dev->io_mutex);
366
qio_channel_set_blocking(dev->ioc, true, NULL);
367
+
368
+ proxy_memory_listener_configure(&dev->proxy_listener, dev->ioc);
20
}
369
}
21
370
22
-static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
371
static void pci_proxy_dev_exit(PCIDevice *pdev)
23
+static int coroutine_fn backup_loop(BackupBlockJob *job)
372
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
24
{
373
migrate_del_blocker(dev->migration_blocker);
25
int ret;
374
26
bool error_is_read;
375
error_free(dev->migration_blocker);
27
int64_t offset;
376
+
28
HBitmapIter hbi;
377
+ proxy_memory_listener_deconfigure(&dev->proxy_listener);
29
+ BlockDriverState *bs = blk_bs(job->common.blk);
378
}
30
379
31
hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
380
static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
32
while ((offset = hbitmap_iter_next(&hbi)) != -1) {
381
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
33
+ if (job->sync_mode == MIRROR_SYNC_MODE_TOP &&
382
index XXXXXXX..XXXXXXX 100644
34
+ bdrv_is_unallocated_range(bs, offset, job->cluster_size))
383
--- a/hw/remote/meson.build
35
+ {
384
+++ b/hw/remote/meson.build
36
+ hbitmap_reset(job->copy_bitmap, offset, job->cluster_size);
385
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
37
+ continue;
386
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
38
+ }
387
39
+
388
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
40
do {
389
+specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
41
if (yield_and_check(job)) {
390
42
return 0;
391
softmmu_ss.add_all(when: 'CONFIG_MULTIPROCESS', if_true: remote_ss)
43
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
44
{
45
BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
46
BlockDriverState *bs = blk_bs(s->common.blk);
47
- int64_t offset;
48
int ret = 0;
49
50
QLIST_INIT(&s->inflight_reqs);
51
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
52
* notify callback service CoW requests. */
53
job_yield(job);
54
}
55
- } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
56
- ret = backup_run_incremental(s);
57
} else {
58
- /* Both FULL and TOP SYNC_MODE's require copying.. */
59
- for (offset = 0; offset < s->len;
60
- offset += s->cluster_size) {
61
- bool error_is_read;
62
-
63
- if (yield_and_check(s)) {
64
- break;
65
- }
66
-
67
- if (s->sync_mode == MIRROR_SYNC_MODE_TOP &&
68
- bdrv_is_unallocated_range(bs, offset, s->cluster_size))
69
- {
70
- continue;
71
- }
72
-
73
- ret = backup_do_cow(s, offset, s->cluster_size,
74
- &error_is_read, false);
75
- if (ret < 0) {
76
- /* Depending on error action, fail now or retry cluster */
77
- BlockErrorAction action =
78
- backup_error_action(s, error_is_read, -ret);
79
- if (action == BLOCK_ERROR_ACTION_REPORT) {
80
- break;
81
- } else {
82
- offset -= s->cluster_size;
83
- continue;
84
- }
85
- }
86
- }
87
+ ret = backup_loop(s);
88
}
89
90
notifier_with_return_remove(&s->before_write);
91
--
392
--
92
2.21.0
393
2.29.2
93
394
94
diff view generated by jsdifflib
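
Note on try_merge() in the patch above: before comparing a new section with the previous one, the function rounds the host address and guest address down to a page boundary, rounds the size up to a whole number of pages, and then checks whether the new section starts no later than one byte past the end of the previous section. The arithmetic can be sketched in isolation as follows. The sketch_* helpers are hypothetical names written for this note. They mirror the masking, ROUND_UP() and range_get_last() steps but are not the QEMU macros themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round v down to a multiple of page; page must be a power of two. */
static uint64_t sketch_align_down(uint64_t v, uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return v & ~(page - 1);
}

/* Round v up to a multiple of page; page must be a power of two. */
static uint64_t sketch_round_up(uint64_t v, uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (v + page - 1) & ~(page - 1);
}

/* Last byte covered by [start, start + size); size must be non-zero. */
static uint64_t sketch_range_last(uint64_t start, uint64_t size)
{
    assert(size != 0);
    return start + size - 1;
}

/* True if gpa is no further than one byte past the end of the previous section. */
static bool sketch_touches_prev(uint64_t prev_start, uint64_t prev_size,
                                uint64_t gpa)
{
    return gpa <= sketch_range_last(prev_start, prev_size) + 1;
}
```

For example, with a 4 KiB page a section at 0x1234 is treated as starting at 0x1000, and a section that begins at 0x2000 touches a previous section covering 0x1000 to 0x1fff, so it is a merge candidate.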
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
We are going to share this bitmap between backup and backup-top filter
3
An IOHUB object is added to manage PCI IRQs. It uses the KVM_IRQFD
4
driver, so let's share something more meaningful. It also simplifies
4
ioctl to create an irqfd for injecting PCI interrupts into the guest.
5
some calculations.
5
The IOHUB object forwards the irqfd to the remote process. The remote process
6
uses this fd to directly send interrupts to the guest, bypassing QEMU.
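
The level handling behind this can be modelled without KVM. The sketch below is an illustration under stated assumptions, not the patch's code: it keeps a per-pin assertion count, signals only on the 0 to 1 transition, and signals again on resample if the line is still asserted. SketchIOHub, sketch_set_irq(), sketch_resample() and SKETCH_NB_PIRQS are hypothetical names. A plain counter (kicks) stands in for writing to the irqfd EventNotifier, and the per-pin mutex used in the patch is left out.

```c
#include <assert.h>

#define SKETCH_NB_PIRQS 4 /* assumption: small stand-in for REMOTE_IOHUB_NB_PIRQS */

typedef struct {
    unsigned int irq_level[SKETCH_NB_PIRQS]; /* outstanding assertions per pin */
    unsigned int kicks[SKETCH_NB_PIRQS];     /* times the irqfd would be signalled */
} SketchIOHub;

/* Raise or lower a pin; only the first raise signals the irqfd. */
static void sketch_set_irq(SketchIOHub *hub, int pirq, int level)
{
    assert(pirq >= 0 && pirq < SKETCH_NB_PIRQS);

    if (level) {
        if (++hub->irq_level[pirq] == 1) {
            hub->kicks[pirq]++;
        }
    } else if (hub->irq_level[pirq] > 0) {
        hub->irq_level[pirq]--;
    }
}

/* Resample notification: signal again if the pin is still asserted. */
static void sketch_resample(SketchIOHub *hub, int pirq)
{
    assert(pirq >= 0 && pirq < SKETCH_NB_PIRQS);

    if (hub->irq_level[pirq]) {
        hub->kicks[pirq]++;
    }
}
```

Counting assertions instead of storing a single bit lets several raises on the same pin share one interrupt line: the line is treated as low only after every raise has been matched by a lower.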
6
7
7
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
8
Reviewed-by: Max Reitz <mreitz@redhat.com>
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
9
Message-id: 20190429090842.57910-3-vsementsov@virtuozzo.com
10
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Message-id: 51d5c3d54e28a68b002e3875c59599c9f5a424a1.1611938319.git.jag.raman@oracle.com
13
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
11
---
14
---
12
block/backup.c | 48 +++++++++++++++++++++++-------------------------
15
MAINTAINERS | 2 +
13
1 file changed, 23 insertions(+), 25 deletions(-)
16
include/hw/pci/pci_ids.h | 3 +
17
include/hw/remote/iohub.h | 42 +++++++++++
18
include/hw/remote/machine.h | 2 +
19
include/hw/remote/mpqemu-link.h | 1 +
20
include/hw/remote/proxy.h | 4 ++
21
hw/remote/iohub.c | 119 ++++++++++++++++++++++++++++++++
22
hw/remote/machine.c | 10 +++
23
hw/remote/message.c | 4 ++
24
hw/remote/mpqemu-link.c | 5 ++
25
hw/remote/proxy.c | 56 +++++++++++++++
26
hw/remote/meson.build | 1 +
27
12 files changed, 249 insertions(+)
28
create mode 100644 include/hw/remote/iohub.h
29
create mode 100644 hw/remote/iohub.c
14
30
15
diff --git a/block/backup.c b/block/backup.c
31
diff --git a/MAINTAINERS b/MAINTAINERS
16
index XXXXXXX..XXXXXXX 100644
32
index XXXXXXX..XXXXXXX 100644
17
--- a/block/backup.c
33
--- a/MAINTAINERS
18
+++ b/block/backup.c
34
+++ b/MAINTAINERS
19
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
35
@@ -XXX,XX +XXX,XX @@ F: hw/remote/proxy.c
20
int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
36
F: include/hw/remote/proxy.h
21
int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
37
F: hw/remote/proxy-memory-listener.c
22
38
F: include/hw/remote/proxy-memory-listener.h
23
- hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
39
+F: hw/remote/iohub.c
24
+ assert(QEMU_IS_ALIGNED(start, job->cluster_size));
40
+F: include/hw/remote/iohub.h
25
+ hbitmap_reset(job->copy_bitmap, start, job->cluster_size);
41
26
nbytes = MIN(job->cluster_size, job->len - start);
42
Build and test automation
27
if (!*bounce_buffer) {
43
-------------------------
28
*bounce_buffer = blk_blockalign(blk, job->cluster_size);
44
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
29
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
45
index XXXXXXX..XXXXXXX 100644
30
46
--- a/include/hw/pci/pci_ids.h
31
return nbytes;
47
+++ b/include/hw/pci/pci_ids.h
32
fail:
48
@@ -XXX,XX +XXX,XX @@
33
- hbitmap_set(job->copy_bitmap, start / job->cluster_size, 1);
49
#define PCI_DEVICE_ID_SUN_SIMBA 0x5000
34
+ hbitmap_set(job->copy_bitmap, start, job->cluster_size);
50
#define PCI_DEVICE_ID_SUN_SABRE 0xa000
35
return ret;
51
36
52
+#define PCI_VENDOR_ID_ORACLE 0x108e
53
+#define PCI_DEVICE_ID_REMOTE_IOHUB 0xb000
54
+
55
#define PCI_VENDOR_ID_CMD 0x1095
56
#define PCI_DEVICE_ID_CMD_646 0x0646
57
58
diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
59
new file mode 100644
60
index XXXXXXX..XXXXXXX
61
--- /dev/null
62
+++ b/include/hw/remote/iohub.h
63
@@ -XXX,XX +XXX,XX @@
64
+/*
65
+ * IO Hub for remote device
66
+ *
67
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
68
+ *
69
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
70
+ * See the COPYING file in the top-level directory.
71
+ *
72
+ */
73
+
74
+#ifndef REMOTE_IOHUB_H
75
+#define REMOTE_IOHUB_H
76
+
77
+#include "hw/pci/pci.h"
78
+#include "qemu/event_notifier.h"
79
+#include "qemu/thread-posix.h"
80
+#include "hw/remote/mpqemu-link.h"
81
+
82
+#define REMOTE_IOHUB_NB_PIRQS PCI_DEVFN_MAX
83
+
84
+typedef struct ResampleToken {
85
+ void *iohub;
86
+ int pirq;
87
+} ResampleToken;
88
+
89
+typedef struct RemoteIOHubState {
90
+ PCIDevice d;
91
+ EventNotifier irqfds[REMOTE_IOHUB_NB_PIRQS];
92
+ EventNotifier resamplefds[REMOTE_IOHUB_NB_PIRQS];
93
+ unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
94
+ ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
95
+ QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
96
+} RemoteIOHubState;
97
+
98
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
99
+void remote_iohub_set_irq(void *opaque, int pirq, int level);
100
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg);
101
+
102
+void remote_iohub_init(RemoteIOHubState *iohub);
103
+void remote_iohub_finalize(RemoteIOHubState *iohub);
104
+
105
+#endif
106
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
107
index XXXXXXX..XXXXXXX 100644
108
--- a/include/hw/remote/machine.h
109
+++ b/include/hw/remote/machine.h
110
@@ -XXX,XX +XXX,XX @@
111
#include "hw/boards.h"
112
#include "hw/pci-host/remote.h"
113
#include "io/channel.h"
114
+#include "hw/remote/iohub.h"
115
116
struct RemoteMachineState {
117
MachineState parent_obj;
118
119
RemotePCIHost *host;
120
+ RemoteIOHubState iohub;
121
};
122
123
/* Used to pass to co-routine device and ioc. */
124
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
125
index XXXXXXX..XXXXXXX 100644
126
--- a/include/hw/remote/mpqemu-link.h
127
+++ b/include/hw/remote/mpqemu-link.h
128
@@ -XXX,XX +XXX,XX @@ typedef enum {
129
MPQEMU_CMD_PCI_CFGREAD,
130
MPQEMU_CMD_BAR_WRITE,
131
MPQEMU_CMD_BAR_READ,
132
+ MPQEMU_CMD_SET_IRQFD,
133
MPQEMU_CMD_MAX,
134
} MPQemuCmd;
135
136
diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
137
index XXXXXXX..XXXXXXX 100644
138
--- a/include/hw/remote/proxy.h
139
+++ b/include/hw/remote/proxy.h
140
@@ -XXX,XX +XXX,XX @@
141
#include "hw/pci/pci.h"
142
#include "io/channel.h"
143
#include "hw/remote/proxy-memory-listener.h"
144
+#include "qemu/event_notifier.h"
145
146
#define TYPE_PCI_PROXY_DEV "x-pci-proxy-dev"
147
OBJECT_DECLARE_SIMPLE_TYPE(PCIProxyDev, PCI_PROXY_DEV)
148
@@ -XXX,XX +XXX,XX @@ struct PCIProxyDev {
     QIOChannel *ioc;
     Error *migration_blocker;
     ProxyMemoryListener proxy_listener;
+    int virq;
+    EventNotifier intr;
+    EventNotifier resample;
     ProxyMemoryRegion region[PCI_NUM_REGIONS];
 };

diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/remote/iohub.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Remote IO Hub
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/pci/pci.h"
+#include "hw/pci/pci_ids.h"
+#include "hw/pci/pci_bus.h"
+#include "qemu/thread.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "hw/remote/iohub.h"
+#include "qemu/main-loop.h"
+
+void remote_iohub_init(RemoteIOHubState *iohub)
+{
+    int pirq;
+
+    memset(&iohub->irqfds, 0, sizeof(iohub->irqfds));
+    memset(&iohub->resamplefds, 0, sizeof(iohub->resamplefds));
+
+    for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
+        qemu_mutex_init(&iohub->irq_level_lock[pirq]);
+        iohub->irq_level[pirq] = 0;
+        event_notifier_init_fd(&iohub->irqfds[pirq], -1);
+        event_notifier_init_fd(&iohub->resamplefds[pirq], -1);
+    }
+}
+
+void remote_iohub_finalize(RemoteIOHubState *iohub)
+{
+    int pirq;
+
+    for (pirq = 0; pirq < REMOTE_IOHUB_NB_PIRQS; pirq++) {
+        qemu_set_fd_handler(event_notifier_get_fd(&iohub->resamplefds[pirq]),
+                            NULL, NULL, NULL);
+        event_notifier_cleanup(&iohub->irqfds[pirq]);
+        event_notifier_cleanup(&iohub->resamplefds[pirq]);
+        qemu_mutex_destroy(&iohub->irq_level_lock[pirq]);
+    }
+}
+
+int remote_iohub_map_irq(PCIDevice *pci_dev, int intx)
+{
+    return pci_dev->devfn;
+}
+
+void remote_iohub_set_irq(void *opaque, int pirq, int level)
+{
+    RemoteIOHubState *iohub = opaque;
+
+    assert(pirq >= 0);
+    assert(pirq < PCI_DEVFN_MAX);
+
+    QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
+
+    if (level) {
+        if (++iohub->irq_level[pirq] == 1) {
+            event_notifier_set(&iohub->irqfds[pirq]);
+        }
+    } else if (iohub->irq_level[pirq] > 0) {
+        iohub->irq_level[pirq]--;
+    }
+}
+
+static void intr_resample_handler(void *opaque)
+{
+    ResampleToken *token = opaque;
+    RemoteIOHubState *iohub = token->iohub;
+    int pirq, s;
+
+    pirq = token->pirq;
+
+    s = event_notifier_test_and_clear(&iohub->resamplefds[pirq]);
+
+    assert(s >= 0);
+
+    QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
+
+    if (iohub->irq_level[pirq]) {
+        event_notifier_set(&iohub->irqfds[pirq]);
+    }
+}
+
+void process_set_irqfd_msg(PCIDevice *pci_dev, MPQemuMsg *msg)
+{
+    RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+    RemoteIOHubState *iohub = &machine->iohub;
+    int pirq, intx;
+
+    intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    pirq = remote_iohub_map_irq(pci_dev, intx);
+
+    if (event_notifier_get_fd(&iohub->irqfds[pirq]) != -1) {
+        qemu_set_fd_handler(event_notifier_get_fd(&iohub->resamplefds[pirq]),
+                            NULL, NULL, NULL);
+        event_notifier_cleanup(&iohub->irqfds[pirq]);
+        event_notifier_cleanup(&iohub->resamplefds[pirq]);
+        memset(&iohub->token[pirq], 0, sizeof(ResampleToken));
+    }
+
+    event_notifier_init_fd(&iohub->irqfds[pirq], msg->fds[0]);
+    event_notifier_init_fd(&iohub->resamplefds[pirq], msg->fds[1]);
+
+    iohub->token[pirq].iohub = iohub;
+    iohub->token[pirq].pirq = pirq;
+
+    qemu_set_fd_handler(msg->fds[1], intr_resample_handler, NULL,
+                        &iohub->token[pirq]);
+}
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
 #include "qapi/error.h"
+#include "hw/pci/pci_host.h"
+#include "hw/remote/iohub.h"

 static void remote_machine_init(MachineState *machine)
 {
     MemoryRegion *system_memory, *system_io, *pci_memory;
     RemoteMachineState *s = REMOTE_MACHINE(machine);
     RemotePCIHost *rem_host;
+    PCIHostState *pci_host;

     system_memory = get_system_memory();
     system_io = get_system_io();
@@ -XXX,XX +XXX,XX @@ static void remote_machine_init(MachineState *machine)
     memory_region_add_subregion_overlap(system_memory, 0x0, pci_memory, -1);

     qdev_realize(DEVICE(rem_host), sysbus_get_default(), &error_fatal);
+
+    pci_host = PCI_HOST_BRIDGE(rem_host);
+
+    remote_iohub_init(&s->iohub);
+
+    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
+                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
37
}
314
}
38
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
315
39
int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
316
static void remote_machine_class_init(ObjectClass *oc, void *data)
40
317
diff --git a/hw/remote/message.c b/hw/remote/message.c
41
assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
318
index XXXXXXX..XXXXXXX 100644
42
+ assert(QEMU_IS_ALIGNED(start, job->cluster_size));
319
--- a/hw/remote/message.c
43
nbytes = MIN(job->copy_range_size, end - start);
320
+++ b/hw/remote/message.c
44
nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
321
@@ -XXX,XX +XXX,XX @@
45
- hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
322
#include "hw/pci/pci.h"
46
- nr_clusters);
323
#include "exec/memattrs.h"
47
+ hbitmap_reset(job->copy_bitmap, start, job->cluster_size * nr_clusters);
324
#include "hw/remote/memory.h"
48
ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
325
+#include "hw/remote/iohub.h"
49
read_flags, write_flags);
326
50
if (ret < 0) {
327
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
51
trace_backup_do_cow_copy_range_fail(job, start, ret);
328
MPQemuMsg *msg, Error **errp);
52
- hbitmap_set(job->copy_bitmap, start / job->cluster_size,
329
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
53
- nr_clusters);
330
case MPQEMU_CMD_SYNC_SYSMEM:
54
+ hbitmap_set(job->copy_bitmap, start, job->cluster_size * nr_clusters);
331
remote_sysmem_reconfig(&msg, &local_err);
55
return ret;
332
break;
333
+ case MPQEMU_CMD_SET_IRQFD:
334
+ process_set_irqfd_msg(pci_dev, &msg);
335
+ break;
336
default:
337
error_setg(&local_err,
338
"Unknown command (%d) received for device %s"
339
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
340
index XXXXXXX..XXXXXXX 100644
341
--- a/hw/remote/mpqemu-link.c
342
+++ b/hw/remote/mpqemu-link.c
343
@@ -XXX,XX +XXX,XX @@ bool mpqemu_msg_valid(MPQemuMsg *msg)
344
return false;
345
}
346
break;
347
+ case MPQEMU_CMD_SET_IRQFD:
348
+ if (msg->size || (msg->num_fds != 2)) {
349
+ return false;
350
+ }
351
+ break;
352
default:
353
break;
56
}
354
}
57
355
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
58
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
356
index XXXXXXX..XXXXXXX 100644
59
cow_request_begin(&cow_request, job, start, end);
357
--- a/hw/remote/proxy.c
60
358
+++ b/hw/remote/proxy.c
61
while (start < end) {
359
@@ -XXX,XX +XXX,XX @@
62
- if (!hbitmap_get(job->copy_bitmap, start / job->cluster_size)) {
360
#include "qemu/error-report.h"
63
+ if (!hbitmap_get(job->copy_bitmap, start)) {
361
#include "hw/remote/proxy-memory-listener.h"
64
trace_backup_do_cow_skip(job, start);
362
#include "qom/object.h"
65
start += job->cluster_size;
363
+#include "qemu/event_notifier.h"
66
continue; /* already copied */
364
+#include "sysemu/kvm.h"
67
@@ -XXX,XX +XXX,XX @@ static void backup_clean(Job *job)
365
+#include "util/event_notifier-posix.c"
68
assert(s->target);
366
+
69
blk_unref(s->target);
367
+static void proxy_intx_update(PCIDevice *pci_dev)
70
s->target = NULL;
368
+{
71
+
369
+ PCIProxyDev *dev = PCI_PROXY_DEV(pci_dev);
72
+ if (s->copy_bitmap) {
370
+ PCIINTxRoute route;
73
+ hbitmap_free(s->copy_bitmap);
371
+ int pin = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
74
+ s->copy_bitmap = NULL;
372
+
75
+ }
373
+ if (dev->virq != -1) {
374
+ kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &dev->intr, dev->virq);
375
+ dev->virq = -1;
376
+ }
377
+
378
+ route = pci_device_route_intx_to_irq(pci_dev, pin);
379
+
380
+ dev->virq = route.irq;
381
+
382
+ if (dev->virq != -1) {
383
+ kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, &dev->intr,
384
+ &dev->resample, dev->virq);
385
+ }
386
+}
387
+
388
+static void setup_irqfd(PCIProxyDev *dev)
389
+{
390
+ PCIDevice *pci_dev = PCI_DEVICE(dev);
391
+ MPQemuMsg msg;
392
+ Error *local_err = NULL;
393
+
394
+ event_notifier_init(&dev->intr, 0);
395
+ event_notifier_init(&dev->resample, 0);
396
+
397
+ memset(&msg, 0, sizeof(MPQemuMsg));
398
+ msg.cmd = MPQEMU_CMD_SET_IRQFD;
399
+ msg.num_fds = 2;
400
+ msg.fds[0] = event_notifier_get_fd(&dev->intr);
401
+ msg.fds[1] = event_notifier_get_fd(&dev->resample);
402
+ msg.size = 0;
403
+
404
+ if (!mpqemu_msg_send(&msg, dev->ioc, &local_err)) {
405
+ error_report_err(local_err);
406
+ }
407
+
408
+ dev->virq = -1;
409
+
410
+ proxy_intx_update(pci_dev);
411
+
412
+ pci_device_set_intx_routing_notifier(pci_dev, proxy_intx_update);
413
+}
414
415
static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
416
{
417
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
418
qio_channel_set_blocking(dev->ioc, true, NULL);
419
420
proxy_memory_listener_configure(&dev->proxy_listener, dev->ioc);
421
+
422
+ setup_irqfd(dev);
76
}
423
}
77
424
78
void backup_do_checkpoint(BlockJob *job, Error **errp)
425
static void pci_proxy_dev_exit(PCIDevice *pdev)
79
{
426
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
80
BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
427
error_free(dev->migration_blocker);
81
- int64_t len;
428
82
429
proxy_memory_listener_deconfigure(&dev->proxy_listener);
83
assert(block_job_driver(job) == &backup_job_driver);
430
+
84
431
+ event_notifier_cleanup(&dev->intr);
85
@@ -XXX,XX +XXX,XX @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
432
+ event_notifier_cleanup(&dev->resample);
86
return;
87
}
88
89
- len = DIV_ROUND_UP(backup_job->len, backup_job->cluster_size);
90
- hbitmap_set(backup_job->copy_bitmap, 0, len);
91
+ hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
92
}
433
}
93
434
94
static void backup_drain(BlockJob *job)
435
static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
95
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
436
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
96
{
437
index XXXXXXX..XXXXXXX 100644
97
int ret;
438
--- a/hw/remote/meson.build
98
bool error_is_read;
439
+++ b/hw/remote/meson.build
99
- int64_t cluster;
440
@@ -XXX,XX +XXX,XX @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('mpqemu-link.c'))
100
+ int64_t offset;
441
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
101
HBitmapIter hbi;
442
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
102
443
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
103
hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
444
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
104
- while ((cluster = hbitmap_iter_next(&hbi)) != -1) {
445
105
+ while ((offset = hbitmap_iter_next(&hbi)) != -1) {
446
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
106
do {
447
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
107
if (yield_and_check(job)) {
108
return 0;
109
}
110
- ret = backup_do_cow(job, cluster * job->cluster_size,
111
+ ret = backup_do_cow(job, offset,
112
job->cluster_size, &error_is_read, false);
113
if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
114
BLOCK_ERROR_ACTION_REPORT)
115
@@ -XXX,XX +XXX,XX @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
116
while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
117
&offset, &bytes))
118
{
119
- uint64_t cluster = offset / job->cluster_size;
120
- uint64_t end_cluster = DIV_ROUND_UP(offset + bytes, job->cluster_size);
121
+ hbitmap_set(job->copy_bitmap, offset, bytes);
122
123
- hbitmap_set(job->copy_bitmap, cluster, end_cluster - cluster);
124
-
125
- offset = end_cluster * job->cluster_size;
126
+ offset += bytes;
127
if (offset >= job->len) {
128
break;
129
}
130
@@ -XXX,XX +XXX,XX @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
131
132
/* TODO job_progress_set_remaining() would make more sense */
133
job_progress_update(&job->common.job,
134
- job->len - hbitmap_count(job->copy_bitmap) * job->cluster_size);
135
+ job->len - hbitmap_count(job->copy_bitmap));
136
}
137
138
static int coroutine_fn backup_run(Job *job, Error **errp)
139
{
140
BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
141
BlockDriverState *bs = blk_bs(s->common.blk);
142
- int64_t offset, nb_clusters;
143
+ int64_t offset;
144
int ret = 0;
145
146
QLIST_INIT(&s->inflight_reqs);
147
qemu_co_rwlock_init(&s->flush_rwlock);
148
149
- nb_clusters = DIV_ROUND_UP(s->len, s->cluster_size);
150
job_progress_set_remaining(job, s->len);
151
152
- s->copy_bitmap = hbitmap_alloc(nb_clusters, 0);
153
if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
154
backup_incremental_init_copy_bitmap(s);
155
} else {
156
- hbitmap_set(s->copy_bitmap, 0, nb_clusters);
157
+ hbitmap_set(s->copy_bitmap, 0, s->len);
158
}
159
160
-
161
s->before_write.notify = backup_before_write_notify;
162
bdrv_add_before_write_notifier(bs, &s->before_write);
163
164
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
165
/* wait until pending backup_do_cow() calls have completed */
166
qemu_co_rwlock_wrlock(&s->flush_rwlock);
167
qemu_co_rwlock_unlock(&s->flush_rwlock);
168
- hbitmap_free(s->copy_bitmap);
169
170
return ret;
171
}
172
@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
173
} else {
174
job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
175
}
176
+
177
+ job->copy_bitmap = hbitmap_alloc(len, ctz32(job->cluster_size));
178
job->use_copy_range = true;
179
job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
180
blk_get_max_transfer(job->target));
181
--
448
--
182
2.21.0
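The backup hunks above move the copy bitmap from cluster indices to byte offsets by allocating it with a granularity of ctz32(cluster_size). A minimal sketch of that mapping follows, assuming a power-of-two cluster size. The helper names are hypothetical and are not part of QEMU's HBitmap API.

```c
#include <stdint.h>

/* Hypothetical stand-in for ctz32(): log2 of a power-of-two cluster size. */
static int granularity_of(uint32_t cluster_size)
{
    int g = 0;

    if (cluster_size == 0) {
        return 0; /* avoid looping forever on invalid input */
    }
    while ((cluster_size & 1u) == 0) {
        cluster_size >>= 1;
        g++;
    }
    return g;
}

/* Bit that tracks a byte offset when one bit covers (1 << granularity) bytes. */
static uint64_t bit_for_offset(uint64_t offset, int granularity)
{
    return offset >> granularity;
}

/* Number of bits touched by the byte range [offset, offset + bytes). */
static uint64_t bits_for_range(uint64_t offset, uint64_t bytes, int granularity)
{
    uint64_t first, last;

    if (bytes == 0) {
        return 0;
    }
    first = offset >> granularity;
    last = (offset + bytes - 1) >> granularity;
    return last - first + 1;
}
```

With this convention callers pass byte offsets and lengths directly, which is why the divisions by job->cluster_size disappear from the hunks.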
449
2.29.2
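The level accounting in remote_iohub_set_irq() and intr_resample_handler() can be modelled without eventfds. The sketch below is a simplified illustration, not the patch's code: IrqLine and its notifications counter are hypothetical, and the counter stands in for event_notifier_set() on the irqfd.

```c
#include <stdbool.h>

/* Hypothetical model of one PIRQ line of the remote IO hub. */
typedef struct {
    int level;         /* number of outstanding assertions */
    int notifications; /* how many times the "irqfd" was signalled */
} IrqLine;

/*
 * Same counting as remote_iohub_set_irq(): only the 0 -> 1 transition
 * signals the interrupt, and a deassert never drops the count below 0.
 */
static void irq_line_set(IrqLine *line, bool asserted)
{
    if (asserted) {
        if (++line->level == 1) {
            line->notifications++;
        }
    } else if (line->level > 0) {
        line->level--;
    }
}

/* Same check as intr_resample_handler(): signal again if still asserted. */
static void irq_line_resample(IrqLine *line)
{
    if (line->level) {
        line->notifications++;
    }
}
```

The resample step is what keeps a shared, level-triggered line from going silent when the guest acknowledges the interrupt while the device is still asserting it.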
183
450
184
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
1
From: Jagannathan Raman <jag.raman@oracle.com>
2
2
3
Use thread_pool_submit_co() instead of reinventing it here. Note that
3
Retrieve PCI configuration info about the remote device and
4
thread_pool_submit_aio() never returns NULL, so checking it was an
4
configure the Proxy PCI object based on the returned information
5
unnecessary step.
6
5
7
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
6
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
7
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Reviewed-by: Max Reitz <mreitz@redhat.com>
8
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Message-id: 20190506142741.41731-4-vsementsov@virtuozzo.com
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Signed-off-by: Max Reitz <mreitz@redhat.com>
10
Message-id: 85ee367bbb993aa23699b44cfedd83b4ea6d5221.1611938319.git.jag.raman@oracle.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
12
---
13
block/qcow2-threads.c | 17 ++---------------
13
hw/remote/proxy.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++
14
1 file changed, 2 insertions(+), 15 deletions(-)
14
1 file changed, 84 insertions(+)
15
15
16
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
16
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
17
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
18
--- a/block/qcow2-threads.c
18
--- a/hw/remote/proxy.c
19
+++ b/block/qcow2-threads.c
19
+++ b/hw/remote/proxy.c
20
@@ -XXX,XX +XXX,XX @@ static int qcow2_compress_pool_func(void *opaque)
20
@@ -XXX,XX +XXX,XX @@
21
return 0;
21
#include "sysemu/kvm.h"
22
#include "util/event_notifier-posix.c"
23
24
+static void probe_pci_info(PCIDevice *dev, Error **errp);
25
+
26
static void proxy_intx_update(PCIDevice *pci_dev)
27
{
28
PCIProxyDev *dev = PCI_PROXY_DEV(pci_dev);
29
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
30
{
31
ERRP_GUARD();
32
PCIProxyDev *dev = PCI_PROXY_DEV(device);
33
+ uint8_t *pci_conf = device->config;
34
int fd;
35
36
if (!dev->fd) {
37
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
38
qemu_mutex_init(&dev->io_mutex);
39
qio_channel_set_blocking(dev->ioc, true, NULL);
40
41
+ pci_conf[PCI_LATENCY_TIMER] = 0xff;
42
+ pci_conf[PCI_INTERRUPT_PIN] = 0x01;
43
+
44
proxy_memory_listener_configure(&dev->proxy_listener, dev->ioc);
45
46
setup_irqfd(dev);
47
+
48
+ probe_pci_info(PCI_DEVICE(dev), errp);
22
}
49
}
23
50
24
-static void qcow2_compress_complete(void *opaque, int ret)
51
static void pci_proxy_dev_exit(PCIDevice *pdev)
25
-{
52
@@ -XXX,XX +XXX,XX @@ const MemoryRegionOps proxy_mr_ops = {
26
- qemu_coroutine_enter(opaque);
53
.max_access_size = 8,
27
-}
54
},
28
-
55
};
29
static ssize_t coroutine_fn
30
qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
31
const void *src, size_t src_size, Qcow2CompressFunc func)
32
{
33
BDRVQcow2State *s = bs->opaque;
34
- BlockAIOCB *acb;
35
ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
36
Qcow2CompressData arg = {
37
.dest = dest,
38
@@ -XXX,XX +XXX,XX @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
39
}
40
41
s->nb_compress_threads++;
42
- acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, &arg,
43
- qcow2_compress_complete,
44
- qemu_coroutine_self());
45
-
46
- if (!acb) {
47
- s->nb_compress_threads--;
48
- return -EINVAL;
49
- }
50
- qemu_coroutine_yield();
51
+ thread_pool_submit_co(pool, qcow2_compress_pool_func, &arg);
52
s->nb_compress_threads--;
53
+
56
+
54
qemu_co_queue_next(&s->compress_wait_queue);
57
+static void probe_pci_info(PCIDevice *dev, Error **errp)
55
58
+{
56
return arg.ret;
59
+ PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
60
+ uint32_t orig_val, new_val, base_class, val;
61
+ PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
62
+ DeviceClass *dc = DEVICE_CLASS(pc);
63
+ uint8_t type;
64
+ int i, size;
65
+
66
+ config_op_send(pdev, PCI_VENDOR_ID, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
67
+ pc->vendor_id = (uint16_t)val;
68
+
69
+ config_op_send(pdev, PCI_DEVICE_ID, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
70
+ pc->device_id = (uint16_t)val;
71
+
72
+ config_op_send(pdev, PCI_CLASS_DEVICE, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
73
+ pc->class_id = (uint16_t)val;
74
+
75
+ config_op_send(pdev, PCI_SUBSYSTEM_ID, &val, 2, MPQEMU_CMD_PCI_CFGREAD);
76
+ pc->subsystem_id = (uint16_t)val;
77
+
78
+ base_class = pc->class_id >> 4;
79
+ switch (base_class) {
80
+ case PCI_BASE_CLASS_BRIDGE:
81
+ set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
82
+ break;
83
+ case PCI_BASE_CLASS_STORAGE:
84
+ set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
85
+ break;
86
+ case PCI_BASE_CLASS_NETWORK:
87
+ set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
88
+ break;
89
+ case PCI_BASE_CLASS_INPUT:
90
+ set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
91
+ break;
92
+ case PCI_BASE_CLASS_DISPLAY:
93
+ set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
94
+ break;
95
+ case PCI_BASE_CLASS_PROCESSOR:
96
+ set_bit(DEVICE_CATEGORY_CPU, dc->categories);
97
+ break;
98
+ default:
99
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
100
+ break;
101
+ }
102
+
103
+ for (i = 0; i < PCI_NUM_REGIONS; i++) {
104
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
105
+ MPQEMU_CMD_PCI_CFGREAD);
106
+ new_val = 0xffffffff;
107
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &new_val, 4,
108
+ MPQEMU_CMD_PCI_CFGWRITE);
109
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &new_val, 4,
110
+ MPQEMU_CMD_PCI_CFGREAD);
111
+ size = (~(new_val & 0xFFFFFFF0)) + 1;
112
+ config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
113
+ MPQEMU_CMD_PCI_CFGWRITE);
114
+ type = (new_val & 0x1) ?
115
+ PCI_BASE_ADDRESS_SPACE_IO : PCI_BASE_ADDRESS_SPACE_MEMORY;
116
+
117
+ if (size) {
118
+ g_autofree char *name;
119
+ pdev->region[i].dev = pdev;
120
+ pdev->region[i].present = true;
121
+ if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
122
+ pdev->region[i].memory = true;
123
+ }
124
+ name = g_strdup_printf("bar-region-%d", i);
125
+ memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
126
+ &proxy_mr_ops, &pdev->region[i],
127
+ name, size);
128
+ pci_register_bar(dev, i, type, &pdev->region[i].mr);
129
+ }
130
+ }
131
+}
57
--
132
--
58
2.21.0
133
2.29.2
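The loop in probe_pci_info() uses the usual BAR sizing probe: save the register, write all ones, read it back, then restore the saved value. The size falls out of the read-back value, as sketched below. This is an illustration under assumptions. The macro and function names are hypothetical, the I/O mask follows the PCI convention of two flag bits (the patch applies 0xFFFFFFF0 to both BAR types), and 64-bit BARs are not handled.

```c
#include <stdint.h>

/* Hypothetical names; values follow the PCI BAR layout. */
#define BAR_IO_FLAG  0x1u        /* bit 0 set: I/O space BAR */
#define BAR_IO_MASK  0xFFFFFFFCu /* I/O BARs keep 2 low flag bits */
#define BAR_MEM_MASK 0xFFFFFFF0u /* memory BARs keep 4 low flag bits */

/*
 * Given the value read back after writing 0xFFFFFFFF to a 32-bit BAR,
 * return the region size in bytes, or 0 if the BAR is not implemented.
 */
static uint32_t bar_size_from_readback(uint32_t readback)
{
    uint32_t mask = (readback & BAR_IO_FLAG) ? BAR_IO_MASK : BAR_MEM_MASK;
    uint32_t addr_bits = readback & mask;

    if (addr_bits == 0) {
        return 0; /* no writable address bits: BAR absent */
    }
    /* Two's complement of the writable address bits gives the size. */
    return ~addr_bits + 1u;
}
```

For example, a read-back of 0xFFFFF000 describes a 4 KiB memory BAR, and 0xFFFFFF01 describes a 256-byte I/O BAR.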
59
134
60
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
1
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
2
2
3
If a chain was detected, don't open a new BlockBackend from the target
3
Perform device reset in the remote process when QEMU performs
4
backing file which will create a new BlockDriverState. Instead, create
4
device reset. This is required to reset the internal state
5
an empty BlockBackend and attach the already open BlockDriverState.
5
(such as registers) of emulated devices.
6
6
7
Permissions for blk_new() were copied from blk_new_open() when
7
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
8
flags = 0.
8
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
9
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
10
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
11
Message-id: 7cb220a51f565dc0817bd76e2f540e89c2d2b850.1611938319.git.jag.raman@oracle.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
14
include/hw/remote/mpqemu-link.h | 1 +
15
hw/remote/message.c | 22 ++++++++++++++++++++++
16
hw/remote/proxy.c | 19 +++++++++++++++++++
17
3 files changed, 42 insertions(+)
9
18
10
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
19
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
11
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
12
Signed-off-by: Sagi Amit <sagi.amit@oracle.com>
13
Co-developed-by: Sagi Amit <sagi.amit@oracle.com>
14
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
15
Message-id: 20190523163337.4497-4-shmuel.eiderman@oracle.com
16
Signed-off-by: Max Reitz <mreitz@redhat.com>
17
---
18
qemu-img.c | 33 +++++++++++++++++++++++----------
19
1 file changed, 23 insertions(+), 10 deletions(-)
20
21
diff --git a/qemu-img.c b/qemu-img.c
22
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
23
--- a/qemu-img.c
21
--- a/include/hw/remote/mpqemu-link.h
24
+++ b/qemu-img.c
22
+++ b/include/hw/remote/mpqemu-link.h
25
@@ -XXX,XX +XXX,XX @@ static int img_rebase(int argc, char **argv)
23
@@ -XXX,XX +XXX,XX @@ typedef enum {
26
* in its chain.
24
MPQEMU_CMD_BAR_WRITE,
27
*/
25
MPQEMU_CMD_BAR_READ,
28
prefix_chain_bs = bdrv_find_backing_image(bs, out_real_path);
26
MPQEMU_CMD_SET_IRQFD,
29
-
27
+ MPQEMU_CMD_DEVICE_RESET,
30
- blk_new_backing = blk_new_open(out_real_path, NULL,
28
MPQEMU_CMD_MAX,
31
- options, src_flags, &local_err);
29
} MPQemuCmd;
32
- g_free(out_real_path);
30
33
- if (!blk_new_backing) {
31
diff --git a/hw/remote/message.c b/hw/remote/message.c
34
- error_reportf_err(local_err,
32
index XXXXXXX..XXXXXXX 100644
35
- "Could not open new backing file '%s': ",
33
--- a/hw/remote/message.c
36
- out_baseimg);
34
+++ b/hw/remote/message.c
37
- ret = -1;
35
@@ -XXX,XX +XXX,XX @@
38
- goto out;
36
#include "exec/memattrs.h"
39
+ if (prefix_chain_bs) {
37
#include "hw/remote/memory.h"
40
+ g_free(out_real_path);
38
#include "hw/remote/iohub.h"
41
+ blk_new_backing = blk_new(BLK_PERM_CONSISTENT_READ,
39
+#include "sysemu/reset.h"
42
+ BLK_PERM_ALL);
40
43
+ ret = blk_insert_bs(blk_new_backing, prefix_chain_bs,
41
static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
44
+ &local_err);
42
MPQemuMsg *msg, Error **errp);
45
+ if (ret < 0) {
43
@@ -XXX,XX +XXX,XX @@ static void process_config_read(QIOChannel *ioc, PCIDevice *dev,
46
+ error_reportf_err(local_err,
44
MPQemuMsg *msg, Error **errp);
47
+ "Could not reuse backing file '%s': ",
45
static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
48
+ out_baseimg);
46
static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
49
+ goto out;
47
+static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
50
+ }
48
+ Error **errp);
51
+ } else {
49
52
+ blk_new_backing = blk_new_open(out_real_path, NULL,
50
void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
53
+ options, src_flags, &local_err);
51
{
54
+ g_free(out_real_path);
52
@@ -XXX,XX +XXX,XX @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
55
+ if (!blk_new_backing) {
53
case MPQEMU_CMD_SET_IRQFD:
56
+ error_reportf_err(local_err,
54
process_set_irqfd_msg(pci_dev, &msg);
57
+ "Could not open new backing file '%s': ",
55
break;
58
+ out_baseimg);
56
+ case MPQEMU_CMD_DEVICE_RESET:
59
+ ret = -1;
57
+ process_device_reset_msg(com->ioc, pci_dev, &local_err);
60
+ goto out;
58
+ break;
61
+ }
59
default:
62
}
60
error_setg(&local_err,
61
"Unknown command (%d) received for device %s"
62
@@ -XXX,XX +XXX,XX @@ fail:
63
getpid());
64
}
65
}
66
+
67
+static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
68
+ Error **errp)
69
+{
70
+ DeviceClass *dc = DEVICE_GET_CLASS(dev);
71
+ DeviceState *s = DEVICE(dev);
72
+ MPQemuMsg ret = { 0 };
73
+
74
+ if (dc->reset) {
75
+ dc->reset(s);
76
+ }
77
+
78
+ ret.cmd = MPQEMU_CMD_RET;
79
+
80
+ mpqemu_msg_send(&ret, ioc, errp);
81
+}
82
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
83
index XXXXXXX..XXXXXXX 100644
84
--- a/hw/remote/proxy.c
85
+++ b/hw/remote/proxy.c
86
@@ -XXX,XX +XXX,XX @@
87
#include "util/event_notifier-posix.c"
88
89
static void probe_pci_info(PCIDevice *dev, Error **errp);
90
+static void proxy_device_reset(DeviceState *dev);
91
92
static void proxy_intx_update(PCIDevice *pci_dev)
93
{
94
@@ -XXX,XX +XXX,XX @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
95
k->config_read = pci_proxy_read_config;
96
k->config_write = pci_proxy_write_config;
97
98
+ dc->reset = proxy_device_reset;
99
+
100
device_class_set_props(dc, proxy_properties);
101
}
102
103
@@ -XXX,XX +XXX,XX @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
63
}
104
}
64
}
105
}
106
}
107
+
108
+static void proxy_device_reset(DeviceState *dev)
109
+{
110
+ PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
111
+ MPQemuMsg msg = { 0 };
112
+ Error *local_err = NULL;
113
+
114
+ msg.cmd = MPQEMU_CMD_DEVICE_RESET;
115
+ msg.size = 0;
116
+
117
+ mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
118
+ if (local_err) {
119
+ error_report_err(local_err);
120
+ }
121
+
122
+}
65
--
123
--
66
2.21.0
124
2.29.2
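The reset patch is a small request/reply exchange: the proxy sends MPQEMU_CMD_DEVICE_RESET, and the remote side calls the device's reset hook if one exists, then answers with MPQEMU_CMD_RET. The sketch below models only the remote-side dispatch in a single process. Every type and name in it is hypothetical and stands in for the QEMU structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set, loosely mirroring MPQemuCmd. */
typedef enum { CMD_DEVICE_RESET, CMD_RET, CMD_UNKNOWN } Cmd;

typedef struct {
    Cmd cmd;
} Msg;

/* Hypothetical device with one register and an optional reset hook. */
typedef struct FakeDev FakeDev;
struct FakeDev {
    uint32_t status_reg;
    void (*reset)(FakeDev *dev); /* may be NULL, like dc->reset */
};

static void fake_dev_reset(FakeDev *dev)
{
    dev->status_reg = 0;
}

/* Run the reset hook if present and always acknowledge with CMD_RET. */
static Msg handle_request(FakeDev *dev, Msg req)
{
    Msg reply = { CMD_UNKNOWN };

    if (req.cmd == CMD_DEVICE_RESET) {
        if (dev->reset) {
            dev->reset(dev);
        }
        reply.cmd = CMD_RET;
    }
    return reply;
}
```

As in the patch, the reply is sent whether or not a reset hook exists, so the proxy's blocking wait always completes.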
67
125
68
1
From: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
1
From: "Denis V. Lunev" <den@openvz.org>
2
2
3
Valgrind detects multiple issues in QEMU iotests when the memory is
3
The original specification says that the l1 table size is 64 * l1_size, which
4
used without being initialized. Valgrind may dump lots of unnecessary
4
is obviously wrong. The size of the l1 entry is 64 _bits_, not bytes.
5
reports, which makes the memory issue analysis harder. In particular,
5
Thus 64 is to be replaced with 8, as the specification speaks in bytes.
6
that is true for the aligned bitmap directory and can be seen while
7
running the iotest #169. Padding the aligned space with zeros eases
8
the pain.
9
6
10
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
7
There is also a minor tweak: the field name is renamed from l1 to l1_table,
11
Message-id: 1558961521-131620-1-git-send-email-andrey.shinkevich@virtuozzo.com
8
which matches the later text.
12
Signed-off-by: Max Reitz <mreitz@redhat.com>
9
10
Signed-off-by: Denis V. Lunev <den@openvz.org>
11
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
12
Message-id: 20210128171313.2210947-1-den@openvz.org
13
CC: Stefan Hajnoczi <stefanha@redhat.com>
14
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
15
16
[Replace the original commit message "docs: fix mistake in dirty bitmap
17
feature description" as suggested by Eric Blake.
18
--Stefan]
19
20
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
21
---
14
block/qcow2-bitmap.c | 2 +-
22
docs/interop/parallels.txt | 2 +-
15
1 file changed, 1 insertion(+), 1 deletion(-)
23
1 file changed, 1 insertion(+), 1 deletion(-)
16
24
17
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
25
diff --git a/docs/interop/parallels.txt b/docs/interop/parallels.txt
18
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
19
--- a/block/qcow2-bitmap.c
27
--- a/docs/interop/parallels.txt
20
+++ b/block/qcow2-bitmap.c
28
+++ b/docs/interop/parallels.txt
21
@@ -XXX,XX +XXX,XX @@ static int bitmap_list_store(BlockDriverState *bs, Qcow2BitmapList *bm_list,
29
@@ -XXX,XX +XXX,XX @@ of its data area are:
22
dir_offset = *offset;
30
28 - 31: l1_size
23
}
31
The number of entries in the L1 table of the bitmap.
24
32
25
- dir = g_try_malloc(dir_size);
33
- variable: l1 (64 * l1_size bytes)
26
+ dir = g_try_malloc0(dir_size);
34
+ variable: l1_table (8 * l1_size bytes)
27
if (dir == NULL) {
35
L1 offset table (in bytes)
28
return -ENOMEM;
36
29
}
37
A dirty bitmap is stored using a one-level structure for the mapping to host
30
--
38
--
31
2.21.0
39
2.29.2
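The arithmetic behind the specification fix is short: each L1 entry is 64 bits, so the table occupies 8 bytes per entry. A small sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Size in bytes of the L1 offset table for a bitmap with l1_size entries.
 * Each entry is a 64-bit offset, so it takes sizeof(uint64_t) == 8 bytes.
 * The multiplication is done in 64 bits to avoid overflow for large tables.
 */
static uint64_t l1_table_bytes(uint32_t l1_size)
{
    return (uint64_t)l1_size * sizeof(uint64_t);
}
```

A table with 4 entries therefore takes 32 bytes, not the 256 bytes the old wording (64 * l1_size) implied.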
32
40
33