Hi,

We are happy to introduce the next stage of the multi-process QEMU
project [1].

vfio-user is a protocol that allows a device to be emulated in a separate
process outside of QEMU. It encapsulates the messages sent from QEMU to the
kernel VFIO driver, and sends them to a remote process over a UNIX socket.

The vfio-user framework consists of 3 parts:
 1) The protocol specification.
 2) A client - the VFIO generic device in QEMU that exchanges the protocol
    messages with the server.
 3) A server - a remote process that emulates a device.

This patchset implements parts 1 and 2.

The protocol's specification can be found here [2]. We also include it as
the first patch of the series.

The libvfio-user project (https://github.com/nutanix/libvfio-user)
can be used by a remote process to handle the protocol and so implement
the third part.
We also worked on implementing a server and will be sending that patch
series shortly.

Contributors:

John G Johnson <john.g.johnson@oracle.com>
John Levon <john.levon@nutanix.com>
Thanos Makatos <thanos.makatos@nutanix.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jagannathan Raman <jag.raman@oracle.com>

Please send your comments and questions!

Thank you.
References:
[1] https://wiki.qemu.org/Features/MultiProcessQEMU
[2] https://patchwork.kernel.org/project/qemu-devel/patch/20210614104608.212276-1-thanos.makatos@nutanix.com/

John G Johnson (18):
  vfio-user: add VFIO base abstract class
  vfio-user: define VFIO Proxy and communication functions
  vfio-user: Define type vfio_user_pci_dev_info
  vfio-user: connect vfio proxy to remote server
  vfio-user: negotiate protocol with remote server
  vfio-user: define vfio-user pci ops
  vfio-user: VFIO container setup & teardown
  vfio-user: get device info and get irq info
  vfio-user: device region read/write
  vfio-user: get region and DMA map/unmap operations
  vfio-user: probe remote device's BARs
  vfio-user: respond to remote DMA read/write requests
  vfio_user: setup MSI/X interrupts and PCI config operations
  vfio-user: vfio user device realize
  vfio-user: pci reset
  vfio-user: probe remote device ROM BAR
  vfio-user: migration support
  vfio-user: add migration cli options and version negotiation

Thanos Makatos (1):
  vfio-user: introduce vfio-user protocol specification

 docs/devel/index.rst | 1 +
 docs/devel/vfio-user.rst | 1809 +++++++++++++++++++++++++++++++++
 hw/vfio/pci.h | 25 +-
 hw/vfio/user.h | 279 +++++
 include/hw/vfio/vfio-common.h | 8 +
 hw/vfio/common.c | 273 ++++-
 hw/vfio/migration.c | 35 +-
 hw/vfio/pci.c | 547 ++++++++--
 hw/vfio/user.c | 997 ++++++++++++++++++
 MAINTAINERS | 10 +
 hw/vfio/meson.build | 1 +
 11 files changed, 3879 insertions(+), 106 deletions(-)
 create mode 100644 docs/devel/vfio-user.rst
 create mode 100644 hw/vfio/user.h
 create mode 100644 hw/vfio/user.c

--
2.25.1
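
The cover letter describes vfio-user as VFIO messages carried over a UNIX
socket. As a rough illustration of that idea only, the sketch below frames a
request with a small fixed header and moves it across a stream socket. The
header layout and the names demo_hdr, demo_send and demo_recv are assumptions
made up for this example; the actual wire format is the one defined in
docs/devel/vfio-user.rst.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header for illustration; NOT the vfio-user wire format. */
struct demo_hdr {
    uint16_t msg_id;   /* lets a reply be matched to its request */
    uint16_t command;  /* which operation is being requested */
    uint32_t size;     /* header plus payload, in bytes */
};

/* Write exactly len bytes, retrying on short writes. */
static int demo_write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads. */
static int demo_read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send the header, then the payload; returns 0 on success, -1 on error. */
static int demo_send(int fd, uint16_t id, uint16_t cmd,
                     const void *payload, uint32_t len)
{
    struct demo_hdr hdr = { id, cmd, (uint32_t)sizeof(struct demo_hdr) + len };

    if (demo_write_full(fd, &hdr, sizeof(hdr)) < 0) {
        return -1;
    }
    return len ? demo_write_full(fd, payload, len) : 0;
}

/* Receive one message; returns the payload length, or -1 on error. */
static int demo_recv(int fd, struct demo_hdr *hdr, void *payload, uint32_t cap)
{
    uint32_t len;

    if (demo_read_full(fd, hdr, sizeof(*hdr)) < 0 ||
        hdr->size < sizeof(*hdr)) {
        return -1;
    }
    len = hdr->size - (uint32_t)sizeof(*hdr);
    if (len > cap || (len && demo_read_full(fd, payload, len) < 0)) {
        return -1;
    }
    return (int)len;
}
```

Both ends can be exercised in one process with socketpair(AF_UNIX,
SOCK_STREAM, 0, sv): call demo_send() on sv[0] and demo_recv() on sv[1].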
Hi,

This series adds on to the following series from
Elena Ufimtseva <elena.ufimtseva@oracle.com>:
[PATCH RFC 00/19] vfio-user implementation

QEMU enabled out-of-process device emulation with multi-process [1].
multi-process used a custom protocol to interact between the client and
server, which is not desirable. The vfio-user protocol [2] implements a
VFIO based mechanism to interact between the client and server. Since VFIO
is a well-established specification, it is preferable in terms of
maintenance. It makes sense for multi-process to switch to the vfio-user
protocol.

Nutanix implemented the vfio-user protocol in their libvfio-user library.
The source for this library is located below:
https://github.com/nutanix/libvfio-user

Elena previously sent the patches for the vfio-user client. This series
implements a vfio-user server for QEMU. It includes libvfio-user as a
git submodule of QEMU, and builds it along with QEMU.

We would like to make the following notes:
- Some of the existing multi-process code would become obsolete, and would
  need to be removed. This series does not remove it, to keep the number
  of patches to a minimum. We will address this subsequently.
- The libvfio-user library needs the json-c package to build. It looks like
  the GitLab CI images used for build testing don't have this package,
  which causes a build failure.

The patches from both series are available in the following GitHub repo:
https://github.com/oracle/qemu.git

The vfio-user-client-server branch provides the same patches along with a
Python script (scripts/vfiouser-launcher.py) to launch the VM.

Contributors:
John G Johnson <john.g.johnson@oracle.com>
John Levon <john.levon@nutanix.com>
Thanos Makatos <thanos.makatos@nutanix.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jagannathan Raman <jag.raman@oracle.com>

We are looking forward to your comments and questions.

Thank you!
[1]: https://patchew.org/QEMU/20210210092628.193785-1-stefanha@redhat.com/
[2]: https://patchwork.kernel.org/project/qemu-devel/patch/20210614104608.212276-1-thanos.makatos@nutanix.com/

Jagannathan Raman (11):
  vfio-user: build library
  vfio-user: define vfio-user object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: acceptance test

 configure | 11 +
 meson.build | 35 ++
 qapi/qom.json | 20 +-
 include/hw/remote/iohub.h | 2 +
 migration/savevm.h | 2 +
 hw/remote/iohub.c | 6 +
 hw/remote/vfio-user-obj.c | 754 ++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c | 63 ++++
 .gitmodules | 3 +
 MAINTAINERS | 9 +
 hw/remote/meson.build | 3 +
 hw/remote/trace-events | 10 +
 libvfio-user | 1 +
 tests/acceptance/vfio-user.py | 94 ++++++
 14 files changed, 1011 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 160000 libvfio-user
 create mode 100644 tests/acceptance/vfio-user.py

--
1.8.3.1
add the libvfio-user library as a submodule. build it as part of QEMU
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
configure | 11 +++++++++++
meson.build | 35 +++++++++++++++++++++++++++++++++++
.gitmodules | 3 +++
MAINTAINERS | 7 +++++++
hw/remote/meson.build | 2 ++
libvfio-user | 1 +
6 files changed, 59 insertions(+)
create mode 160000 libvfio-user
diff --git a/configure b/configure
index 49b5481..bc1c961 100755
--- a/configure
+++ b/configure
@@ -4297,6 +4297,17 @@ but not implemented on your system"
fi
##########################################
+# check for multiprocess
+
+case "$multiprocess" in
+ auto | enabled )
+ if test "$git_submodules_action" != "ignore"; then
+ git_submodules="${git_submodules} libvfio-user"
+ fi
+ ;;
+esac
+
+##########################################
# End of CC checks
# After here, no more $cc or $ld runs
diff --git a/meson.build b/meson.build
index 6e4d2d8..f2f9f86 100644
--- a/meson.build
+++ b/meson.build
@@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
+ ' Please configure with --enable-slirp=git')
endif
+vfiouser = not_found
+if have_system and multiprocess_allowed
+ have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
+
+ if not have_internal
+ error('libvfio-user source not found - please pull git submodule')
+ endif
+
+ vfiouser_files = [
+ 'libvfio-user/lib/dma.c',
+ 'libvfio-user/lib/irq.c',
+ 'libvfio-user/lib/libvfio-user.c',
+ 'libvfio-user/lib/migration.c',
+ 'libvfio-user/lib/pci.c',
+ 'libvfio-user/lib/pci_caps.c',
+ 'libvfio-user/lib/tran_sock.c',
+ ]
+
+ vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
+
+ json_c = dependency('json-c', required: false)
+ if not json_c.found()
+ json_c = dependency('libjson-c')
+ endif
+
+ libvfiouser = static_library('vfiouser',
+ build_by_default: false,
+ sources: vfiouser_files,
+ dependencies: json_c,
+ include_directories: vfiouser_inc)
+
+ vfiouser = declare_dependency(link_with: libvfiouser,
+ include_directories: vfiouser_inc)
+endif
+
fdt = not_found
fdt_opt = get_option('fdt')
if have_system
diff --git a/.gitmodules b/.gitmodules
index 08b1b48..a583a39 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -64,3 +64,6 @@
[submodule "roms/vbootrom"]
path = roms/vbootrom
url = https://gitlab.com/qemu-project/vbootrom.git
+[submodule "libvfio-user"]
+ path = libvfio-user
+ url = https://github.com/nutanix/libvfio-user.git
diff --git a/MAINTAINERS b/MAINTAINERS
index aa4df6c..99646e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3350,6 +3350,13 @@ F: semihosting/
F: include/semihosting/
F: tests/tcg/multiarch/arm-compat-semi/
+libvfio-user Library
+M: Thanos Makatos <thanos.makatos@nutanix.com>
+M: John Levon <john.levon@nutanix.com>
+T: https://github.com/nutanix/libvfio-user.git
+S: Maintained
+F: libvfio-user/*
+
Multi-process QEMU
M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
M: Jagannathan Raman <jag.raman@oracle.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574..fb35fb8 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
+
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
diff --git a/libvfio-user b/libvfio-user
new file mode 160000
index 0000000..2a0a929
--- /dev/null
+++ b/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 2a0a92912d598de871ab47c034432c5fa6546dc4
--
1.8.3.1
On Mon, Jul 19, 2021 at 04:00:03PM -0400, Jagannathan Raman wrote:

> add the libvfio-user library as a submodule. build it as part of QEMU
>
> diff --git a/meson.build b/meson.build
> index 6e4d2d8..f2f9f86 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
> + ' Please configure with --enable-slirp=git')
> endif
>
> +vfiouser = not_found
> +if have_system and multiprocess_allowed
> + have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
> +
> + if not have_internal
> + error('libvfio-user source not found - please pull git submodule')
> + endif
> +
> + vfiouser_files = [
> + 'libvfio-user/lib/dma.c',
> + 'libvfio-user/lib/irq.c',
> + 'libvfio-user/lib/libvfio-user.c',
> + 'libvfio-user/lib/migration.c',
> + 'libvfio-user/lib/pci.c',
> + 'libvfio-user/lib/pci_caps.c',
> + 'libvfio-user/lib/tran_sock.c',
> + ]
> +
> + vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
> +
> + json_c = dependency('json-c', required: false)
> + if not json_c.found()
> + json_c = dependency('libjson-c')
> + endif
> +
> + libvfiouser = static_library('vfiouser',
> + build_by_default: false,
> + sources: vfiouser_files,
> + dependencies: json_c,
> + include_directories: vfiouser_inc)
> +
> + vfiouser = declare_dependency(link_with: libvfiouser,
> + include_directories: vfiouser_inc)
> +endif

Why this way, rather than recursing into the submodule? Seems a bit fragile to
encode details of the library here.

regards
john
> On Jul 19, 2021, at 4:24 PM, John Levon <john.levon@nutanix.com> wrote:
>
> On Mon, Jul 19, 2021 at 04:00:03PM -0400, Jagannathan Raman wrote:
>
>> add the libvfio-user library as a submodule. build it as part of QEMU
>>
>> diff --git a/meson.build b/meson.build
>> index 6e4d2d8..f2f9f86 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
>> + ' Please configure with --enable-slirp=git')
>> endif
>>
>> +vfiouser = not_found
>> +if have_system and multiprocess_allowed
>> + have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
>> +
>> + if not have_internal
>> + error('libvfio-user source not found - please pull git submodule')
>> + endif
>> +
>> + vfiouser_files = [
>> + 'libvfio-user/lib/dma.c',
>> + 'libvfio-user/lib/irq.c',
>> + 'libvfio-user/lib/libvfio-user.c',
>> + 'libvfio-user/lib/migration.c',
>> + 'libvfio-user/lib/pci.c',
>> + 'libvfio-user/lib/pci_caps.c',
>> + 'libvfio-user/lib/tran_sock.c',
>> + ]
>> +
>> + vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
>> +
>> + json_c = dependency('json-c', required: false)
>> + if not json_c.found()
>> + json_c = dependency('libjson-c')
>> + endif
>> +
>> + libvfiouser = static_library('vfiouser',
>> + build_by_default: false,
>> + sources: vfiouser_files,
>> + dependencies: json_c,
>> + include_directories: vfiouser_inc)
>> +
>> + vfiouser = declare_dependency(link_with: libvfiouser,
>> + include_directories: vfiouser_inc)
>> +endif
>
> Why this way, rather than recursing into the submodule? Seems a bit fragile to
> encode details of the library here.

+maintainers of meson.build. I apologize for not adding them when I sent the
patches out initially. I copied the email list from Elena, but Elena did not make
any changes to meson.build - stupid me.

John,

This way appears to be present convention with QEMU - I’m also not very clear
on the reason for it.

For example submodules such as slirp (libslirp), capstone (libcapstone),
dtc (libfdt) are built this way.

I’m guessing it’s because QEMU doesn’t build all parts of a submodule. For
example, QEMU only builds libfdt in the dtc submodule. Similarly,
libvfio-user only builds the core library without building the tests and
samples.

>
> regards
> john
Hi

On Tue, Jul 20, 2021 at 4:12 PM Jag Raman <jag.raman@oracle.com> wrote:

> > On Jul 19, 2021, at 4:24 PM, John Levon <john.levon@nutanix.com> wrote:
> >
> > On Mon, Jul 19, 2021 at 04:00:03PM -0400, Jagannathan Raman wrote:
> >
> >> add the libvfio-user library as a submodule. build it as part of QEMU
> >>
> >> diff --git a/meson.build b/meson.build
> >> index 6e4d2d8..f2f9f86 100644
> >> --- a/meson.build
> >> +++ b/meson.build
> >> @@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
> >> + ' Please configure with --enable-slirp=git')
> >> endif
> >>
> >> +vfiouser = not_found
> >> +if have_system and multiprocess_allowed
> >> + have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
> >> +
> >> + if not have_internal
> >> + error('libvfio-user source not found - please pull git submodule')
> >> + endif
> >> +
> >> + vfiouser_files = [
> >> + 'libvfio-user/lib/dma.c',
> >> + 'libvfio-user/lib/irq.c',
> >> + 'libvfio-user/lib/libvfio-user.c',
> >> + 'libvfio-user/lib/migration.c',
> >> + 'libvfio-user/lib/pci.c',
> >> + 'libvfio-user/lib/pci_caps.c',
> >> + 'libvfio-user/lib/tran_sock.c',
> >> + ]
> >> +
> >> + vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
> >> +
> >> + json_c = dependency('json-c', required: false)
> >> + if not json_c.found()
> >> + json_c = dependency('libjson-c')
> >> + endif
> >> +
> >> + libvfiouser = static_library('vfiouser',
> >> + build_by_default: false,
> >> + sources: vfiouser_files,
> >> + dependencies: json_c,
> >> + include_directories: vfiouser_inc)
> >> +
> >> + vfiouser = declare_dependency(link_with: libvfiouser,
> >> + include_directories: vfiouser_inc)
> >> +endif
> >
> > Why this way, rather than recursing into the submodule? Seems a bit fragile to
> > encode details of the library here.
>
> +maintainers of meson.build. I apologize for not adding them when I sent the
> patches out initially. I copied the email list from Elena, but Elena did not make
> any changes to meson.build - stupid me.
>
> John,
>
> This way appears to be present convention with QEMU - I’m also not very clear
> on the reason for it.
>
> For example submodules such as slirp (libslirp), capstone (libcapstone),
> dtc (libfdt) are built this way.

For slirp and dtc, we are eventually going to use meson subproject(). No idea
about capstone.

> I’m guessing it’s because QEMU doesn’t build all parts of a submodule. For
> example, QEMU only builds libfdt in the dtc submodule. Similarly,
> libvfio-user only builds the core library without building the tests and
> samples.

You can give subproject options to build limited parts.

Fwiw, since libvfio-user uses cmake, we may be able to use meson
cmake.subproject() (https://mesonbuild.com/CMake-module.html).

--
Marc-André Lureau
On Tue, Jul 20, 2021 at 04:20:13PM +0400, Marc-André Lureau wrote:

> > >> + libvfiouser = static_library('vfiouser',
> > >> + build_by_default: false,
> > >> + sources: vfiouser_files,
> > >> + dependencies: json_c,
> > >> + include_directories: vfiouser_inc)
> >
> > This way appears to be present convention with QEMU - I’m also not very clear
> > on the reason for it.
> >
> > I’m guessing it’s because QEMU doesn’t build all parts of a submodule. For
> > example, QEMU only builds libfdt in the dtc submodule. Similarly,
> > libvfio-user only builds the core library without building the tests and
> > samples.
>
> You can give subproject options to build limited parts.
>
> Fwiw, since libvfio-user uses cmake, we may be able to use meson
> cmake.subproject() (https://mesonbuild.com/CMake-module.html).

That'd be great. We also briefly discussed moving away from cmake anyway - since
both SPDK and qemu are meson-based, it seems like it would make sense.

I'd prefer it to be easy to regularly update libvfio-user within these projects.
Ideally, running qemu tests would actually run libvfio-user tests too, for some
level of assurance on the library's internal expectations.

regards
john
Define the vfio-user object, which is the remote process server for QEMU.
Set up the object initialization functions and the properties necessary to
instantiate the object.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
qapi/qom.json | 20 ++++++-
hw/remote/vfio-user-obj.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
hw/remote/meson.build | 1 +
hw/remote/trace-events | 3 +
5 files changed, 164 insertions(+), 2 deletions(-)
create mode 100644 hw/remote/vfio-user-obj.c
diff --git a/qapi/qom.json b/qapi/qom.json
index 652be31..e0716d2 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -684,6 +684,20 @@
'data': { 'fd': 'str', 'devid': 'str' } }
##
+# @VfioUserProperties:
+#
+# Properties for vfio-user objects.
+#
+# @socket: path to be used as socket by the libvfiouser library
+#
+# @devid: the id of the PCI device on the server that this object exposes
+#
+# Since: 6.0
+##
+{ 'struct': 'VfioUserProperties',
+ 'data': { 'socket': 'str', 'devid': 'str' } }
+
+##
# @RngProperties:
#
# Properties for objects of classes derived from rng.
@@ -807,7 +821,8 @@
'tls-creds-psk',
'tls-creds-x509',
'tls-cipher-suites',
- 'x-remote-object'
+ 'x-remote-object',
+ 'vfio-user'
] }
##
@@ -863,7 +878,8 @@
'tls-creds-psk': 'TlsCredsPskProperties',
'tls-creds-x509': 'TlsCredsX509Properties',
'tls-cipher-suites': 'TlsCredsProperties',
- 'x-remote-object': 'RemoteObjectProperties'
+ 'x-remote-object': 'RemoteObjectProperties',
+ 'vfio-user': 'VfioUserProperties'
} }
##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000..5098169
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,141 @@
+/*
+ * QEMU vfio-user server object
+ *
+ * Copyright © 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ * -machine x-remote
+ * -device <PCI-device>,id=<pci-dev-id>
+ * -object vfio-user,id=<id>,socket=<socket-path>,devid=<pci-dev-id>
+ *
+ * Note that the vfio-user object must be used with the x-remote machine only.
+ * This server supports only PCI devices for now.
+ *
+ * socket is the path to a file. This file will be created by the server. It is
+ * a required option.
+ *
+ * devid is the id of a PCI device on the server. It is also a required option.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+
+#define TYPE_VFU_OBJECT "vfio-user"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+struct VfuObjectClass {
+ ObjectClass parent_class;
+
+ unsigned int nr_devs;
+
+ /* Maximum number of devices the server could support */
+ unsigned int max_devs;
+};
+
+struct VfuObject {
+ /* private */
+ Object parent;
+
+ char *socket;
+ char *devid;
+};
+
+static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
+{
+ VfuObject *o = VFU_OBJECT(obj);
+
+ g_free(o->socket);
+
+ o->socket = g_strdup(str);
+
+ trace_vfu_prop("socket", str);
+}
+
+static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
+{
+ VfuObject *o = VFU_OBJECT(obj);
+
+ g_free(o->devid);
+
+ o->devid = g_strdup(str);
+
+ trace_vfu_prop("devid", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+ VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+
+ /* Add test for remote machine and PCI device */
+
+ if (k->nr_devs >= k->max_devs) {
+ error_report("Reached maximum number of vfio-user devices: %u",
+ k->max_devs);
+ return;
+ }
+
+ k->nr_devs++;
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+ VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+ VfuObject *o = VFU_OBJECT(obj);
+
+ k->nr_devs--;
+
+ g_free(o->socket);
+ g_free(o->devid);
+
+ if (k->nr_devs == 0) {
+ qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+ }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+ VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+ /* Limiting maximum number of devices to 1 until IOMMU support is added */
+ k->max_devs = 1;
+ k->nr_devs = 0;
+
+ object_class_property_add_str(klass, "socket", NULL,
+ vfu_object_set_socket);
+ object_class_property_add_str(klass, "devid", NULL,
+ vfu_object_set_devid);
+}
+
+static const TypeInfo vfu_object_info = {
+ .name = TYPE_VFU_OBJECT,
+ .parent = TYPE_OBJECT,
+ .instance_size = sizeof(VfuObject),
+ .instance_init = vfu_object_init,
+ .instance_finalize = vfu_object_finalize,
+ .class_size = sizeof(VfuObjectClass),
+ .class_init = vfu_object_class_init,
+ .interfaces = (InterfaceInfo[]) {
+ { TYPE_USER_CREATABLE },
+ { }
+ }
+};
+
+static void vfu_register_types(void)
+{
+ type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 99646e7..46ab6b6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3380,6 +3380,7 @@ F: hw/remote/proxy-memory-listener.c
F: include/hw/remote/proxy-memory-listener.h
F: hw/remote/iohub.c
F: include/hw/remote/iohub.h
+F: hw/remote/vfio-user-obj.c
EBPF:
M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index fb35fb8..cd44dfc 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('vfio-user-obj.c'))
remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974..7da12f0 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
--
1.8.3.1
Create a context with the vfio-user library for a device.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
hw/remote/vfio-user-obj.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 5098169..adb3193 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,11 +27,18 @@
#include "qemu/osdep.h"
#include "qemu-common.h"
+#include <errno.h>
+
#include "qom/object.h"
#include "qom/object_interfaces.h"
#include "qemu/error-report.h"
#include "trace.h"
#include "sysemu/runstate.h"
+#include "qemu/notify.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+
+#include "libvfio-user/include/libvfio-user.h"
#define TYPE_VFU_OBJECT "vfio-user"
OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -51,6 +58,10 @@ struct VfuObject {
char *socket;
char *devid;
+
+ Notifier machine_done;
+
+ vfu_ctx_t *vfu_ctx;
};
static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -75,9 +86,23 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
trace_vfu_prop("devid", str);
}
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+ VfuObject *o = container_of(notifier, VfuObject, machine_done);
+
+ o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
+ o, VFU_DEV_TYPE_PCI);
+ if (o->vfu_ctx == NULL) {
+ error_setg(&error_abort, "vfu: Failed to create context - %s",
+ strerror(errno));
+ return;
+ }
+}
+
static void vfu_object_init(Object *obj)
{
VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+ VfuObject *o = VFU_OBJECT(obj);
/* Add test for remote machine and PCI device */
@@ -88,6 +113,9 @@ static void vfu_object_init(Object *obj)
}
k->nr_devs++;
+
+ o->machine_done.notify = vfu_object_machine_done;
+ qemu_add_machine_init_done_notifier(&o->machine_done);
}
static void vfu_object_finalize(Object *obj)
@@ -97,6 +125,8 @@ static void vfu_object_finalize(Object *obj)
k->nr_devs--;
+ vfu_destroy_ctx(o->vfu_ctx);
+
g_free(o->socket);
g_free(o->devid);
--
1.8.3.1
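
The patch above embeds a Notifier in VfuObject and recovers the owning object
inside the callback with container_of(), so that context creation runs at
machine-init-done time rather than in instance_init (presumably because the
socket property is not yet set when instance_init runs). The pattern can be
sketched in isolation as below. DemoNotifier, DemoObject, demo_container_of
and demo_fire are stand-ins invented for this example; they are not QEMU's
types or functions.

```c
#include <stddef.h>

/* Stand-in for container_of(): map a member pointer back to its owner. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *n, void *data);
};

typedef struct DemoObject {
    const char *socket;   /* filled in later, when the property is set */
    DemoNotifier done;    /* embedded, so the callback can find its owner */
    int ctx_created;      /* stands in for the vfu_ctx_t pointer */
} DemoObject;

/* Deferred work: runs once everything else has been configured. */
static void demo_machine_done(DemoNotifier *n, void *data)
{
    DemoObject *o = demo_container_of(n, DemoObject, done);

    (void)data;
    /* Only now is it safe to rely on o->socket. */
    o->ctx_created = (o->socket != NULL);
}

/* Init only registers the callback; it does not touch the socket path. */
static void demo_object_init(DemoObject *o)
{
    o->socket = NULL;
    o->ctx_created = 0;
    o->done.notify = demo_machine_done;
}

/* What a notifier list would do for each registered entry. */
static void demo_fire(DemoNotifier *n)
{
    n->notify(n, NULL);
}
```

The callback receives only the notifier pointer, which is why the notifier
has to live inside the object rather than be allocated separately.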
Find the PCI device with the specified id. Initialize the device context
with the QEMU PCI device.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
hw/remote/vfio-user-obj.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index adb3193..e362709 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -37,6 +37,8 @@
#include "qemu/notify.h"
#include "qapi/error.h"
#include "sysemu/sysemu.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
#include "libvfio-user/include/libvfio-user.h"
@@ -62,6 +64,8 @@ struct VfuObject {
Notifier machine_done;
vfu_ctx_t *vfu_ctx;
+
+ PCIDevice *pci_dev;
};
static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -89,6 +93,8 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
+ DeviceState *dev = NULL;
+ int ret;
o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
o, VFU_DEV_TYPE_PCI);
@@ -97,6 +103,28 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
strerror(errno));
return;
}
+
+ dev = qdev_find_recursive(sysbus_get_default(), o->devid);
+ if (dev == NULL) {
+ error_setg(&error_abort, "vfu: Device %s not found", o->devid);
+ return;
+ }
+ o->pci_dev = PCI_DEVICE(dev);
+
+ ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
+ PCI_HEADER_TYPE_NORMAL, 0);
+ if (ret < 0) {
+ error_setg(&error_abort,
+ "vfu: Failed to attach PCI device %s to context - %s",
+ o->devid, strerror(errno));
+ return;
+ }
+
+ vfu_pci_set_id(o->vfu_ctx,
+ pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
+ pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
+ pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
+ pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
}
static void vfu_object_init(Object *obj)
--
1.8.3.1
On Mon, Jul 19, 2021 at 04:00:06PM -0400, Jagannathan Raman wrote:

> + vfu_pci_set_id(o->vfu_ctx,
> + pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
> + pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
> + pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
> + pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));

Since you handle all config space accesses yourselves, is there even any need
for this?

regards
john
> On Jul 26, 2021, at 11:05 AM, John Levon <levon@movementarian.org> wrote:
>
> On Mon, Jul 19, 2021 at 04:00:06PM -0400, Jagannathan Raman wrote:
>
>> + vfu_pci_set_id(o->vfu_ctx,
>> + pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
>> + pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
>> + pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
>> + pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
>
> Since you handle all config space accesses yourselves, is there even any need
> for this?

I think that makes sense. Since the QEMU server handles all the config space
accesses, it’s not necessary to register the device’s vendor/device ID with the
library.

Thank you!
--
Jag

>
> regards
> john
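
For readers following the exchange above: the four values in question are
16-bit little-endian fields at fixed offsets in PCI configuration space. The
sketch below shows that lookup on a plain byte array. The demo_ names are
invented for this example and only mimic what a helper such as pci_get_word()
is understood to do; the offsets are the standard ones for a type 0 header.

```c
#include <stdint.h>

/* Standard PCI configuration-space offsets (type 0 header) */
#define DEMO_PCI_VENDOR_ID            0x00
#define DEMO_PCI_DEVICE_ID            0x02
#define DEMO_PCI_SUBSYSTEM_VENDOR_ID  0x2c
#define DEMO_PCI_SUBSYSTEM_ID         0x2e

struct demo_pci_ids {
    uint16_t vid;
    uint16_t did;
    uint16_t ssvid;
    uint16_t ssid;
};

/* Read a 16-bit little-endian value, independent of host byte order. */
static uint16_t demo_get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Collect the identification fields from a config-space image. */
static struct demo_pci_ids demo_read_ids(const uint8_t *config)
{
    struct demo_pci_ids ids;

    ids.vid = demo_get_word(config + DEMO_PCI_VENDOR_ID);
    ids.did = demo_get_word(config + DEMO_PCI_DEVICE_ID);
    ids.ssvid = demo_get_word(config + DEMO_PCI_SUBSYSTEM_VENDOR_ID);
    ids.ssid = demo_get_word(config + DEMO_PCI_SUBSYSTEM_ID);
    return ids;
}
```

Because a client that reads config space through the server sees these same
bytes, registering them a second time with the library adds nothing, which
is the point John raises.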
Setup a separate thread to run the vfio-user context. The thread acts as
the main loop for the device.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
hw/remote/vfio-user-obj.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index e362709..6a2d0f5 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -35,6 +35,7 @@
#include "trace.h"
#include "sysemu/runstate.h"
#include "qemu/notify.h"
+#include "qemu/thread.h"
#include "qapi/error.h"
#include "sysemu/sysemu.h"
#include "hw/qdev-core.h"
@@ -66,6 +67,8 @@ struct VfuObject {
vfu_ctx_t *vfu_ctx;
PCIDevice *pci_dev;
+
+ QemuThread vfu_ctx_thread;
};
static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -90,6 +93,44 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
trace_vfu_prop("devid", str);
}
+static void *vfu_object_ctx_run(void *opaque)
+{
+ VfuObject *o = opaque;
+ int ret;
+
+ ret = vfu_realize_ctx(o->vfu_ctx);
+ if (ret < 0) {
+ error_setg(&error_abort, "vfu: Failed to realize device %s - %s",
+ o->devid, strerror(errno));
+ return NULL;
+ }
+
+ ret = vfu_attach_ctx(o->vfu_ctx);
+ if (ret < 0) {
+ error_setg(&error_abort,
+ "vfu: Failed to attach device %s to context - %s",
+ o->devid, strerror(errno));
+ return NULL;
+ }
+
+ do {
+ ret = vfu_run_ctx(o->vfu_ctx);
+ if (ret < 0) {
+ if (errno == EINTR) {
+ ret = 0;
+ } else if (errno == ENOTCONN) {
+ object_unparent(OBJECT(o));
+ break;
+ } else {
+ error_setg(&error_abort, "vfu: Failed to run device %s - %s",
+ o->devid, strerror(errno));
+ }
+ }
+ } while (ret == 0);
+
+ return NULL;
+}
+
static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -125,6 +166,9 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
+
+ qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
+ o, QEMU_THREAD_JOINABLE);
}
static void vfu_object_init(Object *obj)
--
1.8.3.1
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 05/11] vfio-user: run vfio-user context
>
> Set up a separate thread to run the vfio-user context. The thread acts as
> the main loop for the device.

In your "vfio-user: instantiate vfio-user context" patch you create the vfu
context in blocking mode, so the only way to run device emulation is in a
separate thread. Were you going to create a separate thread anyway? You can
run device emulation in polling mode and thereby avoid creating a separate
thread, which saves resources. Do you plan to do that in the future?
> On Jul 20, 2021, at 10:17 AM, Thanos Makatos <thanos.makatos@nutanix.com> wrote:
>
>> -----Original Message-----
>> From: Jagannathan Raman <jag.raman@oracle.com>
>> Sent: 19 July 2021 21:00
>> To: qemu-devel@nongnu.org
>> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
>> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
>> john.g.johnson@oracle.com; Thanos Makatos
>> <thanos.makatos@nutanix.com>; Swapnil Ingle
>> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
>> Subject: [PATCH RFC server 05/11] vfio-user: run vfio-user context
>>
>> Set up a separate thread to run the vfio-user context. The thread acts as
>> the main loop for the device.
>
> In your "vfio-user: instantiate vfio-user context" patch you create the vfu
> context in blocking mode, so the only way to run device emulation is in a
> separate thread. Were you going to create a separate thread anyway? You can
> run device emulation in polling mode and thereby avoid creating a separate
> thread, which saves resources. Do you plan to do that in the future?

Thanks for the information about the blocking and non-blocking modes.

I’d like to explain why we are using a separate thread presently, and check
with you whether it’s possible to poll on multiple vfu contexts at the same
time (similar to select/poll for fds).

My understanding of how devices are executed in QEMU is as follows. QEMU
initializes the device instance, at which point the device registers
callbacks for BAR and config space accesses. The device is subsequently
driven by these callbacks: whenever the vcpu thread accesses the BAR
addresses or issues a config space access on the PCI bus, the vcpu exits to
QEMU, which handles the access. As such, the device is driven by the vcpu
thread. Since there are no vcpu threads in the remote process, we created a
separate thread as a replacement. As you can see already, this thread blocks
on vfu_run_ctx(), which I believe polls on the socket for messages from the
client.

If there is a way to run multiple vfu contexts at the same time, that would
help with conserving threads on the host CPU. For example, if there’s a way
to add vfu contexts to a list of contexts that expect messages from the
client, that could be a good idea. Alternatively, this QEMU server could
implement a similar mechanism to group all non-blocking vfu contexts into a
single thread, instead of having a separate thread for each context.

--
Jag
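The alternative raised in this reply, servicing several non-blocking vfu contexts from one thread instead of one thread per context, can be sketched with plain poll(2). This is only an illustration under stated assumptions: struct ctx_slot, service_contexts_once() and drain_one_byte() are hypothetical names, not libvfio-user or QEMU APIs. The sketch assumes each context can supply a file descriptor that becomes readable when a client message is pending, and a handler that processes pending messages without blocking; whether and how the library exposes these should be checked against its headers.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-context record; not a libvfio-user type. */
struct ctx_slot {
    int fd;                       /* readable when a message is pending */
    int (*process)(void *opaque); /* must not block */
    void *opaque;
};

#define MAX_SLOTS 64

/*
 * Wait up to timeout_ms for any context to become ready, then call the
 * handler of each ready context once. Returns the number of contexts
 * serviced, or -1 if nslots is too large or poll() failed (the caller
 * can retry when errno is EINTR).
 */
int service_contexts_once(struct ctx_slot *slots, size_t nslots,
                          int timeout_ms)
{
    struct pollfd pfds[MAX_SLOTS];
    int serviced = 0;
    size_t i;

    assert(slots != NULL || nslots == 0);
    if (nslots > MAX_SLOTS) {
        return -1;
    }

    for (i = 0; i < nslots; i++) {
        pfds[i].fd = slots[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)nslots, timeout_ms) < 0) {
        return -1;
    }

    for (i = 0; i < nslots; i++) {
        /* Hang-ups and errors are also handed to the handler. */
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            slots[i].process(slots[i].opaque);
            serviced++;
        }
    }

    return serviced;
}

/* Example handler: consume one byte from the fd passed via opaque. */
int drain_one_byte(void *opaque)
{
    char c;
    int fd = *(int *)opaque;

    return read(fd, &c, 1) == 1 ? 0 : -1;
}
```

A single runner thread would call service_contexts_once() in a loop over all registered contexts. The trade-off is that a slow handler for one device delays every other device sharing the thread.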
Define and register handlers for PCI config space accesses
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
hw/remote/vfio-user-obj.c | 41 +++++++++++++++++++++++++++++++++++++++++
hw/remote/trace-events | 2 ++
2 files changed, 43 insertions(+)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 6a2d0f5..60d9fa8 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -36,6 +36,7 @@
#include "sysemu/runstate.h"
#include "qemu/notify.h"
#include "qemu/thread.h"
+#include "qemu/main-loop.h"
#include "qapi/error.h"
#include "sysemu/sysemu.h"
#include "hw/qdev-core.h"
@@ -131,6 +132,35 @@ static void *vfu_object_ctx_run(void *opaque)
return NULL;
}
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+ size_t count, loff_t offset,
+ const bool is_write)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+ uint32_t val = 0;
+ int i;
+
+ qemu_mutex_lock_iothread();
+
+ for (i = 0; i < count; i++) {
+ if (is_write) {
+ val = *((uint8_t *)(buf + i));
+ trace_vfu_cfg_write((offset + i), val);
+ pci_default_write_config(PCI_DEVICE(o->pci_dev),
+ (offset + i), val, 1);
+ } else {
+ val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
+ (offset + i), 1);
+ *((uint8_t *)(buf + i)) = (uint8_t)val;
+ trace_vfu_cfg_read((offset + i), val);
+ }
+ }
+
+ qemu_mutex_unlock_iothread();
+
+ return count;
+}
+
static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -167,6 +197,17 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
+ ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+ pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+ VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+ NULL, 0, -1, 0);
+ if (ret < 0) {
+ error_setg(&error_abort,
+ "vfu: Failed to setup config space handlers for %s- %s",
+ o->devid, strerror(errno));
+ return;
+ }
+
qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
o, QEMU_THREAD_JOINABLE);
}
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0..2ef7884 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
# vfio-user-obj.c
vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
--
1.8.3.1
On Mon, Jul 19, 2021 at 04:00:08PM -0400, Jagannathan Raman wrote:

> Define and register handlers for PCI config space accesses
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
> hw/remote/vfio-user-obj.c | 41 +++++++++++++++++++++++++++++++++++++++++
> hw/remote/trace-events | 2 ++
> 2 files changed, 43 insertions(+)
>
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 6a2d0f5..60d9fa8 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -36,6 +36,7 @@
> #include "sysemu/runstate.h"
> #include "qemu/notify.h"
> #include "qemu/thread.h"
> +#include "qemu/main-loop.h"
> #include "qapi/error.h"
> #include "sysemu/sysemu.h"
> #include "hw/qdev-core.h"
> @@ -131,6 +132,35 @@ static void *vfu_object_ctx_run(void *opaque)
> return NULL;
> }
>
> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> + size_t count, loff_t offset,
> + const bool is_write)
> +{
> + VfuObject *o = vfu_get_private(vfu_ctx);
> + uint32_t val = 0;
> + int i;
> +
> + qemu_mutex_lock_iothread();
> +
> + for (i = 0; i < count; i++) {
> + if (is_write) {
> + val = *((uint8_t *)(buf + i));
> + trace_vfu_cfg_write((offset + i), val);
> + pci_default_write_config(PCI_DEVICE(o->pci_dev),
> + (offset + i), val, 1);
> + } else {
> + val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
> + (offset + i), 1);
> + *((uint8_t *)(buf + i)) = (uint8_t)val;
> + trace_vfu_cfg_read((offset + i), val);
> + }
> + }

Is it always OK to split up the access into single bytes like this?

regards
john
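For reference while weighing the question above, one alternative to byte-at-a-time access is to split a request into the largest naturally aligned pieces of 1, 2 or 4 bytes, which are the access lengths the config accessors in the patch are called with. The following is a minimal standalone sketch of that splitting only. cfg_chunk_len() and cfg_count_accesses() are hypothetical helpers, not QEMU functions, and the sketch does not settle whether byte-wise access is acceptable for every device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Largest naturally aligned access (1, 2 or 4 bytes) that starts at
 * offset and does not run past the remaining byte count.
 */
size_t cfg_chunk_len(uint64_t offset, size_t remaining)
{
    assert(remaining > 0);

    if (remaining >= 4 && (offset & 3) == 0) {
        return 4;
    }
    if (remaining >= 2 && (offset & 1) == 0) {
        return 2;
    }
    return 1;
}

/* Number of accesses needed to cover [offset, offset + count). */
size_t cfg_count_accesses(uint64_t offset, size_t count)
{
    size_t n = 0;

    while (count > 0) {
        size_t len = cfg_chunk_len(offset, count);

        offset += len;
        count -= len;
        n++;
    }

    return n;
}
```

With this scheme an aligned 4-byte request reaches the device as one access instead of four. A handler built on it would also have to assemble each value in little-endian order, since PCI config space is little-endian.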
Define and register callbacks to manage the RAM regions used for
device DMA.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
hw/remote/vfio-user-obj.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
hw/remote/trace-events | 2 ++
2 files changed, 60 insertions(+)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 60d9fa8..d158a7f 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -161,6 +161,57 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
return count;
}
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+ MemoryRegion *subregion = NULL;
+ g_autofree char *name = NULL;
+ static unsigned int suffix;
+ struct iovec *iov = &info->iova;
+
+ if (!info->vaddr) {
+ return;
+ }
+
+ name = g_strdup_printf("remote-mem-%u", suffix++);
+
+ subregion = g_new0(MemoryRegion, 1);
+
+ qemu_mutex_lock_iothread();
+
+ memory_region_init_ram_ptr(subregion, NULL, name,
+ iov->iov_len, info->vaddr);
+
+ memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
+ subregion);
+
+ qemu_mutex_unlock_iothread();
+
+ trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+ MemoryRegion *mr = NULL;
+ ram_addr_t offset;
+
+ mr = memory_region_from_host(info->vaddr, &offset);
+ if (!mr) {
+ return 0;
+ }
+
+ qemu_mutex_lock_iothread();
+
+ memory_region_del_subregion(get_system_memory(), mr);
+
+ object_unparent((OBJECT(mr)));
+
+ qemu_mutex_unlock_iothread();
+
+ trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+
+ return 0;
+}
+
static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -208,6 +259,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
return;
}
+ ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+ if (ret < 0) {
+ error_setg(&error_abort, "vfu: Failed to setup DMA handlers for %s",
+ o->devid);
+ return;
+ }
+
qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
o, QEMU_THREAD_JOINABLE);
}
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884..f945c7e 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
--
1.8.3.1
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 07/11] vfio-user: handle DMA mappings
>
> Define and register callbacks to manage the RAM regions used for
> device DMA
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
> hw/remote/vfio-user-obj.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
> hw/remote/trace-events | 2 ++
> 2 files changed, 60 insertions(+)
>
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 60d9fa8..d158a7f 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -161,6 +161,57 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> return count;
> }
>
> +static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> +{
> + MemoryRegion *subregion = NULL;
> + g_autofree char *name = NULL;
> + static unsigned int suffix;
> + struct iovec *iov = &info->iova;
> +
> + if (!info->vaddr) {
> + return;
> + }

This shouldn't happen, you can replace it with an assert if you want.

> +
> + name = g_strdup_printf("remote-mem-%u", suffix++);
> +
> + subregion = g_new0(MemoryRegion, 1);
> +
> + qemu_mutex_lock_iothread();
> +
> + memory_region_init_ram_ptr(subregion, NULL, name,
> + iov->iov_len, info->vaddr);
> +
> + memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
> + subregion);
> +
> + qemu_mutex_unlock_iothread();
> +
> + trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
> +}
> +
> +static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> +{
> + MemoryRegion *mr = NULL;
> + ram_addr_t offset;
> +
> + mr = memory_region_from_host(info->vaddr, &offset);
> + if (!mr) {

Is this expected? If not then should we at least log something?

> + return 0;
> + }
> +
> + qemu_mutex_lock_iothread();
> +
> + memory_region_del_subregion(get_system_memory(), mr);
> +
> + object_unparent((OBJECT(mr)));
> +
> + qemu_mutex_unlock_iothread();
> +
> + trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
> +
> + return 0;
> +}
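The DMA callbacks in this patch pair a guest address range (info->iova) with a local mapping (info->vaddr). What such a pairing makes possible, turning a guest DMA address into a local pointer, can be sketched without QEMU's memory API. This is an illustration only: struct dma_region and dma_translate() are hypothetical and are not what the patch does, since the patch delegates the lookup to QEMU's MemoryRegion machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one registered DMA region. */
struct dma_region {
    uint64_t iova;   /* guest address where the region starts */
    uint64_t len;    /* region length in bytes */
    uint8_t *vaddr;  /* local mapping, NULL if the region is not mapped */
};

/*
 * Return a local pointer for the guest range [iova, iova + len), or NULL
 * if no single mapped region fully contains it. The bounds check avoids
 * computing iova + len, which could overflow.
 */
uint8_t *dma_translate(const struct dma_region *regions, size_t n,
                       uint64_t iova, uint64_t len)
{
    size_t i;

    assert(regions != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct dma_region *r = &regions[i];

        if (r->vaddr == NULL) {
            continue;
        }
        if (iova >= r->iova && len <= r->len &&
            iova - r->iova <= r->len - len) {
            return r->vaddr + (iova - r->iova);
        }
    }

    return NULL;
}
```

A request that straddles two regions is rejected here rather than split. A real implementation would need a policy for that case.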
Determine the BARs used by the PCI device and register handlers to
manage accesses to them.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
hw/remote/vfio-user-obj.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
hw/remote/trace-events | 2 +
2 files changed, 97 insertions(+)
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d158a7f..9853feb 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -212,6 +212,99 @@ static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
return 0;
}
+static ssize_t vfu_object_bar_rw(PCIDevice *pci_dev, hwaddr addr, size_t count,
+ char * const buf, const bool is_write,
+ uint8_t type)
+{
+ AddressSpace *as = NULL;
+ MemTxResult res;
+
+ if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
+ as = pci_device_iommu_address_space(pci_dev);
+ } else {
+ as = &address_space_io;
+ }
+
+ trace_vfu_bar_rw_enter(is_write ? "Write" : "Read", (uint64_t)addr);
+
+ res = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, (void *)buf,
+ (hwaddr)count, is_write);
+ if (res != MEMTX_OK) {
+ warn_report("vfu: failed to %s 0x%"PRIx64"",
+ is_write ? "write to" : "read from",
+ addr);
+ return -1;
+ }
+
+ trace_vfu_bar_rw_exit(is_write ? "Write" : "Read", (uint64_t)addr);
+
+ return count;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create a handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) defines
+ * vfu_object_bar2_handler.
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO) \
+ static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx, \
+ char * const buf, size_t count, \
+ loff_t offset, const bool is_write) \
+ { \
+ VfuObject *o = vfu_get_private(vfu_ctx); \
+ hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config + \
+ PCI_BASE_ADDRESS_0 + \
+ (4 * BAR_NO)) + offset); \
+ \
+ return vfu_object_bar_rw(o->pci_dev, addr, count, buf, is_write, \
+ o->pci_dev->io_regions[BAR_NO].type); \
+ } \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+ &vfu_object_bar0_handler,
+ &vfu_object_bar1_handler,
+ &vfu_object_bar2_handler,
+ &vfu_object_bar3_handler,
+ &vfu_object_bar4_handler,
+ &vfu_object_bar5_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ * callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+ uint32_t orig_val, new_val;
+ int i, size;
+
+ for (i = 0; i < PCI_NUM_REGIONS; i++) {
+ orig_val = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (4 * i), 4);
+ new_val = 0xffffffff;
+ pci_default_write_config(pdev,
+ PCI_BASE_ADDRESS_0 + (4 * i), new_val, 4);
+ new_val = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (4 * i), 4);
+ size = (~(new_val & 0xFFFFFFF0)) + 1;
+ pci_default_write_config(pdev, PCI_BASE_ADDRESS_0 + (4 * i),
+ orig_val, 4);
+ if (size) {
+ vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i, size,
+ vfu_object_bar_handlers[i], VFU_REGION_FLAG_RW,
+ NULL, 0, -1, 0);
+ }
+ }
+}
+
static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -266,6 +359,8 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
return;
}
+ vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
o, QEMU_THREAD_JOINABLE);
}
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e..f3f65e2 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,5 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
--
1.8.3.1
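vfu_object_register_bars() in the patch above sizes each BAR in the usual way: it writes all ones to the register, reads the value back, and derives the size from the address bits that stayed writable. The decoding step can be sketched on its own as below. bar_size_from_probe() and the BAR_* masks are hypothetical names for illustration. Unlike the patch, which applies the memory mask 0xFFFFFFF0 to every BAR, this sketch uses a separate mask for I/O BARs. It handles a single 32-bit register only, so the upper half of a 64-bit memory BAR is out of scope.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_SPACE_IO  0x1u         /* bit 0 set: I/O space BAR */
#define BAR_IO_MASK   0xFFFFFFFCu  /* address bits of an I/O BAR */
#define BAR_MEM_MASK  0xFFFFFFF0u  /* address bits of a memory BAR */

/*
 * Decode the value read back from a BAR after writing 0xFFFFFFFF to it.
 * Returns the BAR size in bytes, or 0 if the BAR is not implemented.
 */
uint32_t bar_size_from_probe(uint32_t probed)
{
    uint32_t mask = (probed & BAR_SPACE_IO) ? BAR_IO_MASK : BAR_MEM_MASK;
    uint32_t addr_bits = probed & mask;

    if (addr_bits == 0) {
        /* No writable address bits: the BAR does not exist. */
        return 0;
    }

    /* The lowest writable address bit gives the size. */
    return ~addr_bits + 1;
}
```

For example, a read-back of 0xFFFFF000 decodes to a 4 KiB memory BAR, and 0xFFFFFF01 to a 256-byte I/O BAR.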
Forward the remote device's interrupts to the guest.
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
include/hw/remote/iohub.h | 2 ++
hw/remote/iohub.c | 6 ++++++
hw/remote/vfio-user-obj.c | 30 ++++++++++++++++++++++++++++++
hw/remote/trace-events | 1 +
4 files changed, 39 insertions(+)
diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
index 0bf98e0..132f496 100644
--- a/include/hw/remote/iohub.h
+++ b/include/hw/remote/iohub.h
@@ -15,6 +15,7 @@
#include "qemu/event_notifier.h"
#include "qemu/thread-posix.h"
#include "hw/remote/mpqemu-link.h"
+#include "libvfio-user/include/libvfio-user.h"
#define REMOTE_IOHUB_NB_PIRQS PCI_DEVFN_MAX
@@ -30,6 +31,7 @@ typedef struct RemoteIOHubState {
unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+ vfu_ctx_t *vfu_ctx[REMOTE_IOHUB_NB_PIRQS];
} RemoteIOHubState;
int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
index 547d597..241c8d7 100644
--- a/hw/remote/iohub.c
+++ b/hw/remote/iohub.c
@@ -18,6 +18,8 @@
#include "hw/remote/machine.h"
#include "hw/remote/iohub.h"
#include "qemu/main-loop.h"
+#include "libvfio-user/include/libvfio-user.h"
+#include "trace.h"
void remote_iohub_init(RemoteIOHubState *iohub)
{
@@ -62,6 +64,10 @@ void remote_iohub_set_irq(void *opaque, int pirq, int level)
QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
if (level) {
+ if (iohub->vfu_ctx[pirq]) {
+ trace_vfu_interrupt(pirq);
+ vfu_irq_trigger(iohub->vfu_ctx[pirq], 0);
+ }
if (++iohub->irq_level[pirq] == 1) {
event_notifier_set(&iohub->irqfds[pirq]);
}
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 9853feb..d2a2e51 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -41,6 +41,9 @@
#include "sysemu/sysemu.h"
#include "hw/qdev-core.h"
#include "hw/pci/pci.h"
+#include "hw/boards.h"
+#include "hw/remote/iohub.h"
+#include "hw/remote/machine.h"
#include "libvfio-user/include/libvfio-user.h"
@@ -305,6 +308,26 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
}
}
+static int vfu_object_setup_irqs(vfu_ctx_t *vfu_ctx, PCIDevice *pci_dev)
+{
+ RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+ RemoteIOHubState *iohub = &machine->iohub;
+ int pirq, intx, ret;
+
+ ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+ if (ret < 0) {
+ return ret;
+ }
+
+ intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+ pirq = remote_iohub_map_irq(pci_dev, intx);
+
+ iohub->vfu_ctx[pirq] = vfu_ctx;
+
+ return 0;
+}
+
static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -361,6 +384,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+ ret = vfu_object_setup_irqs(o->vfu_ctx, o->pci_dev);
+ if (ret < 0) {
+ error_setg(&error_abort, "vfu: Failed to setup interrupts for %s",
+ o->devid);
+ return;
+ }
+
qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
o, QEMU_THREAD_JOINABLE);
}
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f3f65e2..b419d6f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -11,3 +11,4 @@ vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %z
vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
--
1.8.3.1
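vfu_object_setup_irqs() in the patch above derives the INTx index by subtracting one from the PCI Interrupt Pin register. That register holds 0 when the device uses no interrupt pin and 1 to 4 for INTA# to INTD#, so a checked conversion could look like the sketch below. intx_from_pin() is a hypothetical helper, not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the PCI Interrupt Pin register value to a zero-based INTx
 * index: 1..4 (INTA#..INTD#) map to 0..3. Returns -1 when the device
 * has no interrupt pin (0) or the value is out of range.
 */
int intx_from_pin(uint8_t pin)
{
    if (pin < 1 || pin > 4) {
        return -1;
    }
    return pin - 1;
}
```

A caller would skip the interrupt setup, or report an error, when this returns -1, rather than pass a negative index on to the IRQ mapping.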
Store and load the device's state using handlers for live migration
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
migration/savevm.h | 2 +
hw/remote/vfio-user-obj.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++
migration/savevm.c | 63 ++++++++++
3 files changed, 352 insertions(+)
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342..71d1733 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
int qemu_load_device_state(QEMUFile *f);
int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f);
+int qemu_remote_loadvm(QEMUFile *f);
#endif
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d2a2e51..5948576 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -44,6 +44,10 @@
#include "hw/boards.h"
#include "hw/remote/iohub.h"
#include "hw/remote/machine.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/global_state.h"
+#include "block/block.h"
#include "libvfio-user/include/libvfio-user.h"
@@ -73,6 +77,31 @@ struct VfuObject {
PCIDevice *pci_dev;
QemuThread vfu_ctx_thread;
+
+ /*
+ * vfu_mig_buf holds the migration data. In the remote process, this
+ * buffer replaces the role of an IO channel which links the source
+ * and the destination.
+ *
+ * Whenever the client QEMU process initiates migration, the libvfio-user
+ * library notifies this server. The remote/server QEMU sets up a QEMUFile
+ * object using this buffer as its backend. The remote passes this object
+ * to its migration subsystem, which slurps the VMSDs of all its devices
+ * and stores them in this buffer.
+ *
+ * The libvfio-user library subsequently asks the remote for any data that
+ * needs to be moved to the destination using its vfu_migration_callbacks_t
+ * APIs. The remote hands over this buffer as data at this time.
+ *
+ * A reverse of this process happens at the destination.
+ */
+ uint8_t *vfu_mig_buf;
+
+ uint64_t vfu_mig_buf_size;
+
+ uint64_t vfu_mig_buf_pending;
+
+ QEMUFile *vfu_mig_file;
};
static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -97,6 +126,226 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
trace_vfu_prop("devid", str);
}
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_savevm & qemu_remote_loadvm. savevm/loadvm
+ * call these functions via QEMUFileOps to save/load the VMSD of all
+ * the devices into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+ size_t size, Error **errp)
+{
+ VfuObject *o = opaque;
+
+ if (pos > o->vfu_mig_buf_size) {
+ size = 0;
+ } else if ((pos + size) > o->vfu_mig_buf_size) {
+ size = o->vfu_mig_buf_size;
+ }
+
+ memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+ o->vfu_mig_buf_size -= size;
+
+ return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+ int64_t pos, Error **errp)
+{
+ VfuObject *o = opaque;
+ uint64_t end = pos + iov_size(iov, iovcnt);
+ int i;
+
+ if (end > o->vfu_mig_buf_size) {
+ o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+ }
+
+ for (i = 0; i < iovcnt; i++) {
+ memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+ iov[i].iov_len);
+ o->vfu_mig_buf_size += iov[i].iov_len;
+ o->vfu_mig_buf_pending += iov[i].iov_len;
+ }
+
+ return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+ VfuObject *o = opaque;
+
+ o->vfu_mig_buf_size = 0;
+
+ g_free(o->vfu_mig_buf);
+
+ return 0;
+}
+
+static const QEMUFileOps vfu_mig_fops_save = {
+ .writev_buffer = vfu_mig_buf_write,
+ .shut_down = vfu_mig_buf_shutdown,
+};
+
+static const QEMUFileOps vfu_mig_fops_load = {
+ .get_buffer = vfu_mig_buf_read,
+ .shut_down = vfu_mig_buf_shutdown,
+};
+
+/**
+ * handlers for vfu_migration_callbacks_t
+ *
+ * The libvfio-user library accesses these handlers to drive the migration
+ * at the remote end, and also to transport the data stored in vfu_mig_buf
+ *
+ */
+static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+ int ret;
+
+ if (!o->vfu_mig_file) {
+ o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save);
+ }
+
+ global_state_store();
+
+ ret = qemu_remote_savevm(o->vfu_mig_file);
+ if (ret) {
+ qemu_file_shutdown(o->vfu_mig_file);
+ return;
+ }
+
+ qemu_fflush(o->vfu_mig_file);
+
+ bdrv_inactivate_all();
+}
+
+static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+ Error *local_err = NULL;
+ int ret;
+
+ ret = qemu_remote_loadvm(o->vfu_mig_file);
+ if (ret) {
+ error_setg(&error_abort, "vfu: failed to restore device state");
+ return;
+ }
+
+ bdrv_invalidate_cache_all(&local_err);
+ if (local_err) {
+ error_report_err(local_err);
+ return;
+ }
+
+ vm_start();
+}
+
+static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
+{
+ switch (state) {
+ case VFU_MIGR_STATE_RESUME:
+ case VFU_MIGR_STATE_STOP_AND_COPY:
+ case VFU_MIGR_STATE_STOP:
+ break;
+ case VFU_MIGR_STATE_PRE_COPY:
+ vfu_mig_state_precopy(vfu_ctx);
+ break;
+ case VFU_MIGR_STATE_RUNNING:
+ if (!runstate_is_running()) {
+ vfu_mig_state_running(vfu_ctx);
+ }
+ break;
+ default:
+ warn_report("vfu: Unknown migration state %d", state);
+ }
+
+ return 0;
+}
+
+static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+
+ return o->vfu_mig_buf_pending;
+}
+
+static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
+ uint64_t *size)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+
+ if (offset) {
+ *offset = 0;
+ }
+
+ if (size) {
+ *size = o->vfu_mig_buf_size;
+ }
+
+ return 0;
+}
+
+static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
+ uint64_t size, uint64_t offset)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+
+ if (offset > o->vfu_mig_buf_size) {
+ return -1;
+ }
+
+ if ((offset + size) > o->vfu_mig_buf_size) {
+ warn_report("vfu: buffer overflow - check pending_bytes");
+ size = o->vfu_mig_buf_size - offset;
+ }
+
+ memcpy(buf, (o->vfu_mig_buf + offset), size);
+
+ o->vfu_mig_buf_pending -= size;
+
+ return size;
+}
+
+static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
+ uint64_t size, uint64_t offset)
+{
+ VfuObject *o = vfu_get_private(vfu_ctx);
+ uint64_t end = offset + size;
+
+ if (end > o->vfu_mig_buf_size) {
+ o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+ o->vfu_mig_buf_size = end;
+ }
+
+ memcpy((o->vfu_mig_buf + offset), data, size);
+
+ if (!o->vfu_mig_file) {
+ o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load);
+ }
+
+ return size;
+}
+
+static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
+{
+ return 0;
+}
+
+static const vfu_migration_callbacks_t vfu_mig_cbs = {
+ .version = VFU_MIGR_CALLBACKS_VERS,
+ .transition = &vfu_mig_transition,
+ .get_pending_bytes = &vfu_mig_get_pending_bytes,
+ .prepare_data = &vfu_mig_prepare_data,
+ .read_data = &vfu_mig_read_data,
+ .data_written = &vfu_mig_data_written,
+ .write_data = &vfu_mig_write_data,
+};
+
static void *vfu_object_ctx_run(void *opaque)
{
VfuObject *o = opaque;
@@ -332,6 +581,7 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
{
VfuObject *o = container_of(notifier, VfuObject, machine_done);
DeviceState *dev = NULL;
+ size_t migr_area_size;
int ret;
o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
@@ -391,6 +641,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
return;
}
+ /*
+ * TODO: The 0x20000 number used below is temporary. We are working on
+ * a cleaner fix for this.
+ *
+ * The libvfio-user library assumes that the remote knows the size of
+ * the data to be migrated at boot time, but that is not the case with
+ * VMSDs, as they can contain variable-size buffers. 0x20000 is used
+ * as a sufficiently large buffer to demonstrate migration, but that
+ * cannot be used as a permanent solution.
+ *
+ */
+ ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
+ 0x20000, NULL,
+ VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+ if (ret < 0) {
+ error_setg(&error_abort, "vfu: Failed to register migration BAR %s- %s",
+ o->devid, strerror(errno));
+ return;
+ }
+
+ migr_area_size = vfu_get_migr_register_area_size();
+ ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
+ migr_area_size);
+ if (ret < 0) {
+ error_setg(&error_abort, "vfu: Failed to setup migration %s- %s",
+ o->devid, strerror(errno));
+ return;
+ }
+
qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
o, QEMU_THREAD_JOINABLE);
}
@@ -412,6 +691,14 @@ static void vfu_object_init(Object *obj)
o->machine_done.notify = vfu_object_machine_done;
qemu_add_machine_init_done_notifier(&o->machine_done);
+
+ o->vfu_mig_file = NULL;
+
+ o->vfu_mig_buf = NULL;
+
+ o->vfu_mig_buf_size = 0;
+
+ o->vfu_mig_buf_pending = 0;
}
static void vfu_object_finalize(Object *obj)
diff --git a/migration/savevm.c b/migration/savevm.c
index 72848b9..c2279af 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1603,6 +1603,33 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
return ret;
}
+int qemu_remote_savevm(QEMUFile *f)
+{
+ SaveStateEntry *se;
+ int ret;
+
+ QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+ if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque)) {
+ continue;
+ }
+
+ save_section_header(f, se, QEMU_VM_SECTION_FULL);
+
+ ret = vmstate_save(f, se, NULL);
+ if (ret) {
+ qemu_file_set_error(f, ret);
+ return ret;
+ }
+
+ save_section_footer(f, se);
+ }
+
+ qemu_put_byte(f, QEMU_VM_EOF);
+ qemu_fflush(f);
+
+ return 0;
+}
+
void qemu_savevm_live_state(QEMUFile *f)
{
/* save QEMU_VM_SECTION_END section */
@@ -2443,6 +2470,42 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
return 0;
}
+int qemu_remote_loadvm(QEMUFile *f)
+{
+ uint8_t section_type;
+ int ret = 0;
+
+ qemu_mutex_lock_iothread();
+
+ while (true) {
+ section_type = qemu_get_byte(f);
+
+ if (qemu_file_get_error(f)) {
+ ret = qemu_file_get_error(f);
+ break;
+ }
+
+ switch (section_type) {
+ case QEMU_VM_SECTION_FULL:
+ ret = qemu_loadvm_section_start_full(f, NULL);
+ if (ret < 0) {
+ break;
+ }
+ break;
+ case QEMU_VM_EOF:
+ goto out;
+ default:
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+out:
+ qemu_mutex_unlock_iothread();
+
+ return ret;
+}
+
static int
qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
{
--
1.8.3.1
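
A note on the load loop in qemu_remote_loadvm() above: as posted, the
`break` under `if (ret < 0)` appears to leave only the `switch`, so the
`while` loop would go on to read the next section type after a failed
section. The sketch below shows one way to structure such a loop so that an
error ends it. It is a standalone illustration, not QEMU code: the marker
values, `load_section()` and `load_stream()` are hypothetical names, and the
"section" here is a single payload byte.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section markers, used only by this sketch. */
enum { SEC_EOF = 0x00, SEC_FULL = 0x04 };

/* Hypothetical per-section loader: consumes one payload byte, fails on 0xff. */
static int load_section(const uint8_t *buf, size_t len, size_t *pos)
{
    if (*pos >= len) {
        return -1;              /* truncated stream */
    }
    return buf[(*pos)++] == 0xff ? -1 : 0;
}

/* Returns 0 on a clean EOF marker, negative on any error. */
static int load_stream(const uint8_t *buf, size_t len)
{
    size_t pos = 0;

    while (pos < len) {
        uint8_t type = buf[pos++];

        switch (type) {
        case SEC_FULL: {
            int ret = load_section(buf, len, &pos);
            if (ret < 0) {
                return ret;     /* leave the loop, not just the switch */
            }
            break;
        }
        case SEC_EOF:
            return 0;
        default:
            return -EINVAL;     /* unknown section type */
        }
    }

    return -1;                  /* ran out of data before the EOF marker */
}
```

In the posted function the same effect could presumably be had with
`goto out`, which also keeps the unlock on the exit path.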
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 10/11] vfio-user: register handlers to facilitate
> migration
>
> Store and load the device's state using handlers for live migration
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---

[...]

> @@ -73,6 +77,31 @@ struct VfuObject {
> PCIDevice *pci_dev;
>
> QemuThread vfu_ctx_thread;
> +
> + /*
> + * vfu_mig_buf holds the migration data. In the remote process, this
> + * buffer replaces the role of an IO channel which links the source
> + * and the destination.
> + *
> + * Whenever the client QEMU process initiates migration, the libvfio-user
> + * library notifies that to this server. The remote/server QEMU sets up a
> + * QEMUFile object using this buffer as backend. The remote passes this

Can we use remote/server more consistently? E.g. "remote process" or "server"
instead of just "remote"? (makes me think of git remotes :D)

> + * object to its migration subsystem, and it slirps the VMSDs of all its

By "slirps" do you mean transfer from the client to the server over the SLiRP
network?

> + * devices and stores them in this buffer.

Isn't this a per-device object? If so, then why do we store the VMSDs of *all*
the devices in a single device's buffer? I think I'm missing something here.

> + *
> + * libvfio-user library subsequently asks the remote for any data that needs
> + * to be moved over to the destination using its vfu_migration_callbacks_t

It's not obvious to me, is this the libvfio-user library running at the server?

> + * APIs. The remote hands over this buffer as data at this time.

Hands over the buffer to whom?

> + *
> + * A reverse of this process happens at the destination.
> + */
> + uint8_t *vfu_mig_buf;

Does the above description refer to a typical use case of the VFIO migration
protocol where data is copied in an iterative manner (implemented in
libvfio-user by the migration callbacks)? Is this what you're documenting here?

[...]

> +/**
> + * Migration helper functions
> + *
> + * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
> + * subsystem - qemu_remote_savevm & qemu_remote_loadvm.

vfu_mig_buf_read is used by qemu_remote_loadvm and vfu_mig_buf_write is used by
qemu_remote_savevm, right? The order they're written suggests the opposite.

> savevm/loadvm
> + * call these functions via QEMUFileOps to save/load the VMSD of all
> + * the devices into vfu_mig_buf
> + *
> + */

[...]

> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
> +{
> + switch (state) {
> + case VFU_MIGR_STATE_RESUME:
> + case VFU_MIGR_STATE_STOP_AND_COPY:
> + case VFU_MIGR_STATE_STOP:
> + break;

Can you explain why we don't have to do anything in the above cases?

> + case VFU_MIGR_STATE_PRE_COPY:
> + vfu_mig_state_precopy(vfu_ctx);
> + break;

[...]

> + /*
> + * TODO: The 0x20000 number used below is temporary. We are working on
> + * a cleaner fix for this.
> + *
> + * The libvfio-user library assumes that the remote knows the size of
> + * the data to be migrated at boot time, but that is not the case with
> + * VMSDs, as they can contain variable-size buffers. 0x20000 is used
> + * as a sufficiently large buffer to demonstrate migration, but that
> + * cannot be used as a permanent solution.
> + *
> + */

The size of the migration region dictates the amount of migration data that can
be produced/consumed in one go; it's not necessarily the total size of the
migration data produced/consumed throughout the migration operation.

> + ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
> + 0x20000, NULL,
> + VFU_REGION_FLAG_RW, NULL, 0, -1, 0);

[...]

> --
> 1.8.3.1

My background in implementing device migration with libvfio-user is for a
specific device; it seems to me that you're using this functionality
differently? Maybe that's why I'm getting confused. If this is the case, could
you explain in more detail how you're using libvfio-user here?
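
To illustrate the review comment about the migration region size: if the
region only bounds how much data moves per iteration, a producer can hand out
a larger state in window-sized chunks while reporting the remaining bytes.
The sketch below is a minimal standalone model of that idea, not
libvfio-user code. `struct mig_src`, `mig_pending()` and `mig_next_chunk()`
are hypothetical names, and the window stands in for the data portion of the
migration region.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical producer state; not the layout used by the patch. */
struct mig_src {
    const uint8_t *data;    /* full serialized device state */
    uint64_t total;         /* total bytes of state */
    uint64_t sent;          /* bytes already handed out */
};

/* Bytes still to be transferred. */
static uint64_t mig_pending(const struct mig_src *s)
{
    return s->total - s->sent;
}

/*
 * Copy the next chunk into a window of 'win_size' bytes.
 * Returns the chunk length; 0 once everything has been sent.
 */
static uint64_t mig_next_chunk(struct mig_src *s, uint8_t *win,
                               uint64_t win_size)
{
    uint64_t n = mig_pending(s);

    if (n > win_size) {
        n = win_size;       /* never exceed the window */
    }
    memcpy(win, s->data + s->sent, (size_t)n);
    s->sent += n;

    return n;
}
```

Under this model a 10-byte state and a 4-byte window take three iterations
(4, 4 and 2 bytes), so the window size does not have to match the total size
of the state.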
Acceptance test for libvfio-user in QEMU
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
MAINTAINERS | 1 +
tests/acceptance/vfio-user.py | 94 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 95 insertions(+)
create mode 100644 tests/acceptance/vfio-user.py
diff --git a/MAINTAINERS b/MAINTAINERS
index 46ab6b6..644bd35 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3381,6 +3381,7 @@ F: include/hw/remote/proxy-memory-listener.h
F: hw/remote/iohub.c
F: include/hw/remote/iohub.h
F: hw/remote/vfio-user-obj.c
+F: tests/acceptance/vfio-user.py
EBPF:
M: Jason Wang <jasowang@redhat.com>
diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
new file mode 100644
index 0000000..ef318d9
--- /dev/null
+++ b/tests/acceptance/vfio-user.py
@@ -0,0 +1,94 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later. See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import Test
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+class VfioUser(Test):
+ """
+ :avocado: tags=vfiouser
+ """
+ KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+
+ def do_test(self, kernel_url, initrd_url, kernel_command_line,
+ machine_type):
+ """Main test method"""
+ self.require_accelerator('kvm')
+
+ kernel_path = self.fetch_asset(kernel_url)
+ initrd_path = self.fetch_asset(initrd_url)
+
+ socket = os.path.join('/tmp', str(uuid.uuid4()))
+ if os.path.exists(socket):
+ os.remove(socket)
+
+ # Create remote process
+ remote_vm = self.get_vm()
+ remote_vm.add_args('-machine', 'x-remote')
+ remote_vm.add_args('-nodefaults')
+ remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+ remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
+ 'devid=lsi1,socket='+socket)
+ remote_vm.launch()
+
+ # Create proxy process
+ self.vm.set_console()
+ self.vm.add_args('-machine', machine_type)
+ self.vm.add_args('-accel', 'kvm')
+ self.vm.add_args('-cpu', 'host')
+ self.vm.add_args('-object',
+ 'memory-backend-memfd,id=sysmem-file,size=2G')
+ self.vm.add_args('--numa', 'node,memdev=sysmem-file')
+ self.vm.add_args('-m', '2048')
+ self.vm.add_args('-kernel', kernel_path,
+ '-initrd', initrd_path,
+ '-append', kernel_command_line)
+ self.vm.add_args('-device',
+ 'vfio-user-pci,'
+ 'socket='+socket)
+ self.vm.launch()
+ wait_for_console_pattern(self, 'as init process',
+ 'Kernel panic - not syncing')
+ exec_command(self, 'mount -t sysfs sysfs /sys')
+ exec_command_and_wait_for_pattern(self,
+ 'cat /sys/bus/pci/devices/*/uevent',
+ 'PCI_ID=1000:0012')
+
+ def test_multiprocess_x86_64(self):
+ """
+ :avocado: tags=arch:x86_64
+ """
+ kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+ '/linux/releases/31/Everything/x86_64/os/images'
+ '/pxeboot/vmlinuz')
+ initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+ '/linux/releases/31/Everything/x86_64/os/images'
+ '/pxeboot/initrd.img')
+ kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+ 'console=ttyS0 rdinit=/bin/bash')
+ machine_type = 'pc'
+ self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+
+ def test_multiprocess_aarch64(self):
+ """
+ :avocado: tags=arch:aarch64
+ """
+ kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+ '/linux/releases/31/Everything/aarch64/os/images'
+ '/pxeboot/vmlinuz')
+ initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+ '/linux/releases/31/Everything/aarch64/os/images'
+ '/pxeboot/initrd.img')
+ kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+ 'rdinit=/bin/bash console=ttyAMA0')
+ machine_type = 'virt,gic-version=3'
+ self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
--
1.8.3.1
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 11/11] vfio-user: acceptance test
>
> Acceptance test for libvfio-user in QEMU
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---

[...]

> + # Create remote process
> + remote_vm = self.get_vm()
> + remote_vm.add_args('-machine', 'x-remote')
> + remote_vm.add_args('-nodefaults')
> + remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')

IIUC the LSI controller will now be a migratable device and migration will be
handled by vfu_mig_transition() introduced in your "vfio-user: register
handlers to facilitate migration" patch. In vfu_mig_transition(), you don't
copy migration data in the VFU_MIGR_STATE_STOP_AND_COPY case but only in
VFU_MIGR_STATE_PRE_COPY; however, I believe that in VFIO it's possible to jump
from the running state straight to the stop-and-copy state. Are you relying on
QEMU not doing this?

> + remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
> + 'devid=lsi1,socket='+socket)
> + remote_vm.launch()

[...]

> --
> 1.8.3.1
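
The question above, whether a device may go from running straight to
stop-and-copy, can be made concrete with a small model of a transition
handler that serializes state on whichever of the two states is entered
first. This is a sketch under stated assumptions, not the patch's handler:
the enum, `struct mig_dev`, `save_state_once()` and `mig_transition()` are
hypothetical, and the model assumes the device state is serialized in one
shot, as the posted pre-copy handler does. A device whose state can change
during pre-copy would need to serialize the changes again at stop-and-copy.

```c
#include <stdbool.h>

/* Hypothetical state names; libvfio-user defines its own vfu_migr_state_t. */
enum mig_state {
    MIG_RUNNING,
    MIG_PRE_COPY,
    MIG_STOP_AND_COPY,
    MIG_STOP,
    MIG_RESUME,
};

struct mig_dev {
    bool saved;         /* device state already serialized */
    int save_calls;     /* how many times the state was serialized */
};

/* Hypothetical stand-in for serializing the device state. */
static void save_state_once(struct mig_dev *d)
{
    if (!d->saved) {
        d->saved = true;
        d->save_calls++;
    }
}

static int mig_transition(struct mig_dev *d, enum mig_state next)
{
    switch (next) {
    case MIG_PRE_COPY:
    case MIG_STOP_AND_COPY:
        /* Either entry point must leave serialized state available. */
        save_state_once(d);
        return 0;
    case MIG_RUNNING:
        d->saved = false;   /* a later migration must serialize again */
        return 0;
    case MIG_STOP:
    case MIG_RESUME:
        return 0;
    }

    return -1;              /* unknown state */
}
```

With this shape the handler does not depend on the client always passing
through pre-copy first.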