[PATCH RFC 00/19] vfio-user implementation

Elena Ufimtseva posted 19 patches 2 years, 9 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/cover.1626675354.git.elena.ufimtseva@oracle.com
Maintainers: Thanos Makatos <thanos.makatos@nutanix.com>, Jagannathan Raman <jag.raman@oracle.com>, Elena Ufimtseva <elena.ufimtseva@oracle.com>, Alex Williamson <alex.williamson@redhat.com>, John G Johnson <john.g.johnson@oracle.com>
There is a newer version of this series
docs/devel/index.rst          |    1 +
docs/devel/vfio-user.rst      | 1809 +++++++++++++++++++++++++++++++++
hw/vfio/pci.h                 |   25 +-
hw/vfio/user.h                |  279 +++++
include/hw/vfio/vfio-common.h |    8 +
hw/vfio/common.c              |  273 ++++-
hw/vfio/migration.c           |   35 +-
hw/vfio/pci.c                 |  547 ++++++++--
hw/vfio/user.c                |  997 ++++++++++++++++++
MAINTAINERS                   |   10 +
hw/vfio/meson.build           |    1 +
11 files changed, 3879 insertions(+), 106 deletions(-)
create mode 100644 docs/devel/vfio-user.rst
create mode 100644 hw/vfio/user.h
create mode 100644 hw/vfio/user.c
[PATCH RFC 00/19] vfio-user implementation
Posted by Elena Ufimtseva 2 years, 9 months ago
Hi

We are happy to introduce the next stage of the multi-process QEMU project[1].

vfio-user is a protocol that allows a device to be emulated in a separate
process outside of QEMU. It encapsulates the messages sent from QEMU to the
kernel VFIO driver, and sends them to a remote process over a UNIX socket.

The vfio-user framework consists of 3 parts:
 1) The protocol specification.
 2) A server - the VFIO generic device in QEMU that exchanges the protocol messages with the client.
 3) A client - remote process that emulates a device.

This patchset implements parts 1 and 2.
The protocol's specification can be found here [2]:
We also include this as the first patch of the series.

The libvfio-user project (https://github.com/nutanix/libvfio-user)
can be used by a remote process to handle the protocol to implement the
third part.
We also worked on implementing a client and will be sending this patch
series shortly.

Contributors:

John G Johnson <john.g.johnson@oracle.com>
John Levon <john.levon@nutanix.com>
Thanos Makatos <thanos.makatos@nutanix.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jagannathan Raman <jag.raman@oracle.com>

Please send your comments and questions!

Thank you.

References:
[1] https://wiki.qemu.org/Features/MultiProcessQEMU
[2] https://patchwork.kernel.org/project/qemu-devel/patch/20210614104608.212276-1-thanos.makatos@nutanix.com/

John G Johnson (18):
  vfio-user: add VFIO base abstract class
  vfio-user: define VFIO Proxy and communication functions
  vfio-user: Define type vfio_user_pci_dev_info
  vfio-user: connect vfio proxy to remote server
  vfio-user: negotiate protocol with remote server
  vfio-user: define vfio-user pci ops
  vfio-user: VFIO container setup & teardown
  vfio-user: get device info and get irq info
  vfio-user: device region read/write
  vfio-user: get region and DMA map/unmap operations
  vfio-user: probe remote device's BARs
  vfio-user: respond to remote DMA read/write requests
  vfio_user: setup MSI/X interrupts and PCI config operations
  vfio-user: vfio user device realize
  vfio-user: pci reset
  vfio-user: probe remote device ROM BAR
  vfio-user: migration support
  vfio-user: add migration cli options and version negotiation

Thanos Makatos (1):
  vfio-user: introduce vfio-user protocol specification

 docs/devel/index.rst          |    1 +
 docs/devel/vfio-user.rst      | 1809 +++++++++++++++++++++++++++++++++
 hw/vfio/pci.h                 |   25 +-
 hw/vfio/user.h                |  279 +++++
 include/hw/vfio/vfio-common.h |    8 +
 hw/vfio/common.c              |  273 ++++-
 hw/vfio/migration.c           |   35 +-
 hw/vfio/pci.c                 |  547 ++++++++--
 hw/vfio/user.c                |  997 ++++++++++++++++++
 MAINTAINERS                   |   10 +
 hw/vfio/meson.build           |    1 +
 11 files changed, 3879 insertions(+), 106 deletions(-)
 create mode 100644 docs/devel/vfio-user.rst
 create mode 100644 hw/vfio/user.h
 create mode 100644 hw/vfio/user.c

-- 
2.25.1


[PATCH RFC server 00/11] vfio-user server in QEMU
Posted by Jagannathan Raman 2 years, 9 months ago
Hi,

This series adds on to the following series from
Elena Ufimtseva <elena.ufimtseva@oracle.com>:
[PATCH RFC 00/19] vfio-user implementation

QEMU enabled out-of-process device emulation with multi-process [1].
multi-process used a custom protocol to interact between the client
and server, which is not desirable.

The vfio-user user protocol [2] implements a VFIO based mechanism to interact
between the client and server. Since VFIO is a well-established specification,
it is preferable in terms of maintenance. It makes sense for multi-process to
switch to the vfio-user protocol.

Nutanix implemented the vfio-user protocol in their libvfio-user library. The
source for this library is located below:
https://github.com/nutanix/libvfio-user

Elena previously sent the patches for the vfio-user client.

This series implements a vfio-user server for QEMU. It includes the
libvfio-user as a git submodule to QEMU, and builds it along with QEMU.

We would like to make the following notes:
  - Some of the existing multi-process code would become obsolete, and would
    need to be removed. This series does not remove them to keep the number
    of patches to a minimum. We will address them subsequently.
  - The libvfio-user library needs json-c package to build. It looks like the
    GitLab CI images used for build test don't have this package. As such it
    causes build failure.

The patches from both series are available in the following github repo:
https://github.com/oracle/qemu.git
The vfio-user-client-server branch provides the same patches along with a
python script (scripts/vfiouser-launcher.py) to launch the VM.

Contributors:
John G Johnson <john.g.johnson@oracle.com>
John Levon <john.levon@nutanix.com>
Thanos Makatos <thanos.makatos@nutanix.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jagannathan Raman <jag.raman@oracle.com>

We are looking forward to your comments and questions.

Thank you!

[1]: https://patchew.org/QEMU/20210210092628.193785-1-stefanha@redhat.com/
[2]: https://patchwork.kernel.org/project/qemu-devel/patch/20210614104608.212276-1-thanos.makatos@nutanix.com/

Jagannathan Raman (11):
  vfio-user: build library
  vfio-user: define vfio-user object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: acceptance test

 configure                     |  11 +
 meson.build                   |  35 ++
 qapi/qom.json                 |  20 +-
 include/hw/remote/iohub.h     |   2 +
 migration/savevm.h            |   2 +
 hw/remote/iohub.c             |   6 +
 hw/remote/vfio-user-obj.c     | 754 ++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c            |  63 ++++
 .gitmodules                   |   3 +
 MAINTAINERS                   |   9 +
 hw/remote/meson.build         |   3 +
 hw/remote/trace-events        |  10 +
 libvfio-user                  |   1 +
 tests/acceptance/vfio-user.py |  94 ++++++
 14 files changed, 1011 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 160000 libvfio-user
 create mode 100644 tests/acceptance/vfio-user.py

-- 
1.8.3.1


[PATCH RFC server 01/11] vfio-user: build library
Posted by Jagannathan Raman 2 years, 9 months ago
add the libvfio-user library as a submodule. build it as part of QEMU

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure             | 11 +++++++++++
 meson.build           | 35 +++++++++++++++++++++++++++++++++++
 .gitmodules           |  3 +++
 MAINTAINERS           |  7 +++++++
 hw/remote/meson.build |  2 ++
 libvfio-user          |  1 +
 6 files changed, 59 insertions(+)
 create mode 160000 libvfio-user

diff --git a/configure b/configure
index 49b5481..bc1c961 100755
--- a/configure
+++ b/configure
@@ -4297,6 +4297,17 @@ but not implemented on your system"
 fi
 
 ##########################################
+# check for multiprocess
+
+case "$multiprocess" in
+  auto | enabled )
+    if test "$git_submodules_action" != "ignore"; then
+      git_submodules="${git_submodules} libvfio-user"
+    fi
+    ;;
+esac
+
+##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
 
diff --git a/meson.build b/meson.build
index 6e4d2d8..f2f9f86 100644
--- a/meson.build
+++ b/meson.build
@@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
          + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and multiprocess_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
+
+  if not have_internal
+    error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  vfiouser_files = [
+    'libvfio-user/lib/dma.c',
+    'libvfio-user/lib/irq.c',
+    'libvfio-user/lib/libvfio-user.c',
+    'libvfio-user/lib/migration.c',
+    'libvfio-user/lib/pci.c',
+    'libvfio-user/lib/pci_caps.c',
+    'libvfio-user/lib/tran_sock.c',
+  ]
+
+  vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
+
+  json_c = dependency('json-c', required: false)
+  if not json_c.found()
+    json_c = dependency('libjson-c')
+  endif
+
+  libvfiouser = static_library('vfiouser',
+                               build_by_default: false,
+                               sources: vfiouser_files,
+                               dependencies: json_c,
+                               include_directories: vfiouser_inc)
+
+  vfiouser = declare_dependency(link_with: libvfiouser,
+                                include_directories: vfiouser_inc)
+endif
+
 fdt = not_found
 fdt_opt = get_option('fdt')
 if have_system
diff --git a/.gitmodules b/.gitmodules
index 08b1b48..a583a39 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -64,3 +64,6 @@
 [submodule "roms/vbootrom"]
 	path = roms/vbootrom
 	url = https://gitlab.com/qemu-project/vbootrom.git
+[submodule "libvfio-user"]
+	path = libvfio-user
+	url = https://github.com/nutanix/libvfio-user.git
diff --git a/MAINTAINERS b/MAINTAINERS
index aa4df6c..99646e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3350,6 +3350,13 @@ F: semihosting/
 F: include/semihosting/
 F: tests/tcg/multiarch/arm-compat-semi/
 
+libvfio-user Library
+M: Thanos Makatos <thanos.makatos@nutanix.com>
+M: John Levon <john.levon@nutanix.com>
+T: https://github.com/nutanix/libvfio-user.git
+S: Maintained
+F: libvfio-user/*
+
 Multi-process QEMU
 M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
 M: Jagannathan Raman <jag.raman@oracle.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574..fb35fb8 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
 
diff --git a/libvfio-user b/libvfio-user
new file mode 160000
index 0000000..2a0a929
--- /dev/null
+++ b/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 2a0a92912d598de871ab47c034432c5fa6546dc4
-- 
1.8.3.1


Re: [PATCH RFC server 01/11] vfio-user: build library
Posted by John Levon 2 years, 9 months ago
On Mon, Jul 19, 2021 at 04:00:03PM -0400, Jagannathan Raman wrote:

> add the libvfio-user library as a submodule. build it as part of QEMU
> 
> diff --git a/meson.build b/meson.build
> index 6e4d2d8..f2f9f86 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
>           + ' Please configure with --enable-slirp=git')
>  endif
>  
> +vfiouser = not_found
> +if have_system and multiprocess_allowed
> +  have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
> +
> +  if not have_internal
> +    error('libvfio-user source not found - please pull git submodule')
> +  endif
> +
> +  vfiouser_files = [
> +    'libvfio-user/lib/dma.c',
> +    'libvfio-user/lib/irq.c',
> +    'libvfio-user/lib/libvfio-user.c',
> +    'libvfio-user/lib/migration.c',
> +    'libvfio-user/lib/pci.c',
> +    'libvfio-user/lib/pci_caps.c',
> +    'libvfio-user/lib/tran_sock.c',
> +  ]
> +
> +  vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
> +
> +  json_c = dependency('json-c', required: false)
> +  if not json_c.found()
> +    json_c = dependency('libjson-c')
> +  endif
> +
> +  libvfiouser = static_library('vfiouser',
> +                               build_by_default: false,
> +                               sources: vfiouser_files,
> +                               dependencies: json_c,
> +                               include_directories: vfiouser_inc)
> +
> +  vfiouser = declare_dependency(link_with: libvfiouser,
> +                                include_directories: vfiouser_inc)
> +endif

Why this way, rather than recursing into the submodule? Seems a bit fragile to
encode details of the library here.

regards
john
Re: [PATCH RFC server 01/11] vfio-user: build library
Posted by Jag Raman 2 years, 9 months ago

> On Jul 19, 2021, at 4:24 PM, John Levon <john.levon@nutanix.com> wrote:
> 
> On Mon, Jul 19, 2021 at 04:00:03PM -0400, Jagannathan Raman wrote:
> 
>> add the libvfio-user library as a submodule. build it as part of QEMU
>> 
>> diff --git a/meson.build b/meson.build
>> index 6e4d2d8..f2f9f86 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
>>          + ' Please configure with --enable-slirp=git')
>> endif
>> 
>> +vfiouser = not_found
>> +if have_system and multiprocess_allowed
>> +  have_internal = fs.exists(meson.current_source_dir() / 'libvfio-user/Makefile')
>> +
>> +  if not have_internal
>> +    error('libvfio-user source not found - please pull git submodule')
>> +  endif
>> +
>> +  vfiouser_files = [
>> +    'libvfio-user/lib/dma.c',
>> +    'libvfio-user/lib/irq.c',
>> +    'libvfio-user/lib/libvfio-user.c',
>> +    'libvfio-user/lib/migration.c',
>> +    'libvfio-user/lib/pci.c',
>> +    'libvfio-user/lib/pci_caps.c',
>> +    'libvfio-user/lib/tran_sock.c',
>> +  ]
>> +
>> +  vfiouser_inc = include_directories('libvfio-user/include', 'libvfio-user/lib')
>> +
>> +  json_c = dependency('json-c', required: false)
>> +  if not json_c.found()
>> +    json_c = dependency('libjson-c')
>> +  endif
>> +
>> +  libvfiouser = static_library('vfiouser',
>> +                               build_by_default: false,
>> +                               sources: vfiouser_files,
>> +                               dependencies: json_c,
>> +                               include_directories: vfiouser_inc)
>> +
>> +  vfiouser = declare_dependency(link_with: libvfiouser,
>> +                                include_directories: vfiouser_inc)
>> +endif
> 
> Why this way, rather than recursing into the submodule? Seems a bit fragile to
> encode details of the library here.

+maintainers of meson.build. I apologize for not adding them when I sent the
patches out initially. I copied the email list from Elena, but Elena did not make
any changes to meson.build - stupid me.

John, 

    This way appears to be present convention with QEMU - I’m also not very clear
on the reason for it.

For example submodules such as slirp (libslirp), capstone (libcapstone),
dtc (libfdt) are built this way.

I’m guessing it’s because QEMU doesn’t build all parts of a submodule. For
example, QEMU only builds libfdt in the doc submodule. Similarly,
libvfio-user only builds the core library without building the tests and samples.

> 
> regards
> john

Re: [PATCH RFC server 01/11] vfio-user: build library
Posted by Marc-André Lureau 2 years, 9 months ago
Hi

On Tue, Jul 20, 2021 at 4:12 PM Jag Raman <jag.raman@oracle.com> wrote:

>
>
> > On Jul 19, 2021, at 4:24 PM, John Levon <john.levon@nutanix.com> wrote:
> >
> > On Mon, Jul 19, 2021 at 04:00:03PM -0400, Jagannathan Raman wrote:
> >
> >> add the libvfio-user library as a submodule. build it as part of QEMU
> >>
> >> diff --git a/meson.build b/meson.build
> >> index 6e4d2d8..f2f9f86 100644
> >> --- a/meson.build
> >> +++ b/meson.build
> >> @@ -1894,6 +1894,41 @@ if get_option('cfi') and slirp_opt == 'system'
> >>          + ' Please configure with --enable-slirp=git')
> >> endif
> >>
> >> +vfiouser = not_found
> >> +if have_system and multiprocess_allowed
> >> +  have_internal = fs.exists(meson.current_source_dir() /
> 'libvfio-user/Makefile')
> >> +
> >> +  if not have_internal
> >> +    error('libvfio-user source not found - please pull git submodule')
> >> +  endif
> >> +
> >> +  vfiouser_files = [
> >> +    'libvfio-user/lib/dma.c',
> >> +    'libvfio-user/lib/irq.c',
> >> +    'libvfio-user/lib/libvfio-user.c',
> >> +    'libvfio-user/lib/migration.c',
> >> +    'libvfio-user/lib/pci.c',
> >> +    'libvfio-user/lib/pci_caps.c',
> >> +    'libvfio-user/lib/tran_sock.c',
> >> +  ]
> >> +
> >> +  vfiouser_inc = include_directories('libvfio-user/include',
> 'libvfio-user/lib')
> >> +
> >> +  json_c = dependency('json-c', required: false)
> >> +  if not json_c.found()
> >> +    json_c = dependency('libjson-c')
> >> +  endif
> >> +
> >> +  libvfiouser = static_library('vfiouser',
> >> +                               build_by_default: false,
> >> +                               sources: vfiouser_files,
> >> +                               dependencies: json_c,
> >> +                               include_directories: vfiouser_inc)
> >> +
> >> +  vfiouser = declare_dependency(link_with: libvfiouser,
> >> +                                include_directories: vfiouser_inc)
> >> +endif
> >
> > Why this way, rather than recursing into the submodule? Seems a bit
> fragile to
> > encode details of the library here.
>
> +maintainers of meson.build. I apologize for not adding them when I sent
> the
> patches out initially. I copied the email list from Elena, but Elena did
> not make
> any changes to meson.build - stupid me.
>
> John,
>
>     This way appears to be present convention with QEMU - I’m also not
> very clear
> on the reason for it.
>
> For example submodules such as slirp (libslirp), capstone (libcapstone),
> dtc (libfdt) are built this way.
>

For slirp and dtc, we are eventually going to use meson subproject(). No
idea about capstone.

>
> I’m guessing it’s because QEMU doesn’t build all parts of a submodule. For
> example, QEMU only builds libfdt in the doc submodule. Similarly,
> libvfio-user only builds the core library without building the tests and
> samples.
>
>
You can give subproject options to build limited parts.

Fwiw, since libvfio-user uses cmake, we may be able to use meson
cmake.subproject() (https://mesonbuild.com/CMake-module.html).

-- 
Marc-André Lureau
Re: [PATCH RFC server 01/11] vfio-user: build library
Posted by John Levon 2 years, 9 months ago
On Tue, Jul 20, 2021 at 04:20:13PM +0400, Marc-André Lureau wrote:

> > >> +  libvfiouser = static_library('vfiouser',
> > >> +                               build_by_default: false,
> > >> +                               sources: vfiouser_files,
> > >> +                               dependencies: json_c,
> > >> +                               include_directories: vfiouser_inc)
> >
> >     This way appears to be present convention with QEMU - I’m also not
> > very clear
> > on the reason for it.
> >
> > I’m guessing it’s because QEMU doesn’t build all parts of a submodule. For
> > example, QEMU only builds libfdt in the doc submodule. Similarly,
> > libvfio-user only builds the core library without building the tests and
> > samples.
> >
> You can give subproject options to build limited parts.
> 
> Fwiw, since libvfio-user uses cmake, we may be able to use meson
> cmake.subproject() (https://mesonbuild.com/CMake-module.html).

That'd be great. We also briefly discussed moving away from cmake anyway - since
both SPDK and qemu are meson-based, it seems like it would make sense. I'd
prefer it to be easy to regularly update libvfio-user within these projects.

Ideally, running qemu tests would actually run libvfio-user tests too, for some
level of assurance on the library's internal expectations.

regards
john

[PATCH RFC server 02/11] vfio-user: define vfio-user object
Posted by Jagannathan Raman 2 years, 9 months ago
Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 qapi/qom.json             |  20 ++++++-
 hw/remote/vfio-user-obj.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |   1 +
 hw/remote/meson.build     |   1 +
 hw/remote/trace-events    |   3 +
 5 files changed, 164 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index 652be31..e0716d2 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -684,6 +684,20 @@
   'data': { 'fd': 'str', 'devid': 'str' } }
 
 ##
+# @VfioUserProperties:
+#
+# Properties for vfio-user objects.
+#
+# @socket: path to be used as socket by the libvfiouser library
+#
+# @devid: the id of the device to be associated with the file descriptor
+#
+# Since: 6.0
+##
+{ 'struct': 'VfioUserProperties',
+  'data': { 'socket': 'str', 'devid': 'str' } }
+
+##
 # @RngProperties:
 #
 # Properties for objects of classes derived from rng.
@@ -807,7 +821,8 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    'x-remote-object'
+    'x-remote-object',
+    'vfio-user'
   ] }
 
 ##
@@ -863,7 +878,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'vfio-user':                  'VfioUserProperties'
   } }
 
 ##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000..5098169
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,141 @@
+/*
+ * QEMU vfio-user server object
+ *
+ * Copyright © 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ *     -machine x-remote
+ *     -device <PCI-device>,id=<pci-dev-id>
+ *     -object vfio-user,id=<id>,socket=<socket-path>,devid=<pci-dev-id>
+ *
+ * Note that vfio-user object must be used with x-remote machine only. This
+ * server could only support PCI devices for now.
+ *
+ * socket is path to a file. This file will be created by the server. It is
+ * a required option
+ *
+ * devid is the id of a PCI device on the server. It is also a required option.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+
+#define TYPE_VFU_OBJECT "vfio-user"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+struct VfuObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+
+    /* Maximum number of devices the server could support*/
+    unsigned int max_devs;
+};
+
+struct VfuObject {
+    /* private */
+    Object parent;
+
+    char *socket;
+    char *devid;
+};
+
+static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->socket);
+
+    o->socket = g_strdup(str);
+
+    trace_vfu_prop("socket", str);
+}
+
+static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->devid);
+
+    o->devid = g_strdup(str);
+
+    trace_vfu_prop("devid", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+
+    /* Add test for remote machine and PCI device */
+
+    if (k->nr_devs >= k->max_devs) {
+        error_report("Reached maximum number of vfio-user devices: %u",
+                     k->max_devs);
+        return;
+    }
+
+    k->nr_devs++;
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs--;
+
+    g_free(o->socket);
+    g_free(o->devid);
+
+    if (k->nr_devs == 0) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+    VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+    /* Limiting maximum number of devices to 1 until IOMMU support is added */
+    k->max_devs = 1;
+    k->nr_devs = 0;
+
+    object_class_property_add_str(klass, "socket", NULL,
+                                  vfu_object_set_socket);
+    object_class_property_add_str(klass, "devid", NULL,
+                                  vfu_object_set_devid);
+}
+
+static const TypeInfo vfu_object_info = {
+    .name = TYPE_VFU_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(VfuObject),
+    .instance_init = vfu_object_init,
+    .instance_finalize = vfu_object_finalize,
+    .class_size = sizeof(VfuObjectClass),
+    .class_init = vfu_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void vfu_register_types(void)
+{
+    type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 99646e7..46ab6b6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3380,6 +3380,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index fb35fb8..cd44dfc 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974..7da12f0 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
 
 mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
 mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-- 
1.8.3.1


[PATCH RFC server 03/11] vfio-user: instantiate vfio-user context
Posted by Jagannathan Raman 2 years, 9 months ago
create a context with the vfio-user library for a device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 5098169..adb3193 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,11 +27,18 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 
+#include <errno.h>
+
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "sysemu/runstate.h"
+#include "qemu/notify.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+
+#include "libvfio-user/include/libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -51,6 +58,10 @@ struct VfuObject {
 
     char *socket;
     char *devid;
+
+    Notifier machine_done;
+
+    vfu_ctx_t *vfu_ctx;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -75,9 +86,23 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("devid", str);
 }
 
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+    VfuObject *o = container_of(notifier, VfuObject, machine_done);
+
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
+                                o, VFU_DEV_TYPE_PCI);
+    if (o->vfu_ctx == NULL) {
+        error_setg(&error_abort, "vfu: Failed to create context - %s",
+                   strerror(errno));
+        return;
+    }
+}
+
 static void vfu_object_init(Object *obj)
 {
     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
 
     /* Add test for remote machine and PCI device */
 
@@ -88,6 +113,9 @@ static void vfu_object_init(Object *obj)
     }
 
     k->nr_devs++;
+
+    o->machine_done.notify = vfu_object_machine_done;
+    qemu_add_machine_init_done_notifier(&o->machine_done);
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -97,6 +125,8 @@ static void vfu_object_finalize(Object *obj)
 
     k->nr_devs--;
 
+    vfu_destroy_ctx(o->vfu_ctx);
+
     g_free(o->socket);
     g_free(o->devid);
 
-- 
1.8.3.1


[PATCH RFC server 04/11] vfio-user: find and init PCI device
Posted by Jagannathan Raman 2 years, 9 months ago
Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index adb3193..e362709 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -37,6 +37,8 @@
 #include "qemu/notify.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #include "libvfio-user/include/libvfio-user.h"
 
@@ -62,6 +64,8 @@ struct VfuObject {
     Notifier machine_done;
 
     vfu_ctx_t *vfu_ctx;
+
+    PCIDevice *pci_dev;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -89,6 +93,8 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
+    DeviceState *dev = NULL;
+    int ret;
 
     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
                                 o, VFU_DEV_TYPE_PCI);
@@ -97,6 +103,28 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    strerror(errno));
         return;
     }
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->devid);
+    if (dev == NULL) {
+        error_setg(&error_abort, "vfu: Device %s not found", o->devid);
+        return;
+    }
+    o->pci_dev = PCI_DEVICE(dev);
+
+    ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
+                       PCI_HEADER_TYPE_NORMAL, 0);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to attach PCI device %s to context - %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
+    vfu_pci_set_id(o->vfu_ctx,
+                   pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
+                   pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
+                   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
+                   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
 }
 
 static void vfu_object_init(Object *obj)
-- 
1.8.3.1


Re: [PATCH RFC server 04/11] vfio-user: find and init PCI device
Posted by John Levon 2 years, 9 months ago
On Mon, Jul 19, 2021 at 04:00:06PM -0400, Jagannathan Raman wrote:

> +    vfu_pci_set_id(o->vfu_ctx,
> +                   pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
> +                   pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
> +                   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
> +                   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));

Since you handle all config space accesses yourselves, is there even any need
for this?

regards
john

Re: [PATCH RFC server 04/11] vfio-user: find and init PCI device
Posted by Jag Raman 2 years, 9 months ago

> On Jul 26, 2021, at 11:05 AM, John Levon <levon@movementarian.org> wrote:
> 
> On Mon, Jul 19, 2021 at 04:00:06PM -0400, Jagannathan Raman wrote:
> 
>> +    vfu_pci_set_id(o->vfu_ctx,
>> +                   pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
>> +                   pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
>> +                   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
>> +                   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
> 
> Since you handle all config space accesses yourselves, is there even any need
> for this?

I think that makes sense. Since the QEMU server handles all the config space
accesses, it’s not necessary to register the device’s vendor/device ID with the library.

Thank you!
--
Jag

> 
> regards
> john

[PATCH RFC server 05/11] vfio-user: run vfio-user context
Posted by Jagannathan Raman 2 years, 9 months ago
Setup a separate thread to run the vfio-user context. The thread acts as
the main loop for the device.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index e362709..6a2d0f5 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -35,6 +35,7 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "hw/qdev-core.h"
@@ -66,6 +67,8 @@ struct VfuObject {
     vfu_ctx_t *vfu_ctx;
 
     PCIDevice *pci_dev;
+
+    QemuThread vfu_ctx_thread;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -90,6 +93,44 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("devid", str);
 }
 
+static void *vfu_object_ctx_run(void *opaque)
+{
+    VfuObject *o = opaque;
+    int ret;
+
+    ret = vfu_realize_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
+                   o->devid, strerror(errno));
+        return NULL;
+    }
+
+    ret = vfu_attach_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to attach device %s to context - %s",
+                   o->devid, strerror(errno));
+        return NULL;
+    }
+
+    do {
+        ret = vfu_run_ctx(o->vfu_ctx);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                ret = 0;
+            } else if (errno == ENOTCONN) {
+                object_unparent(OBJECT(o));
+                break;
+            } else {
+                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
+                           o->devid, strerror(errno));
+            }
+        }
+    } while (ret == 0);
+
+    return NULL;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -125,6 +166,9 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
                    pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
                    pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
+
+    qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
+                       o, QEMU_THREAD_JOINABLE);
 }
 
 static void vfu_object_init(Object *obj)
-- 
1.8.3.1


RE: [PATCH RFC server 05/11] vfio-user: run vfio-user context
Posted by Thanos Makatos 2 years, 9 months ago
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 05/11] vfio-user: run vfio-user context
> 
> Setup a separate thread to run the vfio-user context. The thread acts as
> the main loop for the device.

In your "vfio-user: instantiate vfio-user context" patch you create the vfu context in blocking-mode, so the only way to run device emulation is in a separate thread.
Were you going to create a separate thread anyway? You can run device emulation in polling mode therefore you can avoid creating a separate thread, thus saving resources. Do plan to do that in the future?

> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 44
> ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index e362709..6a2d0f5 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -35,6 +35,7 @@
>  #include "trace.h"
>  #include "sysemu/runstate.h"
>  #include "qemu/notify.h"
> +#include "qemu/thread.h"
>  #include "qapi/error.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/qdev-core.h"
> @@ -66,6 +67,8 @@ struct VfuObject {
>      vfu_ctx_t *vfu_ctx;
> 
>      PCIDevice *pci_dev;
> +
> +    QemuThread vfu_ctx_thread;
>  };
> 
>  static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
> @@ -90,6 +93,44 @@ static void vfu_object_set_devid(Object *obj, const
> char *str, Error **errp)
>      trace_vfu_prop("devid", str);
>  }
> 
> +static void *vfu_object_ctx_run(void *opaque)
> +{
> +    VfuObject *o = opaque;
> +    int ret;
> +
> +    ret = vfu_realize_ctx(o->vfu_ctx);
> +    if (ret < 0) {
> +        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
> +                   o->devid, strerror(errno));
> +        return NULL;
> +    }
> +
> +    ret = vfu_attach_ctx(o->vfu_ctx);
> +    if (ret < 0) {
> +        error_setg(&error_abort,
> +                   "vfu: Failed to attach device %s to context - %s",
> +                   o->devid, strerror(errno));
> +        return NULL;
> +    }
> +
> +    do {
> +        ret = vfu_run_ctx(o->vfu_ctx);
> +        if (ret < 0) {
> +            if (errno == EINTR) {
> +                ret = 0;
> +            } else if (errno == ENOTCONN) {
> +                object_unparent(OBJECT(o));
> +                break;
> +            } else {
> +                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
> +                           o->devid, strerror(errno));
> +            }
> +        }
> +    } while (ret == 0);
> +
> +    return NULL;
> +}
> +
>  static void vfu_object_machine_done(Notifier *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
> @@ -125,6 +166,9 @@ static void vfu_object_machine_done(Notifier
> *notifier, void *data)
>                     pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
>                     pci_get_word(o->pci_dev->config +
> PCI_SUBSYSTEM_VENDOR_ID),
>                     pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
> +
> +    qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner",
> vfu_object_ctx_run,
> +                       o, QEMU_THREAD_JOINABLE);
>  }
> 
>  static void vfu_object_init(Object *obj)
> --
> 1.8.3.1

Re: [PATCH RFC server 05/11] vfio-user: run vfio-user context
Posted by Jag Raman 2 years, 8 months ago

> On Jul 20, 2021, at 10:17 AM, Thanos Makatos <thanos.makatos@nutanix.com> wrote:
> 
>> -----Original Message-----
>> From: Jagannathan Raman <jag.raman@oracle.com>
>> Sent: 19 July 2021 21:00
>> To: qemu-devel@nongnu.org
>> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
>> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
>> john.g.johnson@oracle.com; Thanos Makatos
>> <thanos.makatos@nutanix.com>; Swapnil Ingle
>> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
>> Subject: [PATCH RFC server 05/11] vfio-user: run vfio-user context
>> 
>> Setup a separate thread to run the vfio-user context. The thread acts as
>> the main loop for the device.
> 
> In your "vfio-user: instantiate vfio-user context" patch you create the vfu context in blocking-mode, so the only way to run device emulation is in a separate thread.
> Were you going to create a separate thread anyway? You can run device emulation in polling mode therefore you can avoid creating a separate thread, thus saving resources. Do plan to do that in the future?

Thanks for the information about the Blocking and Non-Blocking mode.

I’d like to explain why we are using a separate thread presently and
check with you if it’s possible to poll on multiple vfu contexts at the
same time (similar to select/poll for fds).

Concerning my understanding on how devices are executed in QEMU,
QEMU initializes the device instance - where the device registers
callbacks for BAR and config space accesses. The device is then
subsequently driven by these callbacks - whenever the vcpu thread tries
to access the BAR addresses or places a config space access to the PCI
bus, the vcpu exits to QEMU which handles these accesses. As such, the
device is driven by the vcpu thread. Since there are no vcpu threads in the
remote process, we created a separate thread as a replacement. As you
can see already, this thread blocks on vfu_run_ctx() which I believe polls
on the socket for messages from client.

If there is a way to run multiple vfu contexts at the same time, that would
help with conserving threads on the host CPU. For example, if there’s a
way to add vfu contexts to a list of contexts that expect messages from
client, that could be a good idea. Alternatively, this QEMU server could
also implement a similar mechanism to group all non-blocking vfu
contexts to just a single thread, instead of having separate threads for
each context.

--
Jag

> 
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> hw/remote/vfio-user-obj.c | 44
>> ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 44 insertions(+)
>> 
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index e362709..6a2d0f5 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -35,6 +35,7 @@
>> #include "trace.h"
>> #include "sysemu/runstate.h"
>> #include "qemu/notify.h"
>> +#include "qemu/thread.h"
>> #include "qapi/error.h"
>> #include "sysemu/sysemu.h"
>> #include "hw/qdev-core.h"
>> @@ -66,6 +67,8 @@ struct VfuObject {
>>     vfu_ctx_t *vfu_ctx;
>> 
>>     PCIDevice *pci_dev;
>> +
>> +    QemuThread vfu_ctx_thread;
>> };
>> 
>> static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
>> @@ -90,6 +93,44 @@ static void vfu_object_set_devid(Object *obj, const
>> char *str, Error **errp)
>>     trace_vfu_prop("devid", str);
>> }
>> 
>> +static void *vfu_object_ctx_run(void *opaque)
>> +{
>> +    VfuObject *o = opaque;
>> +    int ret;
>> +
>> +    ret = vfu_realize_ctx(o->vfu_ctx);
>> +    if (ret < 0) {
>> +        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
>> +                   o->devid, strerror(errno));
>> +        return NULL;
>> +    }
>> +
>> +    ret = vfu_attach_ctx(o->vfu_ctx);
>> +    if (ret < 0) {
>> +        error_setg(&error_abort,
>> +                   "vfu: Failed to attach device %s to context - %s",
>> +                   o->devid, strerror(errno));
>> +        return NULL;
>> +    }
>> +
>> +    do {
>> +        ret = vfu_run_ctx(o->vfu_ctx);
>> +        if (ret < 0) {
>> +            if (errno == EINTR) {
>> +                ret = 0;
>> +            } else if (errno == ENOTCONN) {
>> +                object_unparent(OBJECT(o));
>> +                break;
>> +            } else {
>> +                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
>> +                           o->devid, strerror(errno));
>> +            }
>> +        }
>> +    } while (ret == 0);
>> +
>> +    return NULL;
>> +}
>> +
>> static void vfu_object_machine_done(Notifier *notifier, void *data)
>> {
>>     VfuObject *o = container_of(notifier, VfuObject, machine_done);
>> @@ -125,6 +166,9 @@ static void vfu_object_machine_done(Notifier
>> *notifier, void *data)
>>                    pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
>>                    pci_get_word(o->pci_dev->config +
>> PCI_SUBSYSTEM_VENDOR_ID),
>>                    pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
>> +
>> +    qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner",
>> vfu_object_ctx_run,
>> +                       o, QEMU_THREAD_JOINABLE);
>> }
>> 
>> static void vfu_object_init(Object *obj)
>> --
>> 1.8.3.1
> 

[PATCH RFC server 06/11] vfio-user: handle PCI config space accesses
Posted by Jagannathan Raman 2 years, 9 months ago
Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 41 +++++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 6a2d0f5..60d9fa8 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -36,6 +36,7 @@
 #include "sysemu/runstate.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "hw/qdev-core.h"
@@ -131,6 +132,35 @@ static void *vfu_object_ctx_run(void *opaque)
     return NULL;
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+                                     size_t count, loff_t offset,
+                                     const bool is_write)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint32_t val = 0;
+    int i;
+
+    qemu_mutex_lock_iothread();
+
+    for (i = 0; i < count; i++) {
+        if (is_write) {
+            val = *((uint8_t *)(buf + i));
+            trace_vfu_cfg_write((offset + i), val);
+            pci_default_write_config(PCI_DEVICE(o->pci_dev),
+                                     (offset + i), val, 1);
+        } else {
+            val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
+                                          (offset + i), 1);
+            *((uint8_t *)(buf + i)) = (uint8_t)val;
+            trace_vfu_cfg_read((offset + i), val);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return count;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -167,6 +197,17 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_VENDOR_ID),
                    pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
 
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+                           pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+                           VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+                           NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to setup config space handlers for %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
     qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
                        o, QEMU_THREAD_JOINABLE);
 }
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0..2ef7884 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
1.8.3.1


Re: [PATCH RFC server 06/11] vfio-user: handle PCI config space accesses
Posted by John Levon 2 years, 9 months ago
On Mon, Jul 19, 2021 at 04:00:08PM -0400, Jagannathan Raman wrote:

> Define and register handlers for PCI config space accesses
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  hw/remote/trace-events    |  2 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 6a2d0f5..60d9fa8 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -36,6 +36,7 @@
>  #include "sysemu/runstate.h"
>  #include "qemu/notify.h"
>  #include "qemu/thread.h"
> +#include "qemu/main-loop.h"
>  #include "qapi/error.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/qdev-core.h"
> @@ -131,6 +132,35 @@ static void *vfu_object_ctx_run(void *opaque)
>      return NULL;
>  }
>  
> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> +                                     size_t count, loff_t offset,
> +                                     const bool is_write)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint32_t val = 0;
> +    int i;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    for (i = 0; i < count; i++) {
> +        if (is_write) {
> +            val = *((uint8_t *)(buf + i));
> +            trace_vfu_cfg_write((offset + i), val);
> +            pci_default_write_config(PCI_DEVICE(o->pci_dev),
> +                                     (offset + i), val, 1);
> +        } else {
> +            val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
> +                                          (offset + i), 1);
> +            *((uint8_t *)(buf + i)) = (uint8_t)val;
> +            trace_vfu_cfg_read((offset + i), val);
> +        }
> +    }

Is it always OK to split up the access into single bytes like this?

regards
john

[PATCH RFC server 07/11] vfio-user: handle DMA mappings
Posted by Jagannathan Raman 2 years, 9 months ago
Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 60 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 60d9fa8..d158a7f 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -161,6 +161,57 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    MemoryRegion *subregion = NULL;
+    g_autofree char *name = NULL;
+    static unsigned int suffix;
+    struct iovec *iov = &info->iova;
+
+    if (!info->vaddr) {
+        return;
+    }
+
+    name = g_strdup_printf("remote-mem-%u", suffix++);
+
+    subregion = g_new0(MemoryRegion, 1);
+
+    qemu_mutex_lock_iothread();
+
+    memory_region_init_ram_ptr(subregion, NULL, name,
+                               iov->iov_len, info->vaddr);
+
+    memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
+                                subregion);
+
+    qemu_mutex_unlock_iothread();
+
+    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    MemoryRegion *mr = NULL;
+    ram_addr_t offset;
+
+    mr = memory_region_from_host(info->vaddr, &offset);
+    if (!mr) {
+        return 0;
+    }
+
+    qemu_mutex_lock_iothread();
+
+    memory_region_del_subregion(get_system_memory(), mr);
+
+    object_unparent((OBJECT(mr)));
+
+    qemu_mutex_unlock_iothread();
+
+    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+
+    return 0;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -208,6 +259,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup DMA handlers for %s",
+                   o->devid);
+        return;
+    }
+
     qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
                        o, QEMU_THREAD_JOINABLE);
 }
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884..f945c7e 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
1.8.3.1


RE: [PATCH RFC server 07/11] vfio-user: handle DMA mappings
Posted by Thanos Makatos 2 years, 9 months ago
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 07/11] vfio-user: handle DMA mappings
> 
> Define and register callbacks to manage the RAM regions used for
> device DMA
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 58
> +++++++++++++++++++++++++++++++++++++++++++++++
>  hw/remote/trace-events    |  2 ++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 60d9fa8..d158a7f 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -161,6 +161,57 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t
> *vfu_ctx, char * const buf,
>      return count;
>  }
> 
> +static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> +{
> +    MemoryRegion *subregion = NULL;
> +    g_autofree char *name = NULL;
> +    static unsigned int suffix;
> +    struct iovec *iov = &info->iova;
> +
> +    if (!info->vaddr) {
> +        return;
> +    }

This shouldn't happen, you can replace it with an assert if you want.

> +
> +    name = g_strdup_printf("remote-mem-%u", suffix++);
> +
> +    subregion = g_new0(MemoryRegion, 1);
> +
> +    qemu_mutex_lock_iothread();
> +
> +    memory_region_init_ram_ptr(subregion, NULL, name,
> +                               iov->iov_len, info->vaddr);
> +
> +    memory_region_add_subregion(get_system_memory(), (hwaddr)iov-
> >iov_base,
> +                                subregion);
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
> +}
> +
> +static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> +{
> +    MemoryRegion *mr = NULL;
> +    ram_addr_t offset;
> +
> +    mr = memory_region_from_host(info->vaddr, &offset);
> +    if (!mr) {

Is this expected? If not then should we at least log something?

> +        return 0;
> +    }
> +
> +    qemu_mutex_lock_iothread();
> +
> +    memory_region_del_subregion(get_system_memory(), mr);
> +
> +    object_unparent((OBJECT(mr)));
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
> +
> +    return 0;
> +}
> +
>  static void vfu_object_machine_done(Notifier *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
> @@ -208,6 +259,13 @@ static void vfu_object_machine_done(Notifier
> *notifier, void *data)
>          return;
>      }
> 
> +    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register,
> &dma_unregister);
> +    if (ret < 0) {
> +        error_setg(&error_abort, "vfu: Failed to setup DMA handlers for %s",
> +                   o->devid);
> +        return;
> +    }
> +
>      qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner",
> vfu_object_ctx_run,
>                         o, QEMU_THREAD_JOINABLE);
>  }
> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> index 2ef7884..f945c7e 100644
> --- a/hw/remote/trace-events
> +++ b/hw/remote/trace-events
> @@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed
> to receive %d size %d,
>  vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
>  vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
>  vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
> +vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA
> 0x%"PRIx64", %zu bytes"
> +vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
> --
> 1.8.3.1

[PATCH RFC server 08/11] vfio-user: handle PCI BAR accesses
Posted by Jagannathan Raman 2 years, 9 months ago
Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 +
 2 files changed, 97 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d158a7f..9853feb 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -212,6 +212,99 @@ static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     return 0;
 }
 
+static ssize_t vfu_object_bar_rw(PCIDevice *pci_dev, hwaddr addr, size_t count,
+                                 char * const buf, const bool is_write,
+                                 uint8_t type)
+{
+    AddressSpace *as = NULL;
+    MemTxResult res;
+
+    if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
+        as = pci_device_iommu_address_space(pci_dev);
+    } else {
+        as = &address_space_io;
+    }
+
+    trace_vfu_bar_rw_enter(is_write ? "Write" : "Read", (uint64_t)addr);
+
+    res = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, (void *)buf,
+                           (hwaddr)count, is_write);
+    if (res != MEMTX_OK) {
+        warn_report("vfu: failed to %s 0x%"PRIx64"",
+                    is_write ? "write to" : "read from",
+                    addr);
+        return -1;
+    }
+
+    trace_vfu_bar_rw_exit(is_write ? "Write" : "Read", (uint64_t)addr);
+
+    return count;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
+    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
+                                        char * const buf, size_t count,        \
+                                        loff_t offset, const bool is_write)    \
+    {                                                                          \
+        VfuObject *o = vfu_get_private(vfu_ctx);                               \
+        hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config +               \
+                                            PCI_BASE_ADDRESS_0 +               \
+                                            (4 * BAR_NO)) + offset);           \
+                                                                               \
+        return vfu_object_bar_rw(o->pci_dev, addr, count, buf, is_write,       \
+                                 o->pci_dev->io_regions[BAR_NO].type);         \
+    }                                                                          \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+    &vfu_object_bar0_handler,
+    &vfu_object_bar1_handler,
+    &vfu_object_bar2_handler,
+    &vfu_object_bar3_handler,
+    &vfu_object_bar4_handler,
+    &vfu_object_bar5_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *                            callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+    uint32_t orig_val, new_val;
+    int i, size;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        orig_val = pci_default_read_config(pdev,
+                                           PCI_BASE_ADDRESS_0 + (4 * i), 4);
+        new_val = 0xffffffff;
+        pci_default_write_config(pdev,
+                                 PCI_BASE_ADDRESS_0 + (4 * i), new_val, 4);
+        new_val = pci_default_read_config(pdev,
+                                          PCI_BASE_ADDRESS_0 + (4 * i), 4);
+        size = (~(new_val & 0xFFFFFFF0)) + 1;
+        pci_default_write_config(pdev, PCI_BASE_ADDRESS_0 + (4 * i),
+                                 orig_val, 4);
+        if (size) {
+            vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i, size,
+                             vfu_object_bar_handlers[i], VFU_REGION_FLAG_RW,
+                             NULL, 0, -1, 0);
+        }
+    }
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -266,6 +359,8 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
     qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
                        o, QEMU_THREAD_JOINABLE);
 }
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e..f3f65e2 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,5 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
-- 
1.8.3.1


[PATCH RFC server 09/11] vfio-user: handle device interrupts
Posted by Jagannathan Raman 2 years, 9 months ago
Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/remote/iohub.h |  2 ++
 hw/remote/iohub.c         |  6 ++++++
 hw/remote/vfio-user-obj.c | 30 ++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  1 +
 4 files changed, 39 insertions(+)

diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
index 0bf98e0..132f496 100644
--- a/include/hw/remote/iohub.h
+++ b/include/hw/remote/iohub.h
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/thread-posix.h"
 #include "hw/remote/mpqemu-link.h"
+#include "libvfio-user/include/libvfio-user.h"
 
 #define REMOTE_IOHUB_NB_PIRQS    PCI_DEVFN_MAX
 
@@ -30,6 +31,7 @@ typedef struct RemoteIOHubState {
     unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
     ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
     QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+    vfu_ctx_t *vfu_ctx[REMOTE_IOHUB_NB_PIRQS];
 } RemoteIOHubState;
 
 int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
index 547d597..241c8d7 100644
--- a/hw/remote/iohub.c
+++ b/hw/remote/iohub.c
@@ -18,6 +18,8 @@
 #include "hw/remote/machine.h"
 #include "hw/remote/iohub.h"
 #include "qemu/main-loop.h"
+#include "libvfio-user/include/libvfio-user.h"
+#include "trace.h"
 
 void remote_iohub_init(RemoteIOHubState *iohub)
 {
@@ -62,6 +64,10 @@ void remote_iohub_set_irq(void *opaque, int pirq, int level)
     QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
 
     if (level) {
+        if (iohub->vfu_ctx[pirq]) {
+            trace_vfu_interrupt(pirq);
+            vfu_irq_trigger(iohub->vfu_ctx[pirq], 0);
+        }
         if (++iohub->irq_level[pirq] == 1) {
             event_notifier_set(&iohub->irqfds[pirq]);
         }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 9853feb..d2a2e51 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -41,6 +41,9 @@
 #include "sysemu/sysemu.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "hw/boards.h"
+#include "hw/remote/iohub.h"
+#include "hw/remote/machine.h"
 
 #include "libvfio-user/include/libvfio-user.h"
 
@@ -305,6 +308,26 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
     }
 }
 
+static int vfu_object_setup_irqs(vfu_ctx_t *vfu_ctx, PCIDevice *pci_dev)
+{
+    RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+    RemoteIOHubState *iohub = &machine->iohub;
+    int pirq, intx, ret;
+
+    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+    if (ret < 0) {
+        return ret;
+    }
+
+    intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    pirq = remote_iohub_map_irq(pci_dev, intx);
+
+    iohub->vfu_ctx[pirq] = vfu_ctx;
+
+    return 0;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -361,6 +384,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 
     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+    ret = vfu_object_setup_irqs(o->vfu_ctx, o->pci_dev);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup interrupts for %s",
+                   o->devid);
+        return;
+    }
+
     qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
                        o, QEMU_THREAD_JOINABLE);
 }
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f3f65e2..b419d6f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -11,3 +11,4 @@ vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %z
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
-- 
1.8.3.1


[PATCH RFC server 10/11] vfio-user: register handlers to facilitate migration
Posted by Jagannathan Raman 2 years, 9 months ago
Store and load the device's state using handlers for live migration

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 migration/savevm.h        |   2 +
 hw/remote/vfio-user-obj.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c        |  63 ++++++++++
 3 files changed, 352 insertions(+)

diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342..71d1733 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d2a2e51..5948576 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -44,6 +44,10 @@
 #include "hw/boards.h"
 #include "hw/remote/iohub.h"
 #include "hw/remote/machine.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/global_state.h"
+#include "block/block.h"
 
 #include "libvfio-user/include/libvfio-user.h"
 
@@ -73,6 +77,31 @@ struct VfuObject {
     PCIDevice *pci_dev;
 
     QemuThread vfu_ctx_thread;
+
+    /*
+     * vfu_mig_buf holds the migration data. In the remote process, this
+     * buffer replaces the role of an IO channel which links the source
+     * and the destination.
+     *
+     * Whenever the client QEMU process initiates migration, the libvfio-user
+     * library notifies that to this server. The remote/server QEMU sets up a
+     * QEMUFile object using this buffer as backend. The remote passes this
+     * object to its migration subsystem, and it slirps the VMSDs of all its
+     * devices and stores them in this buffer.
+     *
+     * libvfio-user library subsequetly asks the remote for any data that needs
+     * to be moved over to the destination using its vfu_migration_callbacks_t
+     * APIs. The remote hands over this buffer as data at this time.
+     *
+     * A reverse of this process happens at the destination.
+     */
+    uint8_t *vfu_mig_buf;
+
+    uint64_t vfu_mig_buf_size;
+
+    uint64_t vfu_mig_buf_pending;
+
+    QEMUFile *vfu_mig_file;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -97,6 +126,226 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("devid", str);
 }
 
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_savevm & qemu_remote_loadvm. savevm/loadvm
+ * call these functions via QEMUFileOps to save/load the VMSD of all
+ * the devices into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+                                size_t size, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    if (pos > o->vfu_mig_buf_size) {
+        size = 0;
+    } else if ((pos + size) > o->vfu_mig_buf_size) {
+        size = o->vfu_mig_buf_size;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+    o->vfu_mig_buf_size -= size;
+
+    return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+                                 int64_t pos, Error **errp)
+{
+    VfuObject *o = opaque;
+    uint64_t end = pos + iov_size(iov, iovcnt);
+    int i;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+    }
+
+    for (i = 0; i < iovcnt; i++) {
+        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+               iov[i].iov_len);
+        o->vfu_mig_buf_size += iov[i].iov_len;
+        o->vfu_mig_buf_pending += iov[i].iov_len;
+    }
+
+    return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    o->vfu_mig_buf_size = 0;
+
+    g_free(o->vfu_mig_buf);
+
+    return 0;
+}
+
+static const QEMUFileOps vfu_mig_fops_save = {
+    .writev_buffer  = vfu_mig_buf_write,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+static const QEMUFileOps vfu_mig_fops_load = {
+    .get_buffer     = vfu_mig_buf_read,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+/**
+ * handlers for vfu_migration_callbacks_t
+ *
+ * The libvfio-user library accesses these handlers to drive the migration
+ * at the remote end, and also to transport the data stored in vfu_mig_buf
+ *
+ */
+static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    int ret;
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save);
+    }
+
+    global_state_store();
+
+    ret = qemu_remote_savevm(o->vfu_mig_file);
+    if (ret) {
+        qemu_file_shutdown(o->vfu_mig_file);
+        return;
+    }
+
+    qemu_fflush(o->vfu_mig_file);
+
+    bdrv_inactivate_all();
+}
+
+static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    Error *local_err = NULL;
+    int ret;
+
+    ret = qemu_remote_loadvm(o->vfu_mig_file);
+    if (ret) {
+        error_setg(&error_abort, "vfu: failed to restore device state");
+        return;
+    }
+
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return;
+    }
+
+    vm_start();
+}
+
+static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
+{
+    switch (state) {
+    case VFU_MIGR_STATE_RESUME:
+    case VFU_MIGR_STATE_STOP_AND_COPY:
+    case VFU_MIGR_STATE_STOP:
+        break;
+    case VFU_MIGR_STATE_PRE_COPY:
+        vfu_mig_state_precopy(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_RUNNING:
+        if (!runstate_is_running()) {
+            vfu_mig_state_running(vfu_ctx);
+        }
+        break;
+    default:
+        warn_report("vfu: Unknown migration state %d", state);
+    }
+
+    return 0;
+}
+
+static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    return o->vfu_mig_buf_pending;
+}
+
+static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
+                                uint64_t *size)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (offset) {
+        *offset = 0;
+    }
+
+    if (size) {
+        *size = o->vfu_mig_buf_size;
+    }
+
+    return 0;
+}
+
+static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
+                                 uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (offset > o->vfu_mig_buf_size) {
+        return -1;
+    }
+
+    if ((offset + size) > o->vfu_mig_buf_size) {
+        warn_report("vfu: buffer overflow - check pending_bytes");
+        size = o->vfu_mig_buf_size - offset;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + offset), size);
+
+    o->vfu_mig_buf_pending -= size;
+
+    return size;
+}
+
+static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
+                                  uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint64_t end = offset + size;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+        o->vfu_mig_buf_size = end;
+    }
+
+    memcpy((o->vfu_mig_buf + offset), data, size);
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load);
+    }
+
+    return size;
+}
+
+static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
+{
+    return 0;
+}
+
+static const vfu_migration_callbacks_t vfu_mig_cbs = {
+    .version = VFU_MIGR_CALLBACKS_VERS,
+    .transition = &vfu_mig_transition,
+    .get_pending_bytes = &vfu_mig_get_pending_bytes,
+    .prepare_data = &vfu_mig_prepare_data,
+    .read_data = &vfu_mig_read_data,
+    .data_written = &vfu_mig_data_written,
+    .write_data = &vfu_mig_write_data,
+};
+
 static void *vfu_object_ctx_run(void *opaque)
 {
     VfuObject *o = opaque;
@@ -332,6 +581,7 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
     DeviceState *dev = NULL;
+    size_t migr_area_size;
     int ret;
 
     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
@@ -391,6 +641,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    /*
+     * TODO: The 0x20000 number used below is a temporary. We are working on
+     *     a cleaner fix for this.
+     *
+     *     The libvfio-user library assumes that the remote knows the size of
+     *     the data to be migrated at boot time, but that is not the case with
+     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
+     *     as a sufficiently large buffer to demonstrate migration, but that
+     *     cannot be used as a solution.
+     *
+     */
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
+                           0x20000, NULL,
+                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to register migration BAR %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
+    migr_area_size = vfu_get_migr_register_area_size();
+    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
+                                               migr_area_size);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup migration %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
     qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner", vfu_object_ctx_run,
                        o, QEMU_THREAD_JOINABLE);
 }
@@ -412,6 +691,14 @@ static void vfu_object_init(Object *obj)
 
     o->machine_done.notify = vfu_object_machine_done;
     qemu_add_machine_init_done_notifier(&o->machine_done);
+
+    o->vfu_mig_file = NULL;
+
+    o->vfu_mig_buf = NULL;
+
+    o->vfu_mig_buf_size = 0;
+
+    o->vfu_mig_buf_pending = 0;
 }
 
 static void vfu_object_finalize(Object *obj)
diff --git a/migration/savevm.c b/migration/savevm.c
index 72848b9..c2279af 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1603,6 +1603,33 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
+int qemu_remote_savevm(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque)) {
+            continue;
+        }
+
+        save_section_header(f, se, QEMU_VM_SECTION_FULL);
+
+        ret = vmstate_save(f, se, NULL);
+        if (ret) {
+            qemu_file_set_error(f, ret);
+            return ret;
+        }
+
+        save_section_footer(f, se);
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+
+    return 0;
+}
+
 void qemu_savevm_live_state(QEMUFile *f)
 {
     /* save QEMU_VM_SECTION_END section */
@@ -2443,6 +2470,42 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
     return 0;
 }
 
+int qemu_remote_loadvm(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        if (qemu_file_get_error(f)) {
+            ret = qemu_file_get_error(f);
+            break;
+        }
+
+        switch (section_type) {
+        case QEMU_VM_SECTION_FULL:
+            ret = qemu_loadvm_section_start_full(f, NULL);
+            if (ret < 0) {
+                break;
+            }
+            break;
+        case QEMU_VM_EOF:
+            goto out;
+        default:
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
+out:
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
 static int
 qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 {
-- 
1.8.3.1


RE: [PATCH RFC server 10/11] vfio-user: register handlers to facilitate migration
Posted by Thanos Makatos 2 years, 9 months ago
> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 10/11] vfio-user: register handlers to facilitate
> migration
> 
> Store and load the device's state using handlers for live migration
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  migration/savevm.h        |   2 +
>  hw/remote/vfio-user-obj.c | 287
> ++++++++++++++++++++++++++++++++++++++++++++++
>  migration/savevm.c        |  63 ++++++++++
>  3 files changed, 352 insertions(+)
> 
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 6461342..71d1733 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f,
> MigrationIncomingState *mis);
>  int qemu_load_device_state(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>          bool in_postcopy, bool inactivate_disks);
> +int qemu_remote_savevm(QEMUFile *f);
> +int qemu_remote_loadvm(QEMUFile *f);
> 
>  #endif
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index d2a2e51..5948576 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -44,6 +44,10 @@
>  #include "hw/boards.h"
>  #include "hw/remote/iohub.h"
>  #include "hw/remote/machine.h"
> +#include "migration/qemu-file.h"
> +#include "migration/savevm.h"
> +#include "migration/global_state.h"
> +#include "block/block.h"
> 
>  #include "libvfio-user/include/libvfio-user.h"
> 
> @@ -73,6 +77,31 @@ struct VfuObject {
>      PCIDevice *pci_dev;
> 
>      QemuThread vfu_ctx_thread;
> +
> +    /*
> +     * vfu_mig_buf holds the migration data. In the remote process, this
> +     * buffer replaces the role of an IO channel which links the source
> +     * and the destination.
> +     *
> +     * Whenever the client QEMU process initiates migration, the libvfio-user
> +     * library notifies that to this server. The remote/server QEMU sets up a
> +     * QEMUFile object using this buffer as backend. The remote passes this

Can we use remote/server more consistently? E.g. "remote process" or "server" instead of just "remote"? (makes me think of git remotes :D)

> +     * object to its migration subsystem, and it slirps the VMSDs of all its

By "slirps" do you mean transfer from the client to the server over the SLiRP network?

> +     * devices and stores them in this buffer.

Isn't this a per-device object? If so, then why do we store the VMSDs of *all* the devices in a single device's buffer? I think I'm missing something here.

> +     *
> +     * libvfio-user library subsequetly asks the remote for any data that needs
> +     * to be moved over to the destination using its vfu_migration_callbacks_t

It's not obvious to me, is this the libvfio-user library running at the server?

> +     * APIs. The remote hands over this buffer as data at this time.

Hands over the buffer to whom?

> +     *
> +     * A reverse of this process happens at the destination.
> +     */
> +    uint8_t *vfu_mig_buf;

Does the above description refer to a typical use case of the VFIO migration protocol where data is copied in an iterative manner (implemented in libvfio-user by the migration callbacks)? Is this what you're documenting here?

> +
> +    uint64_t vfu_mig_buf_size;
> +
> +    uint64_t vfu_mig_buf_pending;
> +
> +    QEMUFile *vfu_mig_file;
>  };
> 
>  static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
> @@ -97,6 +126,226 @@ static void vfu_object_set_devid(Object *obj, const
> char *str, Error **errp)
>      trace_vfu_prop("devid", str);
>  }
> 
> +/**
> + * Migration helper functions
> + *
> + * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
> + * subsystem - qemu_remote_savevm & qemu_remote_loadvm.

vfu_mig_buf_read is used by qemu_remote_loadvm and vfu_mig_buf_write is used by qemu_remote_savevm, right? The order they're written suggests the opposite.

> savevm/loadvm
> + * call these functions via QEMUFileOps to save/load the VMSD of all
> + * the devices into vfu_mig_buf
> + *
> + */
> +static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
> +                                size_t size, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    if (pos > o->vfu_mig_buf_size) {
> +        size = 0;
> +    } else if ((pos + size) > o->vfu_mig_buf_size) {
> +        size = o->vfu_mig_buf_size;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + pos), size);
> +
> +    o->vfu_mig_buf_size -= size;
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
> +                                 int64_t pos, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +    uint64_t end = pos + iov_size(iov, iovcnt);
> +    int i;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +    }
> +
> +    for (i = 0; i < iovcnt; i++) {
> +        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
> +               iov[i].iov_len);
> +        o->vfu_mig_buf_size += iov[i].iov_len;
> +        o->vfu_mig_buf_pending += iov[i].iov_len;
> +    }
> +
> +    return iov_size(iov, iovcnt);
> +}
> +
> +static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error
> **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    o->vfu_mig_buf_size = 0;
> +
> +    g_free(o->vfu_mig_buf);
> +
> +    return 0;
> +}
> +
> +static const QEMUFileOps vfu_mig_fops_save = {
> +    .writev_buffer  = vfu_mig_buf_write,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static const QEMUFileOps vfu_mig_fops_load = {
> +    .get_buffer     = vfu_mig_buf_read,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +/**
> + * handlers for vfu_migration_callbacks_t
> + *
> + * The libvfio-user library accesses these handlers to drive the migration
> + * at the remote end, and also to transport the data stored in vfu_mig_buf
> + *
> + */
> +static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save);
> +    }
> +
> +    global_state_store();
> +
> +    ret = qemu_remote_savevm(o->vfu_mig_file);
> +    if (ret) {
> +        qemu_file_shutdown(o->vfu_mig_file);
> +        return;
> +    }
> +
> +    qemu_fflush(o->vfu_mig_file);
> +
> +    bdrv_inactivate_all();
> +}
> +
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        error_setg(&error_abort, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    bdrv_invalidate_cache_all(&local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return;
> +    }
> +
> +    vm_start();
> +}
> +
> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
> +{
> +    switch (state) {
> +    case VFU_MIGR_STATE_RESUME:
> +    case VFU_MIGR_STATE_STOP_AND_COPY:
> +    case VFU_MIGR_STATE_STOP:
> +        break;

Can you explain why we don't have to do anything in the above cases?

> +    case VFU_MIGR_STATE_PRE_COPY:
> +        vfu_mig_state_precopy(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_RUNNING:
> +        if (!runstate_is_running()) {
> +            vfu_mig_state_running(vfu_ctx);
> +        }
> +        break;
> +    default:
> +        warn_report("vfu: Unknown migration state %d", state);
> +    }
> +
> +    return 0;
> +}
> +
> +static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    return o->vfu_mig_buf_pending;
> +}
> +
> +static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
> +                                uint64_t *size)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (offset) {
> +        *offset = 0;
> +    }
> +
> +    if (size) {
> +        *size = o->vfu_mig_buf_size;
> +    }
> +
> +    return 0;
> +}
> +
> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
> +                                 uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (offset > o->vfu_mig_buf_size) {
> +        return -1;
> +    }
> +
> +    if ((offset + size) > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - check pending_bytes");
> +        size = o->vfu_mig_buf_size - offset;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + offset), size);
> +
> +    o->vfu_mig_buf_pending -= size;
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
> +                                  uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t end = offset + size;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +        o->vfu_mig_buf_size = end;
> +    }
> +
> +    memcpy((o->vfu_mig_buf + offset), data, size);
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load);
> +    }
> +
> +    return size;
> +}
> +
> +static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
> +{
> +    return 0;
> +}
> +
> +static const vfu_migration_callbacks_t vfu_mig_cbs = {
> +    .version = VFU_MIGR_CALLBACKS_VERS,
> +    .transition = &vfu_mig_transition,
> +    .get_pending_bytes = &vfu_mig_get_pending_bytes,
> +    .prepare_data = &vfu_mig_prepare_data,
> +    .read_data = &vfu_mig_read_data,
> +    .data_written = &vfu_mig_data_written,
> +    .write_data = &vfu_mig_write_data,
> +};
> +
>  static void *vfu_object_ctx_run(void *opaque)
>  {
>      VfuObject *o = opaque;
> @@ -332,6 +581,7 @@ static void vfu_object_machine_done(Notifier
> *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
>      DeviceState *dev = NULL;
> +    size_t migr_area_size;
>      int ret;
> 
>      o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
> @@ -391,6 +641,35 @@ static void vfu_object_machine_done(Notifier
> *notifier, void *data)
>          return;
>      }
> 
> +    /*
> +     * TODO: The 0x20000 number used below is a temporary. We are
> working on
> +     *     a cleaner fix for this.
> +     *
> +     *     The libvfio-user library assumes that the remote knows the size of
> +     *     the data to be migrated at boot time, but that is not the case with
> +     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
> +     *     as a sufficiently large buffer to demonstrate migration, but that
> +     *     cannot be used as a solution.
> +     *
> +     */

The size of the migration region dictates the amount of migration data that can be produced/consumed in one-go, it's not necessarily the total size of the migration data produced/consumed throughout the migration operation.

> +    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
> +                           0x20000, NULL,
> +                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
> +    if (ret < 0) {
> +        error_setg(&error_abort, "vfu: Failed to register migration BAR %s- %s",
> +                   o->devid, strerror(errno));
> +        return;
> +    }
> +
> +    migr_area_size = vfu_get_migr_register_area_size();
> +    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
> +                                               migr_area_size);
> +    if (ret < 0) {
> +        error_setg(&error_abort, "vfu: Failed to setup migration %s- %s",
> +                   o->devid, strerror(errno));
> +        return;
> +    }
> +
>      qemu_thread_create(&o->vfu_ctx_thread, "VFU ctx runner",
> vfu_object_ctx_run,
>                         o, QEMU_THREAD_JOINABLE);
>  }
> @@ -412,6 +691,14 @@ static void vfu_object_init(Object *obj)
> 
>      o->machine_done.notify = vfu_object_machine_done;
>      qemu_add_machine_init_done_notifier(&o->machine_done);
> +
> +    o->vfu_mig_file = NULL;
> +
> +    o->vfu_mig_buf = NULL;
> +
> +    o->vfu_mig_buf_size = 0;
> +
> +    o->vfu_mig_buf_pending = 0;
>  }
> 
>  static void vfu_object_finalize(Object *obj)
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 72848b9..c2279af 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1603,6 +1603,33 @@ static int qemu_savevm_state(QEMUFile *f, Error
> **errp)
>      return ret;
>  }
> 
> +int qemu_remote_savevm(QEMUFile *f)
> +{
> +    SaveStateEntry *se;
> +    int ret;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque)) {
> +            continue;
> +        }
> +
> +        save_section_header(f, se, QEMU_VM_SECTION_FULL);
> +
> +        ret = vmstate_save(f, se, NULL);
> +        if (ret) {
> +            qemu_file_set_error(f, ret);
> +            return ret;
> +        }
> +
> +        save_section_footer(f, se);
> +    }
> +
> +    qemu_put_byte(f, QEMU_VM_EOF);
> +    qemu_fflush(f);
> +
> +    return 0;
> +}
> +
>  void qemu_savevm_live_state(QEMUFile *f)
>  {
>      /* save QEMU_VM_SECTION_END section */
> @@ -2443,6 +2470,42 @@ qemu_loadvm_section_start_full(QEMUFile *f,
> MigrationIncomingState *mis)
>      return 0;
>  }
> 
> +int qemu_remote_loadvm(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    int ret = 0;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    while (true) {
> +        section_type = qemu_get_byte(f);
> +
> +        if (qemu_file_get_error(f)) {
> +            ret = qemu_file_get_error(f);
> +            break;
> +        }
> +
> +        switch (section_type) {
> +        case QEMU_VM_SECTION_FULL:
> +            ret = qemu_loadvm_section_start_full(f, NULL);
> +            if (ret < 0) {
> +                break;
> +            }
> +            break;
> +        case QEMU_VM_EOF:
> +            goto out;
> +        default:
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +    }
> +
> +out:
> +    qemu_mutex_unlock_iothread();
> +
> +    return ret;
> +}
> +
>  static int
>  qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState
> *mis)
>  {
> --
> 1.8.3.1

My background in implementing device migration with libvfio-user is for a specific device, it seems to me that you're using this functionality differently? Maybe that's why I'm getting confused. If this is the case, could you explain in more detail how you're using libvfio-user here?
[PATCH RFC server 11/11] vfio-user: acceptance test
Posted by Jagannathan Raman 2 years, 9 months ago
Acceptance test for libvfio-user in QEMU

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 MAINTAINERS                   |  1 +
 tests/acceptance/vfio-user.py | 94 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 tests/acceptance/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 46ab6b6..644bd35 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3381,6 +3381,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: hw/remote/vfio-user-obj.c
+F: tests/acceptance/vfio-user.py
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
new file mode 100644
index 0000000..ef318d9
--- /dev/null
+++ b/tests/acceptance/vfio-user.py
@@ -0,0 +1,94 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import Test
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+class VfioUser(Test):
+    """
+    :avocado: tags=vfiouser
+    """
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+
+    def do_test(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        """Main test method"""
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+
+        # Create remote process
+        remote_vm = self.get_vm()
+        remote_vm.add_args('-machine', 'x-remote')
+        remote_vm.add_args('-nodefaults')
+        remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+        remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
+                           'devid=lsi1,socket='+socket)
+        remote_vm.launch()
+
+        # Create proxy process
+        self.vm.set_console()
+        self.vm.add_args('-machine', machine_type)
+        self.vm.add_args('-accel', 'kvm')
+        self.vm.add_args('-cpu', 'host')
+        self.vm.add_args('-object',
+                         'memory-backend-memfd,id=sysmem-file,size=2G')
+        self.vm.add_args('--numa', 'node,memdev=sysmem-file')
+        self.vm.add_args('-m', '2048')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line)
+        self.vm.add_args('-device',
+                         'vfio-user-pci,'
+                         'socket='+socket)
+        self.vm.launch()
+        wait_for_console_pattern(self, 'as init process',
+                                 'Kernel panic - not syncing')
+        exec_command(self, 'mount -t sysfs sysfs /sys')
+        exec_command_and_wait_for_pattern(self,
+                                          'cat /sys/bus/pci/devices/*/uevent',
+                                          'PCI_ID=1000:0012')
+
+    def test_multiprocess_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+
+    def test_multiprocess_aarch64(self):
+        """
+        :avocado: tags=arch:aarch64
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'rdinit=/bin/bash console=ttyAMA0')
+        machine_type = 'virt,gic-version=3'
+        self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
-- 
1.8.3.1


RE: [PATCH RFC server 11/11] vfio-user: acceptance test
Posted by Thanos Makatos 2 years, 9 months ago

> -----Original Message-----
> From: Jagannathan Raman <jag.raman@oracle.com>
> Sent: 19 July 2021 21:00
> To: qemu-devel@nongnu.org
> Cc: stefanha@redhat.com; alex.williamson@redhat.com;
> elena.ufimtseva@oracle.com; John Levon <john.levon@nutanix.com>;
> john.g.johnson@oracle.com; Thanos Makatos
> <thanos.makatos@nutanix.com>; Swapnil Ingle
> <swapnil.ingle@nutanix.com>; jag.raman@oracle.com
> Subject: [PATCH RFC server 11/11] vfio-user: acceptance test
> 
> Acceptance test for libvfio-user in QEMU
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  MAINTAINERS                   |  1 +
>  tests/acceptance/vfio-user.py | 94
> +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 95 insertions(+)
>  create mode 100644 tests/acceptance/vfio-user.py
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 46ab6b6..644bd35 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3381,6 +3381,7 @@ F: include/hw/remote/proxy-memory-listener.h
>  F: hw/remote/iohub.c
>  F: include/hw/remote/iohub.h
>  F: hw/remote/vfio-user-obj.c
> +F: tests/acceptance/vfio-user.py
> 
>  EBPF:
>  M: Jason Wang <jasowang@redhat.com>
> diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
> new file mode 100644
> index 0000000..ef318d9
> --- /dev/null
> +++ b/tests/acceptance/vfio-user.py
> @@ -0,0 +1,94 @@
> +# vfio-user protocol sanity test
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +
> +
> +import os
> +import socket
> +import uuid
> +
> +from avocado_qemu import Test
> +from avocado_qemu import wait_for_console_pattern
> +from avocado_qemu import exec_command
> +from avocado_qemu import exec_command_and_wait_for_pattern
> +
> +class VfioUser(Test):
> +    """
> +    :avocado: tags=vfiouser
> +    """
> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
> +
> +    def do_test(self, kernel_url, initrd_url, kernel_command_line,
> +                machine_type):
> +        """Main test method"""
> +        self.require_accelerator('kvm')
> +
> +        kernel_path = self.fetch_asset(kernel_url)
> +        initrd_path = self.fetch_asset(initrd_url)
> +
> +        socket = os.path.join('/tmp', str(uuid.uuid4()))
> +        if os.path.exists(socket):
> +            os.remove(socket)
> +
> +        # Create remote process
> +        remote_vm = self.get_vm()
> +        remote_vm.add_args('-machine', 'x-remote')
> +        remote_vm.add_args('-nodefaults')
> +        remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')

IIUC the LSI controller will now be a migratable device and migration will be handled by vfu_mig_transition() introduced in your "vfio-user: register handlers to facilitate migration" patch. In vfu_mig_transition(), you don’t copy migration data in the VFU_MIGR_STATE_STOP_AND_COPY case but only in VFU_MIGR_STATE_PRE_COPY, however I believe that in VFIO it's possible to jump from the running state straight to the stop-and-copy state. Are you relying on QEMU not doing this?

> +        remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
> +                           'devid=lsi1,socket='+socket)
> +        remote_vm.launch()
> +
> +        # Create proxy process
> +        self.vm.set_console()
> +        self.vm.add_args('-machine', machine_type)
> +        self.vm.add_args('-accel', 'kvm')
> +        self.vm.add_args('-cpu', 'host')
> +        self.vm.add_args('-object',
> +                         'memory-backend-memfd,id=sysmem-file,size=2G')
> +        self.vm.add_args('--numa', 'node,memdev=sysmem-file')
> +        self.vm.add_args('-m', '2048')
> +        self.vm.add_args('-kernel', kernel_path,
> +                         '-initrd', initrd_path,
> +                         '-append', kernel_command_line)
> +        self.vm.add_args('-device',
> +                         'vfio-user-pci,'
> +                         'socket='+socket)
> +        self.vm.launch()
> +        wait_for_console_pattern(self, 'as init process',
> +                                 'Kernel panic - not syncing')
> +        exec_command(self, 'mount -t sysfs sysfs /sys')
> +        exec_command_and_wait_for_pattern(self,
> +                                          'cat /sys/bus/pci/devices/*/uevent',
> +                                          'PCI_ID=1000:0012')
> +
> +    def test_multiprocess_x86_64(self):
> +        """
> +        :avocado: tags=arch:x86_64
> +        """
> +        kernel_url = ('https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__archives.fedoraproject.org_pub_archive_fedora&d=DwIBAg&c=s883G
> pUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=
> 4nAZXeA9xd82TON6H7CXF5LVa0jKBAJkyu0Y-
> curSd4&s=hP6IktdmIVlw3gMuZlWRkPvFq9OzjUji6sb_28sapwk&e= '
> +                      '/linux/releases/31/Everything/x86_64/os/images'
> +                      '/pxeboot/vmlinuz')
> +        initrd_url = ('https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__archives.fedoraproject.org_pub_archive_fedora&d=DwIBAg&c=s883G
> pUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=
> 4nAZXeA9xd82TON6H7CXF5LVa0jKBAJkyu0Y-
> curSd4&s=hP6IktdmIVlw3gMuZlWRkPvFq9OzjUji6sb_28sapwk&e= '
> +                      '/linux/releases/31/Everything/x86_64/os/images'
> +                      '/pxeboot/initrd.img')
> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> +                               'console=ttyS0 rdinit=/bin/bash')
> +        machine_type = 'pc'
> +        self.do_test(kernel_url, initrd_url, kernel_command_line,
> machine_type)
> +
> +    def test_multiprocess_aarch64(self):
> +        """
> +        :avocado: tags=arch:aarch64
> +        """
> +        kernel_url = ('https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__archives.fedoraproject.org_pub_archive_fedora&d=DwIBAg&c=s883G
> pUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=
> 4nAZXeA9xd82TON6H7CXF5LVa0jKBAJkyu0Y-
> curSd4&s=hP6IktdmIVlw3gMuZlWRkPvFq9OzjUji6sb_28sapwk&e= '
> +                      '/linux/releases/31/Everything/aarch64/os/images'
> +                      '/pxeboot/vmlinuz')
> +        initrd_url = ('https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__archives.fedoraproject.org_pub_archive_fedora&d=DwIBAg&c=s883G
> pUCOChKOHiocYtGcg&r=XTpYsh5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=
> 4nAZXeA9xd82TON6H7CXF5LVa0jKBAJkyu0Y-
> curSd4&s=hP6IktdmIVlw3gMuZlWRkPvFq9OzjUji6sb_28sapwk&e= '
> +                      '/linux/releases/31/Everything/aarch64/os/images'
> +                      '/pxeboot/initrd.img')
> +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
> +                               'rdinit=/bin/bash console=ttyAMA0')
> +        machine_type = 'virt,gic-version=3'
> +        self.do_test(kernel_url, initrd_url, kernel_command_line,
> machine_type)
> --
> 1.8.3.1