[PATCH RFC v2 00/16] vfio-user implementation

Elena Ufimtseva posted 16 patches 2 years, 8 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/cover.1629131628.git.elena.ufimtseva@oracle.com
Maintainers: Thanos Makatos <thanos.makatos@nutanix.com>, Jagannathan Raman <jag.raman@oracle.com>, John G Johnson <john.g.johnson@oracle.com>, Elena Ufimtseva <elena.ufimtseva@oracle.com>, Alex Williamson <alex.williamson@redhat.com>
docs/devel/index.rst          |    1 +
docs/devel/vfio-user.rst      | 1809 +++++++++++++++++++++++++++++++++
hw/vfio/pci.h                 |   25 +-
hw/vfio/user-protocol.h       |  210 ++++
hw/vfio/user.h                |   95 ++
include/hw/vfio/vfio-common.h |    9 +
hw/vfio/common.c              |  296 +++++-
hw/vfio/migration.c           |   34 +-
hw/vfio/pci.c                 |  571 +++++++++--
hw/vfio/user.c                | 1104 ++++++++++++++++++++
MAINTAINERS                   |   11 +
hw/vfio/meson.build           |    1 +
12 files changed, 4062 insertions(+), 104 deletions(-)
create mode 100644 docs/devel/vfio-user.rst
create mode 100644 hw/vfio/user-protocol.h
create mode 100644 hw/vfio/user.h
create mode 100644 hw/vfio/user.c
[PATCH RFC v2 00/16] vfio-user implementation
Posted by Elena Ufimtseva 2 years, 8 months ago
Hi

This is v2 of the RFC patches for vfio-user multi-process QEMU project[1].

Thank you for the review of v1 of the RFC patches.

vfio-user is a protocol that allows a device to be emulated in a separate
process outside of QEMU. It encapsulates the messages sent from QEMU to the
kernel VFIO driver, and sends them to a remote process over a UNIX socket.

The vfio-user framework consists of 3 parts:
 1) The protocol specification.
 2) A server - the VFIO generic device in QEMU that exchanges the protocol messages with the client.
 3) A client - remote process that emulates a device.

This patchset implements parts 1 and 2.
The protocol's specification can be found here [2]:
We also include this as the first patch of the series.

The libvfio-user project (https://github.com/nutanix/libvfio-user)
can be used by a remote process to handle the protocol to implement the
third part.
We also worked on implementing a client and will be sending this patch
series shortly.

Contributors:

John G Johnson <john.g.johnson@oracle.com>
John Levon <john.levon@nutanix.com>
Thanos Makatos <thanos.makatos@nutanix.com>
Elena Ufimtseva <elena.ufimtseva@oracle.com>
Jagannathan Raman <jag.raman@oracle.com>

 Changes in v2:
 - combine some patches with relevant functionality.
 - use SocketAddress with idea to modify later the command line options.
 - define protocol bits in user-protocol.h.
 - use QEMU_LOCK_GUARD where appropriate.
 - fix the locking when event signaling.
 - do not drop BQL on dma map/unmap.
 - added checks for message sizes in communication functions.

John Johnson (15):
  vfio-user: add VFIO base abstract class
  vfio-user: Define type vfio_user_pci_dev_info
  vfio-user: connect vfio proxy to remote server
  vfio-user: define VFIO Proxy and communication functions
  vfio-user: negotiate version with remote server
  vfio-user: get device info
  vfio-user: get region info
  vfio-user: region read/write
  vfio-user: pci_user_realize PCI setup
  vfio-user: get and set IRQs
  vfio-user: proxy container connect/disconnect
  vfio-user: dma map/unmap operations
  vfio-user: dma read/write operations
  vfio-user: pci reset
  vfio-user: migration support

Thanos Makatos (1):
  vfio-user: introduce vfio-user protocol specification

 docs/devel/index.rst          |    1 +
 docs/devel/vfio-user.rst      | 1809 +++++++++++++++++++++++++++++++++
 hw/vfio/pci.h                 |   25 +-
 hw/vfio/user-protocol.h       |  210 ++++
 hw/vfio/user.h                |   95 ++
 include/hw/vfio/vfio-common.h |    9 +
 hw/vfio/common.c              |  296 +++++-
 hw/vfio/migration.c           |   34 +-
 hw/vfio/pci.c                 |  571 +++++++++--
 hw/vfio/user.c                | 1104 ++++++++++++++++++++
 MAINTAINERS                   |   11 +
 hw/vfio/meson.build           |    1 +
 12 files changed, 4062 insertions(+), 104 deletions(-)
 create mode 100644 docs/devel/vfio-user.rst
 create mode 100644 hw/vfio/user-protocol.h
 create mode 100644 hw/vfio/user.h
 create mode 100644 hw/vfio/user.c

-- 
2.25.1


[PATCH RFC server v2 00/11] vfio-user server in QEMU
Posted by Jagannathan Raman 2 years, 8 months ago
Hi,

This series depends on the following series from
Elena Ufimtseva <elena.ufimtseva@oracle.com>:
[PATCH RFC v2 00/16] vfio-user implementation

Thank you for your feedback for the v1 patches!
https://www.mail-archive.com/qemu-devel@nongnu.org/msg825021.html

We have incorporated the following feedback from v1 of the
review cycle:

[PATCH RFC server v2 01/11] vfio-user: build library
  - Using cmake subproject to build libvfio-user

[PATCH RFC server v2 02/11] vfio-user: define vfio-user object
  - Added check to confirm that TYPE_REMOTE_MACHINE is used
    with TYPE_VFU_OBJECT

[PATCH RFC server v2 04/11] vfio-user: find and init PCI device
  - Removed call to vfu_pci_set_id()
  - Added check to confirm that TYPE_PCI_DEVICE is used with
    TYPE_VFU_OBJECT

[PATCH RFC server v2 05/11] vfio-user: run vfio-user context
  - Using QEMU main-loop to drive the vfu_ctx (using
    vfu_get_poll_fd() & qemu_set_fd_handler())
  - Set vfu_ctx to non-blocking mode (LIBVFIO_USER_FLAG_ATTACH_NB)
  - Modified how QEMU attaches to the vfu_ctx

[PATCH RFC server v2 06/11] handle PCI config space accesses
  - Broke-up PCI config space access to 4-byte accesses

[PATCH RFC server v2 07/11] vfio-user: handle DMA mappings
  - Received feedback to assert that vfu_dma_info_t->vaddr is not
    NULL - unable to do it as it appears to be a valid case.

[PATCH RFC server v2 10/11] register handlers to facilitate migration
  - Migrate only one device's data per contect

Would appreciate if you could kindly review this v2 series. Looking
forward to your comments.

Thank you!

Jagannathan Raman (11):
  vfio-user: build library
  vfio-user: define vfio-user object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: acceptance test

 configure                     |  11 +
 meson.build                   |  28 ++
 qapi/qom.json                 |  20 +-
 include/hw/remote/iohub.h     |   2 +
 migration/savevm.h            |   2 +
 hw/remote/iohub.c             |   5 +
 hw/remote/vfio-user-obj.c     | 803 ++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c            |  73 ++++
 .gitmodules                   |   3 +
 MAINTAINERS                   |   9 +
 hw/remote/meson.build         |   3 +
 hw/remote/trace-events        |  10 +
 subprojects/libvfio-user      |   1 +
 tests/acceptance/vfio-user.py |  94 +++++
 14 files changed, 1062 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user
 create mode 100644 tests/acceptance/vfio-user.py

-- 
1.8.3.1


Re: [PATCH RFC server v2 00/11] vfio-user server in QEMU
Posted by Stefan Hajnoczi 2 years, 7 months ago
Hi Jag,
I have finished reviewing these patches and left comments. I didn't take
a look at the libvfio-user's implementation.

Stefan
Re: [PATCH RFC server v2 00/11] vfio-user server in QEMU
Posted by Jag Raman 2 years, 7 months ago

> On Sep 9, 2021, at 4:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> Hi Jag,
> I have finished reviewing these patches and left comments. I didn't take
> a look at the libvfio-user's implementation.

Thank you for you comments, Stefan - we’ll get cracking on them. :)

--
Jag

> 
> Stefan

[PATCH RFC server v2 01/11] vfio-user: build library
Posted by Jagannathan Raman 2 years, 8 months ago
add the libvfio-user library as a submodule. build it as a cmake
subproject.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure                | 11 +++++++++++
 meson.build              | 28 ++++++++++++++++++++++++++++
 .gitmodules              |  3 +++
 MAINTAINERS              |  7 +++++++
 hw/remote/meson.build    |  2 ++
 subprojects/libvfio-user |  1 +
 6 files changed, 52 insertions(+)
 create mode 160000 subprojects/libvfio-user

diff --git a/configure b/configure
index 9a79a00..794e900 100755
--- a/configure
+++ b/configure
@@ -4291,6 +4291,17 @@ but not implemented on your system"
 fi
 
 ##########################################
+# check for multiprocess
+
+case "$multiprocess" in
+  auto | enabled )
+    if test "$git_submodules_action" != "ignore"; then
+      git_submodules="${git_submodules} libvfio-user"
+    fi
+    ;;
+esac
+
+##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
 
diff --git a/meson.build b/meson.build
index bf63784..2b2d5c2 100644
--- a/meson.build
+++ b/meson.build
@@ -1898,6 +1898,34 @@ if get_option('cfi') and slirp_opt == 'system'
          + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and multiprocess_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/Makefile')
+
+  if not have_internal
+    error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  json_c = dependency('json-c', required: false)
+    if not json_c.found()
+      json_c = dependency('libjson-c')
+  endif
+
+  cmake = import('cmake')
+
+  vfiouser_subproj = cmake.subproject('libvfio-user')
+
+  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
+
+  # Although cmake links the json-c library with vfio-user-static
+  # target, that info is not available to meson via cmake.subproject.
+  # As such, we have to separately declare the json-c dependency here.
+  # This appears to be a current limitation of using cmake inside meson.
+  # libvfio-user is planning a switch to meson in the future, which
+  # would address this item automatically.
+  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
+endif
+
 fdt = not_found
 fdt_opt = get_option('fdt')
 if have_system
diff --git a/.gitmodules b/.gitmodules
index 08b1b48..cfeea7c 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -64,3 +64,6 @@
 [submodule "roms/vbootrom"]
 	path = roms/vbootrom
 	url = https://gitlab.com/qemu-project/vbootrom.git
+[submodule "subprojects/libvfio-user"]
+	path = subprojects/libvfio-user
+	url = https://github.com/nutanix/libvfio-user.git
diff --git a/MAINTAINERS b/MAINTAINERS
index 4039d3c..0c5a18e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3361,6 +3361,13 @@ F: semihosting/
 F: include/semihosting/
 F: tests/tcg/multiarch/arm-compat-semi/
 
+libvfio-user Library
+M: Thanos Makatos <thanos.makatos@nutanix.com>
+M: John Levon <john.levon@nutanix.com>
+T: https://github.com/nutanix/libvfio-user.git
+S: Maintained
+F: subprojects/libvfio-user/*
+
 Multi-process QEMU
 M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
 M: Jagannathan Raman <jag.raman@oracle.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574..fb35fb8 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
 
diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
new file mode 160000
index 0000000..647c934
--- /dev/null
+++ b/subprojects/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
-- 
1.8.3.1


Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Jag Raman 2 years, 8 months ago

> On Aug 27, 2021, at 1:53 PM, Jag Raman <jag.raman@oracle.com> wrote:
> 
> add the libvfio-user library as a submodule. build it as a cmake
> subproject.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
> configure                | 11 +++++++++++
> meson.build              | 28 ++++++++++++++++++++++++++++
> .gitmodules              |  3 +++
> MAINTAINERS              |  7 +++++++
> hw/remote/meson.build    |  2 ++
> subprojects/libvfio-user |  1 +
> 6 files changed, 52 insertions(+)
> create mode 160000 subprojects/libvfio-user
> 
> diff --git a/configure b/configure
> index 9a79a00..794e900 100755
> --- a/configure
> +++ b/configure
> @@ -4291,6 +4291,17 @@ but not implemented on your system"
> fi
> 
> ##########################################
> +# check for multiprocess
> +
> +case "$multiprocess" in
> +  auto | enabled )
> +    if test "$git_submodules_action" != "ignore"; then
> +      git_submodules="${git_submodules} libvfio-user"
> +    fi
> +    ;;
> +esac
> +
> +##########################################
> # End of CC checks
> # After here, no more $cc or $ld runs
> 
> diff --git a/meson.build b/meson.build
> index bf63784..2b2d5c2 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1898,6 +1898,34 @@ if get_option('cfi') and slirp_opt == 'system'
>          + ' Please configure with --enable-slirp=git')
> endif
> 
> +vfiouser = not_found
> +if have_system and multiprocess_allowed
> +  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/Makefile')
> +
> +  if not have_internal
> +    error('libvfio-user source not found - please pull git submodule')
> +  endif
> +
> +  json_c = dependency('json-c', required: false)
> +    if not json_c.found()
> +      json_c = dependency('libjson-c')
> +  endif

One of the things we’re wondering is about this json-c package that we need to build
libvfio-user library.

The gitlab runners typically don’t have this package installed, as such the gitlab builds
fail. Wondering if there's a way to install this package for all QEMU builds?

We checked out the various jobs defined in “.gitlab-ci.d/buildtest.yml” - there is a
“before_script” keyword which we could use to install this package. The “before_script”
keyword appears to be run every time before a job’s script is executed. But this option
appears to be per job/build. Wondering if there's a distro-independent global way to
install a required package for all builds.

Thank you!
--
Jag

> +
> +  cmake = import('cmake')
> +
> +  vfiouser_subproj = cmake.subproject('libvfio-user')
> +
> +  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
> +
> +  # Although cmake links the json-c library with vfio-user-static
> +  # target, that info is not available to meson via cmake.subproject.
> +  # As such, we have to separately declare the json-c dependency here.
> +  # This appears to be a current limitation of using cmake inside meson.
> +  # libvfio-user is planning a switch to meson in the future, which
> +  # would address this item automatically.
> +  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
> +endif
> +
> fdt = not_found
> fdt_opt = get_option('fdt')
> if have_system
> diff --git a/.gitmodules b/.gitmodules
> index 08b1b48..cfeea7c 100644
> --- a/.gitmodules
> +++ b/.gitmodules
> @@ -64,3 +64,6 @@
> [submodule "roms/vbootrom"]
> 	path = roms/vbootrom
> 	url = https://gitlab.com/qemu-project/vbootrom.git
> +[submodule "subprojects/libvfio-user"]
> +	path = subprojects/libvfio-user
> +	url = https://github.com/nutanix/libvfio-user.git
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4039d3c..0c5a18e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3361,6 +3361,13 @@ F: semihosting/
> F: include/semihosting/
> F: tests/tcg/multiarch/arm-compat-semi/
> 
> +libvfio-user Library
> +M: Thanos Makatos <thanos.makatos@nutanix.com>
> +M: John Levon <john.levon@nutanix.com>
> +T: https://github.com/nutanix/libvfio-user.git
> +S: Maintained
> +F: subprojects/libvfio-user/*
> +
> Multi-process QEMU
> M: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> M: Jagannathan Raman <jag.raman@oracle.com>
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index e6a5574..fb35fb8 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
> 
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
> +
> specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
> 
> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
> new file mode 160000
> index 0000000..647c934
> --- /dev/null
> +++ b/subprojects/libvfio-user
> @@ -0,0 +1 @@
> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
> -- 
> 1.8.3.1
> 

Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:20PM -0400, Jagannathan Raman wrote:
> diff --git a/meson.build b/meson.build
> index bf63784..2b2d5c2 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1898,6 +1898,34 @@ if get_option('cfi') and slirp_opt == 'system'
>           + ' Please configure with --enable-slirp=git')
>  endif
>  
> +vfiouser = not_found
> +if have_system and multiprocess_allowed
> +  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/Makefile')
> +
> +  if not have_internal
> +    error('libvfio-user source not found - please pull git submodule')
> +  endif
> +
> +  json_c = dependency('json-c', required: false)
> +    if not json_c.found()

Indentation is off.

> +      json_c = dependency('libjson-c')
> +  endif
> +
> +  cmake = import('cmake')
> +
> +  vfiouser_subproj = cmake.subproject('libvfio-user')
> +
> +  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
> +
> +  # Although cmake links the json-c library with vfio-user-static
> +  # target, that info is not available to meson via cmake.subproject.
> +  # As such, we have to separately declare the json-c dependency here.
> +  # This appears to be a current limitation of using cmake inside meson.
> +  # libvfio-user is planning a switch to meson in the future, which
> +  # would address this item automatically.
> +  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
> +endif
> +
>  fdt = not_found
>  fdt_opt = get_option('fdt')
>  if have_system
> diff --git a/.gitmodules b/.gitmodules
> index 08b1b48..cfeea7c 100644
> --- a/.gitmodules
> +++ b/.gitmodules
> @@ -64,3 +64,6 @@
>  [submodule "roms/vbootrom"]
>  	path = roms/vbootrom
>  	url = https://gitlab.com/qemu-project/vbootrom.git
> +[submodule "subprojects/libvfio-user"]
> +	path = subprojects/libvfio-user
> +	url = https://github.com/nutanix/libvfio-user.git

Once this is merged I'll set up a
gitlab.com/qemu-project/libvfio-user.git mirror. This ensures that no
matter what happens with upstream libvfio-user.git, the source code that
QEMU builds against will remain archived/available.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4039d3c..0c5a18e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3361,6 +3361,13 @@ F: semihosting/
>  F: include/semihosting/
>  F: tests/tcg/multiarch/arm-compat-semi/
>  
> +libvfio-user Library
> +M: Thanos Makatos <thanos.makatos@nutanix.com>
> +M: John Levon <john.levon@nutanix.com>
> +T: https://github.com/nutanix/libvfio-user.git
> +S: Maintained
> +F: subprojects/libvfio-user/*

A MAINTAINERS entry isn't necessary for git submodules. This could
become outdated. People should look at the upstream project instead for
information on maintainership and how to contribute.
Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Philippe Mathieu-Daudé 2 years, 7 months ago
On 9/8/21 2:25 PM, Stefan Hajnoczi wrote:
> On Fri, Aug 27, 2021 at 01:53:20PM -0400, Jagannathan Raman wrote:

>> diff --git a/.gitmodules b/.gitmodules
>> index 08b1b48..cfeea7c 100644
>> --- a/.gitmodules
>> +++ b/.gitmodules
>> @@ -64,3 +64,6 @@
>>  [submodule "roms/vbootrom"]
>>  	path = roms/vbootrom
>>  	url = https://gitlab.com/qemu-project/vbootrom.git
>> +[submodule "subprojects/libvfio-user"]
>> +	path = subprojects/libvfio-user
>> +	url = https://github.com/nutanix/libvfio-user.git
> 
> Once this is merged I'll set up a
> gitlab.com/qemu-project/libvfio-user.git mirror. This ensures that no
> matter what happens with upstream libvfio-user.git, the source code that
> QEMU builds against will remain archived/available.

Can we do it the other way around? When the series is OK to be merged,
setup the https://gitlab.com/qemu-project/libvfio-user.git mirror and
have the submodule point to it?


Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Sep 10, 2021 at 05:21:33PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/8/21 2:25 PM, Stefan Hajnoczi wrote:
> > On Fri, Aug 27, 2021 at 01:53:20PM -0400, Jagannathan Raman wrote:
> 
> >> diff --git a/.gitmodules b/.gitmodules
> >> index 08b1b48..cfeea7c 100644
> >> --- a/.gitmodules
> >> +++ b/.gitmodules
> >> @@ -64,3 +64,6 @@
> >>  [submodule "roms/vbootrom"]
> >>  	path = roms/vbootrom
> >>  	url = https://gitlab.com/qemu-project/vbootrom.git
> >> +[submodule "subprojects/libvfio-user"]
> >> +	path = subprojects/libvfio-user
> >> +	url = https://github.com/nutanix/libvfio-user.git
> > 
> > Once this is merged I'll set up a
> > gitlab.com/qemu-project/libvfio-user.git mirror. This ensures that no
> > matter what happens with upstream libvfio-user.git, the source code that
> > QEMU builds against will remain archived/available.
> 
> Can we do it the other way around? When the series is OK to be merged,
> setup the https://gitlab.com/qemu-project/libvfio-user.git mirror and
> have the submodule point to it?

Yes, good idea.

Stefan
Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Philippe Mathieu-Daudé 2 years, 7 months ago
On 8/27/21 7:53 PM, Jagannathan Raman wrote:
> add the libvfio-user library as a submodule. build it as a cmake
> subproject.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  configure                | 11 +++++++++++
>  meson.build              | 28 ++++++++++++++++++++++++++++
>  .gitmodules              |  3 +++
>  MAINTAINERS              |  7 +++++++
>  hw/remote/meson.build    |  2 ++
>  subprojects/libvfio-user |  1 +
>  6 files changed, 52 insertions(+)
>  create mode 160000 subprojects/libvfio-user

> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
> new file mode 160000
> index 0000000..647c934
> --- /dev/null
> +++ b/subprojects/libvfio-user
> @@ -0,0 +1 @@
> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
> 

Could we point to a sha1 of a released tag instead?


Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Jag Raman 2 years, 7 months ago

> On Sep 10, 2021, at 11:20 AM, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> 
> On 8/27/21 7:53 PM, Jagannathan Raman wrote:
>> add the libvfio-user library as a submodule. build it as a cmake
>> subproject.
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> configure                | 11 +++++++++++
>> meson.build              | 28 ++++++++++++++++++++++++++++
>> .gitmodules              |  3 +++
>> MAINTAINERS              |  7 +++++++
>> hw/remote/meson.build    |  2 ++
>> subprojects/libvfio-user |  1 +
>> 6 files changed, 52 insertions(+)
>> create mode 160000 subprojects/libvfio-user
> 
>> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
>> new file mode 160000
>> index 0000000..647c934
>> --- /dev/null
>> +++ b/subprojects/libvfio-user
>> @@ -0,0 +1 @@
>> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
>> 
> 
> Could we point to a sha1 of a released tag instead?

OK, will do.

--
Jag

> 

Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by John Levon 2 years, 7 months ago
On Fri, Sep 10, 2021 at 05:20:09PM +0200, Philippe Mathieu-Daudé wrote:

> On 8/27/21 7:53 PM, Jagannathan Raman wrote:
> > add the libvfio-user library as a submodule. build it as a cmake
> > subproject.
> > 
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > ---
> >  configure                | 11 +++++++++++
> >  meson.build              | 28 ++++++++++++++++++++++++++++
> >  .gitmodules              |  3 +++
> >  MAINTAINERS              |  7 +++++++
> >  hw/remote/meson.build    |  2 ++
> >  subprojects/libvfio-user |  1 +
> >  6 files changed, 52 insertions(+)
> >  create mode 160000 subprojects/libvfio-user
> 
> > diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
> > new file mode 160000
> > index 0000000..647c934
> > --- /dev/null
> > +++ b/subprojects/libvfio-user
> > @@ -0,0 +1 @@
> > +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
> 
> Could we point to a sha1 of a released tag instead?

We don't have releases (yet) partly because we haven't yet stabilized the API.

regards
john
Re: [PATCH RFC server v2 01/11] vfio-user: build library
Posted by Philippe Mathieu-Daudé 2 years, 7 months ago
On 9/12/21 12:29 AM, John Levon wrote:
> On Fri, Sep 10, 2021 at 05:20:09PM +0200, Philippe Mathieu-Daudé wrote:
>> On 8/27/21 7:53 PM, Jagannathan Raman wrote:
>>> add the libvfio-user library as a submodule. build it as a cmake
>>> subproject.
>>>
>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>> ---
>>>  configure                | 11 +++++++++++
>>>  meson.build              | 28 ++++++++++++++++++++++++++++
>>>  .gitmodules              |  3 +++
>>>  MAINTAINERS              |  7 +++++++
>>>  hw/remote/meson.build    |  2 ++
>>>  subprojects/libvfio-user |  1 +
>>>  6 files changed, 52 insertions(+)
>>>  create mode 160000 subprojects/libvfio-user
>>
>>> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
>>> new file mode 160000
>>> index 0000000..647c934
>>> --- /dev/null
>>> +++ b/subprojects/libvfio-user
>>> @@ -0,0 +1 @@
>>> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
>>
>> Could we point to a sha1 of a released tag instead?
> 
> We don't have releases (yet) partly because we haven't yet stabilized the API.

OK. Maybe acceptable, up to the maintainer then ¯\_(ツ)_/¯


[PATCH RFC server v2 02/11] vfio-user: define vfio-user object
Posted by Jagannathan Raman 2 years, 8 months ago
Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 qapi/qom.json             |  20 ++++++-
 hw/remote/vfio-user-obj.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |   1 +
 hw/remote/meson.build     |   1 +
 hw/remote/trace-events    |   3 +
 5 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index a25616b..3e941ee 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -689,6 +689,20 @@
   'data': { 'fd': 'str', 'devid': 'str' } }
 
 ##
+# @VfioUserProperties:
+#
+# Properties for vfio-user objects.
+#
+# @socket: path to be used as socket by the libvfiouser library
+#
+# @devid: the id of the device to be associated with the file descriptor
+#
+# Since: 6.0
+##
+{ 'struct': 'VfioUserProperties',
+  'data': { 'socket': 'str', 'devid': 'str' } }
+
+##
 # @RngProperties:
 #
 # Properties for objects of classes derived from rng.
@@ -812,7 +826,8 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    'x-remote-object'
+    'x-remote-object',
+    'vfio-user'
   ] }
 
 ##
@@ -868,7 +883,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'vfio-user':                  'VfioUserProperties'
   } }
 
 ##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000..4a1e297
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,145 @@
+/**
+ * QEMU vfio-user server object
+ *
+ * Copyright © 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ *     -machine x-remote
+ *     -device <PCI-device>,id=<pci-dev-id>
+ *     -object vfio-user,id=<id>,socket=<socket-path>,devid=<pci-dev-id>
+ *
+ * Note that vfio-user object must be used with x-remote machine only. This
+ * server could only support PCI devices for now.
+ *
+ * socket is path to a file. This file will be created by the server. It is
+ * a required option
+ *
+ * devid is the id of a PCI device on the server. It is also a required option.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+
+#define TYPE_VFU_OBJECT "vfio-user"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+struct VfuObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+
+    /* Maximum number of devices the server could support */
+    unsigned int max_devs;
+};
+
+struct VfuObject {
+    /* private */
+    Object parent;
+
+    char *socket;
+    char *devid;
+};
+
+static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->socket);
+
+    o->socket = g_strdup(str);
+
+    trace_vfu_prop("socket", str);
+}
+
+static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->devid);
+
+    o->devid = g_strdup(str);
+
+    trace_vfu_prop("devid", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+
+    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
+        error_report("vfu: %s only compatible with %s machine",
+                     TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
+        return;
+    }
+
+    if (k->nr_devs >= k->max_devs) {
+        error_report("Reached maximum number of vfio-user devices: %u",
+                     k->max_devs);
+        return;
+    }
+
+    k->nr_devs++;
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs--;
+
+    g_free(o->socket);
+    g_free(o->devid);
+
+    if (k->nr_devs == 0) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+    VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+    /* Limiting maximum number of devices to 1 until IOMMU support is added */
+    k->max_devs = 1;
+    k->nr_devs = 0;
+
+    object_class_property_add_str(klass, "socket", NULL,
+                                  vfu_object_set_socket);
+    object_class_property_add_str(klass, "devid", NULL,
+                                  vfu_object_set_devid);
+}
+
+static const TypeInfo vfu_object_info = {
+    .name = TYPE_VFU_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(VfuObject),
+    .instance_init = vfu_object_init,
+    .instance_finalize = vfu_object_finalize,
+    .class_size = sizeof(VfuObjectClass),
+    .class_init = vfu_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void vfu_register_types(void)
+{
+    type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 0c5a18e..f9d8092 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3391,6 +3391,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index fb35fb8..cd44dfc 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974..7da12f0 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
 
 mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
 mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-- 
1.8.3.1


[PATCH RFC server v2 03/11] vfio-user: instantiate vfio-user context
Posted by Jagannathan Raman 2 years, 8 months ago
create a context with the vfio-user library to run a PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 4a1e297..99d3dd1 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,11 +27,17 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 
+#include <errno.h>
+
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "sysemu/runstate.h"
+#include "qemu/notify.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -51,6 +57,10 @@ struct VfuObject {
 
     char *socket;
     char *devid;
+
+    Notifier machine_done;
+
+    vfu_ctx_t *vfu_ctx;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -75,9 +85,23 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("devid", str);
 }
 
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+    VfuObject *o = container_of(notifier, VfuObject, machine_done);
+
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
+                                o, VFU_DEV_TYPE_PCI);
+    if (o->vfu_ctx == NULL) {
+        error_setg(&error_abort, "vfu: Failed to create context - %s",
+                   strerror(errno));
+        return;
+    }
+}
+
 static void vfu_object_init(Object *obj)
 {
     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
 
     if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
         error_report("vfu: %s only compatible with %s machine",
@@ -92,6 +116,9 @@ static void vfu_object_init(Object *obj)
     }
 
     k->nr_devs++;
+
+    o->machine_done.notify = vfu_object_machine_done;
+    qemu_add_machine_init_done_notifier(&o->machine_done);
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -101,6 +128,8 @@ static void vfu_object_finalize(Object *obj)
 
     k->nr_devs--;
 
+    vfu_destroy_ctx(o->vfu_ctx);
+
     g_free(o->socket);
     g_free(o->devid);
 
-- 
1.8.3.1


Re: [PATCH RFC server v2 03/11] vfio-user: instantiate vfio-user context
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:22PM -0400, Jagannathan Raman wrote:
> create a context with the vfio-user library to run a PCI device
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 4a1e297..99d3dd1 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -27,11 +27,17 @@
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
>  
> +#include <errno.h>

qemu/osdep.h already includes <errno.h>

> +
>  #include "qom/object.h"
>  #include "qom/object_interfaces.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  #include "sysemu/runstate.h"
> +#include "qemu/notify.h"
> +#include "qapi/error.h"
> +#include "sysemu/sysemu.h"
> +#include "libvfio-user.h"
>  
>  #define TYPE_VFU_OBJECT "vfio-user"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -51,6 +57,10 @@ struct VfuObject {
>  
>      char *socket;
>      char *devid;
> +
> +    Notifier machine_done;
> +
> +    vfu_ctx_t *vfu_ctx;
>  };
>  
>  static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
> @@ -75,9 +85,23 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
>      trace_vfu_prop("devid", str);
>  }
>  
> +static void vfu_object_machine_done(Notifier *notifier, void *data)

Please document the reason for using a machine init done notifier.

> +{
> +    VfuObject *o = container_of(notifier, VfuObject, machine_done);
> +
> +    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
> +                                o, VFU_DEV_TYPE_PCI);
> +    if (o->vfu_ctx == NULL) {
> +        error_setg(&error_abort, "vfu: Failed to create context - %s",
> +                   strerror(errno));
> +        return;
> +    }
> +}
> +
>  static void vfu_object_init(Object *obj)
>  {
>      VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
> +    VfuObject *o = VFU_OBJECT(obj);
>  
>      if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
>          error_report("vfu: %s only compatible with %s machine",
> @@ -92,6 +116,9 @@ static void vfu_object_init(Object *obj)
>      }
>  
>      k->nr_devs++;
> +
> +    o->machine_done.notify = vfu_object_machine_done;
> +    qemu_add_machine_init_done_notifier(&o->machine_done);
>  }
>  
>  static void vfu_object_finalize(Object *obj)
> @@ -101,6 +128,8 @@ static void vfu_object_finalize(Object *obj)
>  
>      k->nr_devs--;
>  
> +    vfu_destroy_ctx(o->vfu_ctx);

Will this function ever be called before vfu_object_machine_done() is
called? In that case vfu_ctx isn't initialized.
Re: [PATCH RFC server v2 03/11] vfio-user: instantiate vfio-user context
Posted by Jag Raman 2 years, 7 months ago

> On Sep 8, 2021, at 8:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:22PM -0400, Jagannathan Raman wrote:
>> create a context with the vfio-user library to run a PCI device
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> hw/remote/vfio-user-obj.c | 29 +++++++++++++++++++++++++++++
>> 1 file changed, 29 insertions(+)
>> 
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index 4a1e297..99d3dd1 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -27,11 +27,17 @@
>> #include "qemu/osdep.h"
>> #include "qemu-common.h"
>> 
>> +#include <errno.h>
> 
> qemu/osdep.h already includes <errno.h>
> 
>> +
>> #include "qom/object.h"
>> #include "qom/object_interfaces.h"
>> #include "qemu/error-report.h"
>> #include "trace.h"
>> #include "sysemu/runstate.h"
>> +#include "qemu/notify.h"
>> +#include "qapi/error.h"
>> +#include "sysemu/sysemu.h"
>> +#include "libvfio-user.h"
>> 
>> #define TYPE_VFU_OBJECT "vfio-user"
>> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -51,6 +57,10 @@ struct VfuObject {
>> 
>>     char *socket;
>>     char *devid;
>> +
>> +    Notifier machine_done;
>> +
>> +    vfu_ctx_t *vfu_ctx;
>> };
>> 
>> static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
>> @@ -75,9 +85,23 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
>>     trace_vfu_prop("devid", str);
>> }
>> 
>> +static void vfu_object_machine_done(Notifier *notifier, void *data)
> 
> Please document the reason for using a machine init done notifier.

OK, will do.

> 
>> +{
>> +    VfuObject *o = container_of(notifier, VfuObject, machine_done);
>> +
>> +    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
>> +                                o, VFU_DEV_TYPE_PCI);
>> +    if (o->vfu_ctx == NULL) {
>> +        error_setg(&error_abort, "vfu: Failed to create context - %s",
>> +                   strerror(errno));
>> +        return;
>> +    }
>> +}
>> +
>> static void vfu_object_init(Object *obj)
>> {
>>     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>> +    VfuObject *o = VFU_OBJECT(obj);
>> 
>>     if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
>>         error_report("vfu: %s only compatible with %s machine",
>> @@ -92,6 +116,9 @@ static void vfu_object_init(Object *obj)
>>     }
>> 
>>     k->nr_devs++;
>> +
>> +    o->machine_done.notify = vfu_object_machine_done;
>> +    qemu_add_machine_init_done_notifier(&o->machine_done);
>> }
>> 
>> static void vfu_object_finalize(Object *obj)
>> @@ -101,6 +128,8 @@ static void vfu_object_finalize(Object *obj)
>> 
>>     k->nr_devs--;
>> 
>> +    vfu_destroy_ctx(o->vfu_ctx);
> 
> Will this function ever be called before vfu_object_machine_done() is
> called? In that case vfu_ctx isn't initialized.

There are some case where vfu_object_finalize() could be called before
vfu_object_machine_done() executes. In that case o->vfu_ctx would be
NULL - we didn’t account for that before.

vfu_destroy_ctx() does check for NULL - however, we’ll add a check
here as well in case vfu_destroy_ctx() changes in the future.

--
Jag

[PATCH RFC server v2 04/11] vfio-user: find and init PCI device
Posted by Jagannathan Raman 2 years, 8 months ago
Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 99d3dd1..5ae0991 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -38,6 +38,8 @@
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -61,6 +63,8 @@ struct VfuObject {
     Notifier machine_done;
 
     vfu_ctx_t *vfu_ctx;
+
+    PCIDevice *pci_dev;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -88,6 +92,8 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
+    DeviceState *dev = NULL;
+    int ret;
 
     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
                                 o, VFU_DEV_TYPE_PCI);
@@ -96,6 +102,28 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    strerror(errno));
         return;
     }
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->devid);
+    if (dev == NULL) {
+        error_setg(&error_abort, "vfu: Device %s not found", o->devid);
+        return;
+    }
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        error_setg(&error_abort, "vfu: %s not a PCI devices", o->devid);
+        return;
+    }
+
+    o->pci_dev = PCI_DEVICE(dev);
+
+    ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
+                       PCI_HEADER_TYPE_NORMAL, 0);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to attach PCI device %s to context - %s",
+                   o->devid, strerror(errno));
+        return;
+    }
 }
 
 static void vfu_object_init(Object *obj)
-- 
1.8.3.1


Re: [PATCH RFC server v2 04/11] vfio-user: find and init PCI device
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:23PM -0400, Jagannathan Raman wrote:
> @@ -96,6 +102,28 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>                     strerror(errno));
>          return;
>      }
> +
> +    dev = qdev_find_recursive(sysbus_get_default(), o->devid);
> +    if (dev == NULL) {
> +        error_setg(&error_abort, "vfu: Device %s not found", o->devid);
> +        return;
> +    }
> +
> +    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        error_setg(&error_abort, "vfu: %s not a PCI devices", o->devid);
> +        return;
> +    }
> +
> +    o->pci_dev = PCI_DEVICE(dev);
> +
> +    ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
> +                       PCI_HEADER_TYPE_NORMAL, 0);

What is needed to support PCI Express?
Re: [PATCH RFC server v2 04/11] vfio-user: find and init PCI device
Posted by Jag Raman 2 years, 7 months ago

> On Sep 8, 2021, at 8:43 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:23PM -0400, Jagannathan Raman wrote:
>> @@ -96,6 +102,28 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>>                    strerror(errno));
>>         return;
>>     }
>> +
>> +    dev = qdev_find_recursive(sysbus_get_default(), o->devid);
>> +    if (dev == NULL) {
>> +        error_setg(&error_abort, "vfu: Device %s not found", o->devid);
>> +        return;
>> +    }
>> +
>> +    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        error_setg(&error_abort, "vfu: %s not a PCI devices", o->devid);
>> +        return;
>> +    }
>> +
>> +    o->pci_dev = PCI_DEVICE(dev);
>> +
>> +    ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
>> +                       PCI_HEADER_TYPE_NORMAL, 0);
> 
> What is needed to support PCI Express?

I think we could check if o->pci_dev supports QEMU_PCI_CAP_EXPRESS,
and based on that choose if we should use
VFU_PCI_TYPE_CONVENTIONAL or VFU_PCI_TYPE_EXPRESS.

pci_is_express() is already doing that, although it’s a private function
now. It’s a good time to export it.

--
Jag
[PATCH RFC server v2 05/11] vfio-user: run vfio-user context
Posted by Jagannathan Raman 2 years, 8 months ago
Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 5ae0991..0726eb9 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -35,6 +35,7 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
@@ -65,6 +66,8 @@ struct VfuObject {
     vfu_ctx_t *vfu_ctx;
 
     PCIDevice *pci_dev;
+
+    int vfu_poll_fd;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -89,13 +92,67 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("devid", str);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+    VfuObject *o = opaque;
+    int ret = -1;
+
+    while (ret != 0) {
+        ret = vfu_run_ctx(o->vfu_ctx);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == ENOTCONN) {
+                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+                o->vfu_poll_fd = -1;
+                object_unparent(OBJECT(o));
+                break;
+            } else {
+                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
+                           o->devid, strerror(errno));
+                 break;
+            }
+        }
+    }
+}
+
+static void *vfu_object_attach_ctx(void *opaque)
+{
+    VfuObject *o = opaque;
+    int ret;
+
+retry_attach:
+    ret = vfu_attach_ctx(o->vfu_ctx);
+    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+        goto retry_attach;
+    } else if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to attach device %s to context - %s",
+                   o->devid, strerror(errno));
+        return NULL;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->devid);
+        return NULL;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run,
+                        NULL, o);
+
+    return NULL;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
     DeviceState *dev = NULL;
+    QemuThread thread;
     int ret;
 
-    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket,
+                                LIBVFIO_USER_FLAG_ATTACH_NB,
                                 o, VFU_DEV_TYPE_PCI);
     if (o->vfu_ctx == NULL) {
         error_setg(&error_abort, "vfu: Failed to create context - %s",
@@ -124,6 +181,16 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    o->devid, strerror(errno));
         return;
     }
+
+    ret = vfu_realize_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
+    qemu_thread_create(&thread, o->socket, vfu_object_attach_ctx, o,
+                       QEMU_THREAD_DETACHED);
 }
 
 static void vfu_object_init(Object *obj)
@@ -147,6 +214,8 @@ static void vfu_object_init(Object *obj)
 
     o->machine_done.notify = vfu_object_machine_done;
     qemu_add_machine_init_done_notifier(&o->machine_done);
+
+    o->vfu_poll_fd = -1;
 }
 
 static void vfu_object_finalize(Object *obj)
-- 
1.8.3.1


Re: [PATCH RFC server v2 05/11] vfio-user: run vfio-user context
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:24PM -0400, Jagannathan Raman wrote:
> Setup a handler to run vfio-user context. The context is driven by
> messages to the file descriptor associated with it - get the fd for
> the context and hook up the handler with it
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 5ae0991..0726eb9 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -35,6 +35,7 @@
>  #include "trace.h"
>  #include "sysemu/runstate.h"
>  #include "qemu/notify.h"
> +#include "qemu/thread.h"
>  #include "qapi/error.h"
>  #include "sysemu/sysemu.h"
>  #include "libvfio-user.h"
> @@ -65,6 +66,8 @@ struct VfuObject {
>      vfu_ctx_t *vfu_ctx;
>  
>      PCIDevice *pci_dev;
> +
> +    int vfu_poll_fd;
>  };
>  
>  static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
> @@ -89,13 +92,67 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
>      trace_vfu_prop("devid", str);
>  }
>  
> +static void vfu_object_ctx_run(void *opaque)
> +{
> +    VfuObject *o = opaque;
> +    int ret = -1;
> +
> +    while (ret != 0) {
> +        ret = vfu_run_ctx(o->vfu_ctx);
> +        if (ret < 0) {
> +            if (errno == EINTR) {
> +                continue;
> +            } else if (errno == ENOTCONN) {
> +                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> +                o->vfu_poll_fd = -1;
> +                object_unparent(OBJECT(o));
> +                break;
> +            } else {
> +                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
> +                           o->devid, strerror(errno));
> +                 break;
> +            }
> +        }
> +    }
> +}
> +
> +static void *vfu_object_attach_ctx(void *opaque)
> +{
> +    VfuObject *o = opaque;
> +    int ret;
> +
> +retry_attach:
> +    ret = vfu_attach_ctx(o->vfu_ctx);
> +    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {

Does this loop consume 100% CPU since this is non-blocking?

Is it possible to register the fd with a QEMU AioContext instead of
spawning a separate thread?

libvfio-user has non-blocking listen_fd but conn_fd is always blocking.
This means ATTACH_NB is not useful because vfu_attach_ctx() is actually
blocking. I think this means vfu_run_ctx() is also blocking in some
places and QEMU's event loop might hang :(.

Can you make libvfio-user non-blocking in order to solve these issues?

> +        goto retry_attach;
> +    } else if (ret < 0) {
> +        error_setg(&error_abort,
> +                   "vfu: Failed to attach device %s to context - %s",
> +                   o->devid, strerror(errno));
> +        return NULL;
> +    }
> +
> +    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
> +    if (o->vfu_poll_fd < 0) {
> +        error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->devid);
> +        return NULL;
> +    }
> +
> +    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run,
> +                        NULL, o);
> +
> +    return NULL;
> +}
> +
>  static void vfu_object_machine_done(Notifier *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
>      DeviceState *dev = NULL;
> +    QemuThread thread;
>      int ret;
>  
> -    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
> +    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket,
> +                                LIBVFIO_USER_FLAG_ATTACH_NB,
>                                  o, VFU_DEV_TYPE_PCI);
>      if (o->vfu_ctx == NULL) {
>          error_setg(&error_abort, "vfu: Failed to create context - %s",
> @@ -124,6 +181,16 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>                     o->devid, strerror(errno));
>          return;
>      }
> +
> +    ret = vfu_realize_ctx(o->vfu_ctx);
> +    if (ret < 0) {
> +        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
> +                   o->devid, strerror(errno));
> +        return;
> +    }
> +
> +    qemu_thread_create(&thread, o->socket, vfu_object_attach_ctx, o,
> +                       QEMU_THREAD_DETACHED);

Is this thread leaked when the object is destroyed?

>  }
>  
>  static void vfu_object_init(Object *obj)
> @@ -147,6 +214,8 @@ static void vfu_object_init(Object *obj)
>  
>      o->machine_done.notify = vfu_object_machine_done;
>      qemu_add_machine_init_done_notifier(&o->machine_done);
> +
> +    o->vfu_poll_fd = -1;
>  }
>  
>  static void vfu_object_finalize(Object *obj)
> -- 
> 1.8.3.1
> 
[PATCH RFC server v2 06/11] vfio-user: handle PCI config space accesses
Posted by Jagannathan Raman 2 years, 8 months ago
Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 0726eb9..13011ce 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -36,6 +36,7 @@
 #include "sysemu/runstate.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
@@ -144,6 +145,38 @@ retry_attach:
     return NULL;
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+                                     size_t count, loff_t offset,
+                                     const bool is_write)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint32_t pci_access_width = sizeof(uint32_t);
+    size_t bytes = count;
+    uint32_t val = 0;
+    char *ptr = buf;
+    int len;
+
+    while (bytes > 0) {
+        len = (bytes > pci_access_width) ? pci_access_width : bytes;
+        if (is_write) {
+            memcpy(&val, ptr, len);
+            pci_default_write_config(PCI_DEVICE(o->pci_dev),
+                                     offset, val, len);
+            trace_vfu_cfg_write(offset, val);
+        } else {
+            val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
+                                          offset, len);
+            memcpy(ptr, &val, len);
+            trace_vfu_cfg_read(offset, val);
+        }
+        offset += len;
+        ptr += len;
+        bytes -= len;
+    }
+
+    return count;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -182,6 +215,17 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+                           pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+                           VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+                           NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to setup config space handlers for %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0..2ef7884 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
1.8.3.1


Re: [PATCH RFC server v2 06/11] vfio-user: handle PCI config space accesses
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:25PM -0400, Jagannathan Raman wrote:
> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> +                                     size_t count, loff_t offset,
> +                                     const bool is_write)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint32_t pci_access_width = sizeof(uint32_t);
> +    size_t bytes = count;
> +    uint32_t val = 0;
> +    char *ptr = buf;
> +    int len;
> +
> +    while (bytes > 0) {
> +        len = (bytes > pci_access_width) ? pci_access_width : bytes;
> +        if (is_write) {
> +            memcpy(&val, ptr, len);
> +            pci_default_write_config(PCI_DEVICE(o->pci_dev),
> +                                     offset, val, len);
> +            trace_vfu_cfg_write(offset, val);
> +        } else {
> +            val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
> +                                          offset, len);
> +            memcpy(ptr, &val, len);

pci_default_read_config() returns a host-endian 32-bit value. This code
looks wrong because it copies different bytes on big- and little-endian
hosts.

> +            trace_vfu_cfg_read(offset, val);
> +        }

Why call pci_default_read/write_config() directly instead of
pci_dev->config_read/write()?
Re: [PATCH RFC server v2 06/11] vfio-user: handle PCI config space accesses
Posted by Jag Raman 2 years, 7 months ago

> On Sep 9, 2021, at 3:27 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:25PM -0400, Jagannathan Raman wrote:
>> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
>> +                                     size_t count, loff_t offset,
>> +                                     const bool is_write)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    uint32_t pci_access_width = sizeof(uint32_t);
>> +    size_t bytes = count;
>> +    uint32_t val = 0;
>> +    char *ptr = buf;
>> +    int len;
>> +
>> +    while (bytes > 0) {
>> +        len = (bytes > pci_access_width) ? pci_access_width : bytes;
>> +        if (is_write) {
>> +            memcpy(&val, ptr, len);
>> +            pci_default_write_config(PCI_DEVICE(o->pci_dev),
>> +                                     offset, val, len);
>> +            trace_vfu_cfg_write(offset, val);
>> +        } else {
>> +            val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
>> +                                          offset, len);
>> +            memcpy(ptr, &val, len);
> 
> pci_default_read_config() returns a host-endian 32-bit value. This code
> looks wrong because it copies different bytes on big- and little-endian
> hosts.

I’ll collect more details on this one, trying to wrap my head around it.

Concerning config space writes, it doesn’t look like we need to
perform any conversion as the store operation automatically happens
in the CPU’s native format when we do something like the following:
PCIDevice->config[addr] = val;

Concerning config read, I observed that pci_default_read_config()
performs le32_to_cpu() conversion. May be we should not rely on
it doing the conversion.

> 
>> +            trace_vfu_cfg_read(offset, val);
>> +        }
> 
> Why call pci_default_read/write_config() directly instead of
> pci_dev->config_read/write()?

That makes sense - we should be calling pci_dev->config_read/write().

After performing pci_dev->config_read(), I’ll convert it to the CPU’s
endianness format using le32_to_cpu(). On big-endian systems,
it would re-order the bytes, and on little-endian systems it would
be a no-op.

--
Jag
Re: [PATCH RFC server v2 06/11] vfio-user: handle PCI config space accesses
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Sep 10, 2021 at 04:22:56PM +0000, Jag Raman wrote:
> 
> 
> > On Sep 9, 2021, at 3:27 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Fri, Aug 27, 2021 at 01:53:25PM -0400, Jagannathan Raman wrote:
> >> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> >> +                                     size_t count, loff_t offset,
> >> +                                     const bool is_write)
> >> +{
> >> +    VfuObject *o = vfu_get_private(vfu_ctx);
> >> +    uint32_t pci_access_width = sizeof(uint32_t);
> >> +    size_t bytes = count;
> >> +    uint32_t val = 0;
> >> +    char *ptr = buf;
> >> +    int len;
> >> +
> >> +    while (bytes > 0) {
> >> +        len = (bytes > pci_access_width) ? pci_access_width : bytes;
> >> +        if (is_write) {
> >> +            memcpy(&val, ptr, len);
> >> +            pci_default_write_config(PCI_DEVICE(o->pci_dev),
> >> +                                     offset, val, len);
> >> +            trace_vfu_cfg_write(offset, val);
> >> +        } else {
> >> +            val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
> >> +                                          offset, len);
> >> +            memcpy(ptr, &val, len);
> > 
> > pci_default_read_config() returns a host-endian 32-bit value. This code
> > looks wrong because it copies different bytes on big- and little-endian
> > hosts.
> 
> I’ll collect more details on this one, trying to wrap my head around it.
> 
> Concerning config space writes, it doesn’t look like we need to
> perform any conversion as the store operation automatically happens
> in the CPU’s native format when we do something like the following:
> PCIDevice->config[addr] = val;

PCIDevice->config is uint8_t*. Endianness isn't an issue with single
byte accesses.

> 
> Concerning config read, I observed that pci_default_read_config()
> performs le32_to_cpu() conversion. May be we should not rely on
> it doing the conversion.

Yes, ->config_read() returns a host-endian 32-bit value and
->config_write() receives a host-endian 32-bit value (it has a
bit-shifting loop that implicitly handles endianness conversion).

Stefan
[PATCH RFC server v2 07/11] vfio-user: handle DMA mappings
Posted by Jagannathan Raman 2 years, 8 months ago
Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 13011ce..76fb2d4 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -177,6 +177,49 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    MemoryRegion *subregion = NULL;
+    g_autofree char *name = NULL;
+    static unsigned int suffix;
+    struct iovec *iov = &info->iova;
+
+    if (!info->vaddr) {
+        return;
+    }
+
+    name = g_strdup_printf("remote-mem-%u", suffix++);
+
+    subregion = g_new0(MemoryRegion, 1);
+
+    memory_region_init_ram_ptr(subregion, NULL, name,
+                               iov->iov_len, info->vaddr);
+
+    memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
+                                subregion);
+
+    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    MemoryRegion *mr = NULL;
+    ram_addr_t offset;
+
+    mr = memory_region_from_host(info->vaddr, &offset);
+    if (!mr) {
+        return 0;
+    }
+
+    memory_region_del_subregion(get_system_memory(), mr);
+
+    object_unparent((OBJECT(mr)));
+
+    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+
+    return 0;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -226,6 +269,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup DMA handlers for %s",
+                   o->devid);
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884..f945c7e 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
1.8.3.1


Re: [PATCH RFC server v2 07/11] vfio-user: handle DMA mappings
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:26PM -0400, Jagannathan Raman wrote:
> Define and register callbacks to manage the RAM regions used for
> device DMA
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++
>  hw/remote/trace-events    |  2 ++
>  2 files changed, 52 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[PATCH RFC server v2 08/11] vfio-user: handle PCI BAR accesses
Posted by Jagannathan Raman 2 years, 8 months ago
Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 +
 2 files changed, 97 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 76fb2d4..299c938 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -220,6 +220,99 @@ static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     return 0;
 }
 
+static ssize_t vfu_object_bar_rw(PCIDevice *pci_dev, hwaddr addr, size_t count,
+                                 char * const buf, const bool is_write,
+                                 uint8_t type)
+{
+    AddressSpace *as = NULL;
+    MemTxResult res;
+
+    if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
+        as = pci_device_iommu_address_space(pci_dev);
+    } else {
+        as = &address_space_io;
+    }
+
+    trace_vfu_bar_rw_enter(is_write ? "Write" : "Read", (uint64_t)addr);
+
+    res = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, (void *)buf,
+                           (hwaddr)count, is_write);
+    if (res != MEMTX_OK) {
+        warn_report("vfu: failed to %s 0x%"PRIx64"",
+                    is_write ? "write to" : "read from",
+                    addr);
+        return -1;
+    }
+
+    trace_vfu_bar_rw_exit(is_write ? "Write" : "Read", (uint64_t)addr);
+
+    return count;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
+    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
+                                        char * const buf, size_t count,        \
+                                        loff_t offset, const bool is_write)    \
+    {                                                                          \
+        VfuObject *o = vfu_get_private(vfu_ctx);                               \
+        hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config +               \
+                                            PCI_BASE_ADDRESS_0 +               \
+                                            (4 * BAR_NO)) + offset);           \
+                                                                               \
+        return vfu_object_bar_rw(o->pci_dev, addr, count, buf, is_write,       \
+                                 o->pci_dev->io_regions[BAR_NO].type);         \
+    }                                                                          \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+    &vfu_object_bar0_handler,
+    &vfu_object_bar1_handler,
+    &vfu_object_bar2_handler,
+    &vfu_object_bar3_handler,
+    &vfu_object_bar4_handler,
+    &vfu_object_bar5_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *                            callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+    uint32_t orig_val, new_val;
+    int i, size;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        orig_val = pci_default_read_config(pdev,
+                                           PCI_BASE_ADDRESS_0 + (4 * i), 4);
+        new_val = 0xffffffff;
+        pci_default_write_config(pdev,
+                                 PCI_BASE_ADDRESS_0 + (4 * i), new_val, 4);
+        new_val = pci_default_read_config(pdev,
+                                          PCI_BASE_ADDRESS_0 + (4 * i), 4);
+        size = (~(new_val & 0xFFFFFFF0)) + 1;
+        pci_default_write_config(pdev, PCI_BASE_ADDRESS_0 + (4 * i),
+                                 orig_val, 4);
+        if (size) {
+            vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i, size,
+                             vfu_object_bar_handlers[i], VFU_REGION_FLAG_RW,
+                             NULL, 0, -1, 0);
+        }
+    }
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -276,6 +369,8 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e..f3f65e2 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,5 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
-- 
1.8.3.1


Re: [PATCH RFC server v2 08/11] vfio-user: handle PCI BAR accesses
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:27PM -0400, Jagannathan Raman wrote:
> +/**
> + * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
> + *
> + * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
> + * define vfu_object_bar2_handler
> + */
> +#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
> +    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
> +                                        char * const buf, size_t count,        \
> +                                        loff_t offset, const bool is_write)    \
> +    {                                                                          \
> +        VfuObject *o = vfu_get_private(vfu_ctx);                               \
> +        hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config +               \
> +                                            PCI_BASE_ADDRESS_0 +               \
> +                                            (4 * BAR_NO)) + offset);           \

Does this handle 64-bit BARs?

> +/**
> + * vfu_object_register_bars - Identify active BAR regions of pdev and setup
> + *                            callbacks to handle read/write accesses
> + */
> +static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
> +{
> +    uint32_t orig_val, new_val;
> +    int i, size;
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        orig_val = pci_default_read_config(pdev,
> +                                           PCI_BASE_ADDRESS_0 + (4 * i), 4);

Same question as in an earlier patch: should we call pdev->read_config()?
Re: [PATCH RFC server v2 08/11] vfio-user: handle PCI BAR accesses
Posted by Jag Raman 2 years, 7 months ago

> On Sep 9, 2021, at 3:37 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:27PM -0400, Jagannathan Raman wrote:
>> +/**
>> + * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
>> + *
>> + * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
>> + * define vfu_object_bar2_handler
>> + */
>> +#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
>> +    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
>> +                                        char * const buf, size_t count,        \
>> +                                        loff_t offset, const bool is_write)    \
>> +    {                                                                          \
>> +        VfuObject *o = vfu_get_private(vfu_ctx);                               \
>> +        hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config +               \
>> +                                            PCI_BASE_ADDRESS_0 +               \
>> +                                            (4 * BAR_NO)) + offset);           \
> 
> Does this handle 64-bit BARs?

It presently only handles 32-bit BARs. We’ll add support for 64-bit BARs in the next rev
of this series.

> 
>> +/**
>> + * vfu_object_register_bars - Identify active BAR regions of pdev and setup
>> + *                            callbacks to handle read/write accesses
>> + */
>> +static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>> +{
>> +    uint32_t orig_val, new_val;
>> +    int i, size;
>> +
>> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
>> +        orig_val = pci_default_read_config(pdev,
>> +                                           PCI_BASE_ADDRESS_0 + (4 * i), 4);
> 
> Same question as in an earlier patch: should we call pdev->read_config()?

Sure, will do.

--
Jag

[PATCH RFC server v2 09/11] vfio-user: handle device interrupts
Posted by Jagannathan Raman 2 years, 8 months ago
Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/remote/iohub.h |  2 ++
 hw/remote/iohub.c         |  5 +++++
 hw/remote/vfio-user-obj.c | 30 ++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  1 +
 4 files changed, 38 insertions(+)

diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
index 0bf98e0..d5bd0b0 100644
--- a/include/hw/remote/iohub.h
+++ b/include/hw/remote/iohub.h
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/thread-posix.h"
 #include "hw/remote/mpqemu-link.h"
+#include "libvfio-user.h"
 
 #define REMOTE_IOHUB_NB_PIRQS    PCI_DEVFN_MAX
 
@@ -30,6 +31,7 @@ typedef struct RemoteIOHubState {
     unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
     ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
     QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+    vfu_ctx_t *vfu_ctx[REMOTE_IOHUB_NB_PIRQS];
 } RemoteIOHubState;
 
 int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
index 547d597..9410233 100644
--- a/hw/remote/iohub.c
+++ b/hw/remote/iohub.c
@@ -18,6 +18,7 @@
 #include "hw/remote/machine.h"
 #include "hw/remote/iohub.h"
 #include "qemu/main-loop.h"
+#include "trace.h"
 
 void remote_iohub_init(RemoteIOHubState *iohub)
 {
@@ -62,6 +63,10 @@ void remote_iohub_set_irq(void *opaque, int pirq, int level)
     QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
 
     if (level) {
+        if (iohub->vfu_ctx[pirq]) {
+            trace_vfu_interrupt(pirq);
+            vfu_irq_trigger(iohub->vfu_ctx[pirq], 0);
+        }
         if (++iohub->irq_level[pirq] == 1) {
             event_notifier_set(&iohub->irqfds[pirq]);
         }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 299c938..92605ed 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -42,6 +42,9 @@
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "hw/boards.h"
+#include "hw/remote/iohub.h"
+#include "hw/remote/machine.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -313,6 +316,26 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
     }
 }
 
+static int vfu_object_setup_irqs(vfu_ctx_t *vfu_ctx, PCIDevice *pci_dev)
+{
+    RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+    RemoteIOHubState *iohub = &machine->iohub;
+    int pirq, intx, ret;
+
+    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+    if (ret < 0) {
+        return ret;
+    }
+
+    intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    pirq = remote_iohub_map_irq(pci_dev, intx);
+
+    iohub->vfu_ctx[pirq] = vfu_ctx;
+
+    return 0;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -371,6 +394,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 
     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+    ret = vfu_object_setup_irqs(o->vfu_ctx, o->pci_dev);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup interrupts for %s",
+                   o->devid);
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f3f65e2..b419d6f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -11,3 +11,4 @@ vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %z
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
-- 
1.8.3.1


[PATCH RFC server v2 10/11] vfio-user: register handlers to facilitate migration
Posted by Jagannathan Raman 2 years, 8 months ago
Store and load the device's state during migration. use libvfio-user's
handlers for this purpose

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 migration/savevm.h        |   2 +
 hw/remote/vfio-user-obj.c | 313 ++++++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c        |  73 +++++++++++
 3 files changed, 388 insertions(+)

diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342..8007064 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 92605ed..16cf515 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -45,6 +45,10 @@
 #include "hw/boards.h"
 #include "hw/remote/iohub.h"
 #include "hw/remote/machine.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/global_state.h"
+#include "block/block.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -72,6 +76,33 @@ struct VfuObject {
     PCIDevice *pci_dev;
 
     int vfu_poll_fd;
+
+    /*
+     * vfu_mig_buf holds the migration data. In the remote server, this
+     * buffer replaces the role of an IO channel which links the source
+     * and the destination.
+     *
+     * Whenever the client QEMU process initiates migration, the remote
+     * server gets notified via libvfio-user callbacks. The remote server
+     * sets up a QEMUFile object using this buffer as backend. The remote
+     * server passes this object to its migration subsystem, which slurps
+     * the VMSD of the device ('devid' above) referenced by this object
+     * and stores the VMSD in this buffer.
+     *
+     * The client subsequetly asks the remote server for any data that
+     * needs to be moved over to the destination via libvfio-user
+     * library's vfu_migration_callbacks_t callbacks. The remote hands
+     * over this buffer as data at this time.
+     *
+     * A reverse of this process happens at the destination.
+     */
+    uint8_t *vfu_mig_buf;
+
+    uint64_t vfu_mig_buf_size;
+
+    uint64_t vfu_mig_buf_pending;
+
+    QEMUFile *vfu_mig_file;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -96,6 +127,250 @@ static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("devid", str);
 }
 
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
+ * call these functions via QEMUFileOps to load/save the VMSD of a
+ * device into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+                                size_t size, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    if (pos > o->vfu_mig_buf_size) {
+        size = 0;
+    } else if ((pos + size) > o->vfu_mig_buf_size) {
+        size = o->vfu_mig_buf_size;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+    o->vfu_mig_buf_size -= size;
+
+    return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+                                 int64_t pos, Error **errp)
+{
+    VfuObject *o = opaque;
+    uint64_t end = pos + iov_size(iov, iovcnt);
+    int i;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+    }
+
+    for (i = 0; i < iovcnt; i++) {
+        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+               iov[i].iov_len);
+        o->vfu_mig_buf_size += iov[i].iov_len;
+        o->vfu_mig_buf_pending += iov[i].iov_len;
+    }
+
+    return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    o->vfu_mig_buf_size = 0;
+
+    g_free(o->vfu_mig_buf);
+
+    return 0;
+}
+
+static const QEMUFileOps vfu_mig_fops_save = {
+    .writev_buffer  = vfu_mig_buf_write,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+static const QEMUFileOps vfu_mig_fops_load = {
+    .get_buffer     = vfu_mig_buf_read,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+/**
+ * handlers for vfu_migration_callbacks_t
+ *
+ * The libvfio-user library accesses these handlers to drive the migration
+ * at the remote end, and also to transport the data stored in vfu_mig_buf
+ *
+ */
+static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    int ret;
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
+    }
+
+    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
+    if (ret) {
+        qemu_file_shutdown(o->vfu_mig_file);
+        return;
+    }
+
+    qemu_fflush(o->vfu_mig_file);
+}
+
+static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
+    static int migrated_devs;
+    Error *local_err = NULL;
+    int ret;
+
+    ret = qemu_remote_loadvm(o->vfu_mig_file);
+    if (ret) {
+        error_setg(&error_abort, "vfu: failed to restore device state");
+        return;
+    }
+
+    if (++migrated_devs == k->nr_devs) {
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return;
+        }
+
+        vm_start();
+    }
+}
+
+static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
+    static int migrated_devs;
+
+    /**
+     * note: calling bdrv_inactivate_all() is not the best approach.
+     *
+     *  Ideally, we would identify the block devices (if any) indirectly
+     *  linked (such as via a scs-hd device) to each of the migrated devices,
+     *  and inactivate them individually. This is essential while operating
+     *  the server in a storage daemon mode, with devices from different VMs.
+     *
+     *  However, we currently don't have this capability. As such, we need to
+     *  inactivate all devices at the same time when migration is completed.
+     */
+    if (++migrated_devs == k->nr_devs) {
+        bdrv_inactivate_all();
+    }
+}
+
+static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
+{
+    switch (state) {
+    case VFU_MIGR_STATE_RESUME:
+    case VFU_MIGR_STATE_STOP_AND_COPY:
+        break;
+    case VFU_MIGR_STATE_STOP:
+        vfu_mig_state_stop(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_PRE_COPY:
+        vfu_mig_state_precopy(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_RUNNING:
+        if (!runstate_is_running()) {
+            vfu_mig_state_running(vfu_ctx);
+        }
+        break;
+    default:
+        warn_report("vfu: Unknown migration state %d", state);
+    }
+
+    return 0;
+}
+
+static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    return o->vfu_mig_buf_pending;
+}
+
+static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
+                                uint64_t *size)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (offset) {
+        *offset = 0;
+    }
+
+    if (size) {
+        *size = o->vfu_mig_buf_size;
+    }
+
+    return 0;
+}
+
+static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
+                                 uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (offset > o->vfu_mig_buf_size) {
+        return -1;
+    }
+
+    if ((offset + size) > o->vfu_mig_buf_size) {
+        warn_report("vfu: buffer overflow - check pending_bytes");
+        size = o->vfu_mig_buf_size - offset;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + offset), size);
+
+    o->vfu_mig_buf_pending -= size;
+
+    return size;
+}
+
+static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
+                                  uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint64_t end = offset + size;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+        o->vfu_mig_buf_size = end;
+    }
+
+    memcpy((o->vfu_mig_buf + offset), data, size);
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
+    }
+
+    return size;
+}
+
+static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
+{
+    return 0;
+}
+
+static const vfu_migration_callbacks_t vfu_mig_cbs = {
+    .version = VFU_MIGR_CALLBACKS_VERS,
+    .transition = &vfu_mig_transition,
+    .get_pending_bytes = &vfu_mig_get_pending_bytes,
+    .prepare_data = &vfu_mig_prepare_data,
+    .read_data = &vfu_mig_read_data,
+    .data_written = &vfu_mig_data_written,
+    .write_data = &vfu_mig_write_data,
+};
+
 static void vfu_object_ctx_run(void *opaque)
 {
     VfuObject *o = opaque;
@@ -340,6 +615,7 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
     DeviceState *dev = NULL;
+    size_t migr_area_size;
     QemuThread thread;
     int ret;
 
@@ -401,6 +677,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    /*
+     * TODO: The 0x20000 number used below is a temporary. We are working on
+     *     a cleaner fix for this.
+     *
+     *     The libvfio-user library assumes that the remote knows the size of
+     *     the data to be migrated at boot time, but that is not the case with
+     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
+     *     as a sufficiently large buffer to demonstrate migration, but that
+     *     cannot be used as a solution.
+     *
+     */
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
+                           0x20000, NULL,
+                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to register migration BAR %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
+    migr_area_size = vfu_get_migr_register_area_size();
+    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
+                                               migr_area_size);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup migration %s- %s",
+                   o->devid, strerror(errno));
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
@@ -435,6 +740,14 @@ static void vfu_object_init(Object *obj)
     qemu_add_machine_init_done_notifier(&o->machine_done);
 
     o->vfu_poll_fd = -1;
+
+    o->vfu_mig_file = NULL;
+
+    o->vfu_mig_buf = NULL;
+
+    o->vfu_mig_buf_size = 0;
+
+    o->vfu_mig_buf_pending = 0;
 }
 
 static void vfu_object_finalize(Object *obj)
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b7b64b..341fde7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1604,6 +1604,49 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
+static SaveStateEntry *find_se_from_dev(DeviceState *dev)
+{
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->opaque == dev) {
+            return se;
+        }
+    }
+
+    return NULL;
+}
+
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev)
+{
+    SaveStateEntry *se;
+    int ret = 0;
+
+    se = find_se_from_dev(dev);
+    if (!se) {
+        return -ENODEV;
+    }
+
+    if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque)) {
+        return ret;
+    }
+
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+
+    ret = vmstate_save(f, se, NULL);
+    if (ret) {
+        qemu_file_set_error(f, ret);
+        return ret;
+    }
+
+    save_section_footer(f, se);
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+
+    return 0;
+}
+
 void qemu_savevm_live_state(QEMUFile *f)
 {
     /* save QEMU_VM_SECTION_END section */
@@ -2444,6 +2487,36 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
     return 0;
 }
 
+int qemu_remote_loadvm(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = 0;
+
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        ret = qemu_file_get_error(f);
+        if (ret) {
+            break;
+        }
+
+        switch (section_type) {
+        case QEMU_VM_SECTION_FULL:
+            ret = qemu_loadvm_section_start_full(f, NULL);
+            if (ret < 0) {
+                break;
+            }
+            break;
+        case QEMU_VM_EOF:
+            return ret;
+        default:
+            return -EINVAL;
+        }
+    }
+
+    return ret;
+}
+
 static int
 qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 {
-- 
1.8.3.1


Re: [PATCH RFC server v2 10/11] vfio-user: register handlers to facilitate migration
Posted by Stefan Hajnoczi 2 years, 7 months ago
On Fri, Aug 27, 2021 at 01:53:29PM -0400, Jagannathan Raman wrote:
> +static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
> +                                size_t size, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    if (pos > o->vfu_mig_buf_size) {
> +        size = 0;
> +    } else if ((pos + size) > o->vfu_mig_buf_size) {
> +        size = o->vfu_mig_buf_size;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + pos), size);
> +
> +    o->vfu_mig_buf_size -= size;

This looks strange. pos increases each time we're called. We seem to be
truncating the buffer on each read. Should this line be dropped? Did you
test live migration (maybe this code needs more debugging)?

> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
> +                                 int64_t pos, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +    uint64_t end = pos + iov_size(iov, iovcnt);
> +    int i;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +    }
> +
> +    for (i = 0; i < iovcnt; i++) {
> +        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
> +               iov[i].iov_len);
> +        o->vfu_mig_buf_size += iov[i].iov_len;
> +        o->vfu_mig_buf_pending += iov[i].iov_len;
> +    }
> +
> +    return iov_size(iov, iovcnt);
> +}
> +
> +static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    o->vfu_mig_buf_size = 0;
> +
> +    g_free(o->vfu_mig_buf);
> +
> +    return 0;
> +}
> +
> +static const QEMUFileOps vfu_mig_fops_save = {
> +    .writev_buffer  = vfu_mig_buf_write,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static const QEMUFileOps vfu_mig_fops_load = {
> +    .get_buffer     = vfu_mig_buf_read,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +/**
> + * handlers for vfu_migration_callbacks_t
> + *
> + * The libvfio-user library accesses these handlers to drive the migration
> + * at the remote end, and also to transport the data stored in vfu_mig_buf
> + *
> + */
> +static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
> +    }
> +
> +    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
> +    if (ret) {
> +        qemu_file_shutdown(o->vfu_mig_file);
> +        return;
> +    }
> +
> +    qemu_fflush(o->vfu_mig_file);
> +}

Are you sure pre-copy is the state where you want to serialize the
savevm data? IIUC pre-copy is the iterative state while the device is
still running (e.g. when copying RAM but before devices are stopped). I
expected savevm to happen when we reach stop-and-copy.

The reason why this matters is that we're saving the state of the device
while the guest is still running and possibly interacting with the
device. The destination won't have the final state of the device, it
will have an earlier state of the device when we started migrating RAM!

Maybe I'm wrong, please double-check, but this looks like a bug.

> +
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        error_setg(&error_abort, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    if (++migrated_devs == k->nr_devs) {
> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +
> +        vm_start();
> +    }
> +}

This looks like it's intended for the destination side. Does this code
work on the source side if the device is transitioned back into the
running state?

> +
> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +
> +    /**
> +     * note: calling bdrv_inactivate_all() is not the best approach.
> +     *
> +     *  Ideally, we would identify the block devices (if any) indirectly
> +     *  linked (such as via a scs-hd device) to each of the migrated devices,
> +     *  and inactivate them individually. This is essential while operating
> +     *  the server in a storage daemon mode, with devices from different VMs.
> +     *
> +     *  However, we currently don't have this capability. As such, we need to
> +     *  inactivate all devices at the same time when migration is completed.
> +     */
> +    if (++migrated_devs == k->nr_devs) {
> +        bdrv_inactivate_all();
> +    }
> +}
> +
> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
> +{
> +    switch (state) {
> +    case VFU_MIGR_STATE_RESUME:
> +    case VFU_MIGR_STATE_STOP_AND_COPY:
> +        break;
> +    case VFU_MIGR_STATE_STOP:
> +        vfu_mig_state_stop(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_PRE_COPY:
> +        vfu_mig_state_precopy(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_RUNNING:
> +        if (!runstate_is_running()) {
> +            vfu_mig_state_running(vfu_ctx);
> +        }
> +        break;
> +    default:
> +        warn_report("vfu: Unknown migration state %d", state);
> +    }
> +
> +    return 0;
> +}
> +
> +static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    return o->vfu_mig_buf_pending;
> +}
> +
> +static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
> +                                uint64_t *size)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (offset) {
> +        *offset = 0;
> +    }
> +
> +    if (size) {
> +        *size = o->vfu_mig_buf_size;
> +    }
> +
> +    return 0;
> +}
> +
> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
> +                                 uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (offset > o->vfu_mig_buf_size) {
> +        return -1;
> +    }
> +
> +    if ((offset + size) > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - check pending_bytes");
> +        size = o->vfu_mig_buf_size - offset;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + offset), size);
> +
> +    o->vfu_mig_buf_pending -= size;
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
> +                                  uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t end = offset + size;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +        o->vfu_mig_buf_size = end;
> +    }
> +
> +    memcpy((o->vfu_mig_buf + offset), data, size);
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> +    }

Why open the file here where it's not accessed? I expected this to
happen at the point where data has been fully written and we call
qemu_remote_loadvm().

> +
> +    return size;
> +}
> +
> +static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
> +{
> +    return 0;
> +}
> +
> +static const vfu_migration_callbacks_t vfu_mig_cbs = {
> +    .version = VFU_MIGR_CALLBACKS_VERS,
> +    .transition = &vfu_mig_transition,
> +    .get_pending_bytes = &vfu_mig_get_pending_bytes,
> +    .prepare_data = &vfu_mig_prepare_data,
> +    .read_data = &vfu_mig_read_data,
> +    .data_written = &vfu_mig_data_written,
> +    .write_data = &vfu_mig_write_data,
> +};
> +
>  static void vfu_object_ctx_run(void *opaque)
>  {
>      VfuObject *o = opaque;
> @@ -340,6 +615,7 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
>      DeviceState *dev = NULL;
> +    size_t migr_area_size;
>      QemuThread thread;
>      int ret;
>  
> @@ -401,6 +677,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>          return;
>      }
>  
> +    /*
> +     * TODO: The 0x20000 number used below is a temporary. We are working on
> +     *     a cleaner fix for this.
> +     *
> +     *     The libvfio-user library assumes that the remote knows the size of
> +     *     the data to be migrated at boot time, but that is not the case with
> +     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
> +     *     as a sufficiently large buffer to demonstrate migration, but that
> +     *     cannot be used as a solution.
> +     *
> +     */

libvfio-user has the vfu_migration_callbacks_t interface that allows the
device to save/load more data regardless of the size of the migration
region. I don't see the issue here since the region doesn't need to be
sized to fit the savevm data?
[PATCH RFC server v2 11/11] vfio-user: acceptance test
Posted by Jagannathan Raman 2 years, 8 months ago
Acceptance test for libvfio-user in QEMU

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 MAINTAINERS                   |  1 +
 tests/acceptance/vfio-user.py | 94 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 tests/acceptance/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index f9d8092..2c7332b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3392,6 +3392,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: hw/remote/vfio-user-obj.c
+F: tests/acceptance/vfio-user.py
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
new file mode 100644
index 0000000..ef318d9
--- /dev/null
+++ b/tests/acceptance/vfio-user.py
@@ -0,0 +1,94 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import Test
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+class VfioUser(Test):
+    """
+    :avocado: tags=vfiouser
+    """
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+
+    def do_test(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        """Main test method"""
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+
+        # Create remote process
+        remote_vm = self.get_vm()
+        remote_vm.add_args('-machine', 'x-remote')
+        remote_vm.add_args('-nodefaults')
+        remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+        remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
+                           'devid=lsi1,socket='+socket)
+        remote_vm.launch()
+
+        # Create proxy process
+        self.vm.set_console()
+        self.vm.add_args('-machine', machine_type)
+        self.vm.add_args('-accel', 'kvm')
+        self.vm.add_args('-cpu', 'host')
+        self.vm.add_args('-object',
+                         'memory-backend-memfd,id=sysmem-file,size=2G')
+        self.vm.add_args('--numa', 'node,memdev=sysmem-file')
+        self.vm.add_args('-m', '2048')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line)
+        self.vm.add_args('-device',
+                         'vfio-user-pci,'
+                         'socket='+socket)
+        self.vm.launch()
+        wait_for_console_pattern(self, 'as init process',
+                                 'Kernel panic - not syncing')
+        exec_command(self, 'mount -t sysfs sysfs /sys')
+        exec_command_and_wait_for_pattern(self,
+                                          'cat /sys/bus/pci/devices/*/uevent',
+                                          'PCI_ID=1000:0012')
+
+    def test_multiprocess_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+
+    def test_multiprocess_aarch64(self):
+        """
+        :avocado: tags=arch:aarch64
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'rdinit=/bin/bash console=ttyAMA0')
+        machine_type = 'virt,gic-version=3'
+        self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
-- 
1.8.3.1