This patchset is an RFC version for the ioregionfd implementation
in QEMU. The kernel patches are to be posted with some fixes as a v4.
For this implementation, version 3 of the posted kernel patches was used:
https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
The future version will include support for vfio/libvfio-user.
Please refer to the design discussion here proposed by Stefan:
https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
The vfio-user version needed some bug-fixing and it was decided to send
this for multiprocess first.
The ioregionfd is currently configured through the command line and each
ioregionfd represents an object. This allows for easy parsing and does
not require device/remote object command line option modifications.
The following command line can be used to specify ioregionfd:
<snip>
'-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
'-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
'-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
</snip>
Proxy side of ioregionfd in this version uses only one file descriptor:
<snip>
'-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
</snip>
This is done for the RFC version and my thought was that the next version
will be for vfio-user, so I have not dedicated much effort to these command
line options.
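For context, the fd passed to the proxy via ioregfd= is ultimately what KVM
uses to forward guest accesses to the BAR. Below is a minimal sketch of that
registration at the raw ioctl level; it assumes the KVM_SET_IOREGION ioctl and
struct kvm_ioregion layout from the v3 kernel RFC linked above (field names
such as rfd/wfd may differ in v4), whereas the series itself presumably routes
this through the memory API additions in include/exec/memory.h, softmmu/memory.c
and accel/kvm/kvm-all.c.

<snip>
/*
 * Hedged sketch only: register a BAR's guest-physical range as an
 * ioregionfd with KVM. KVM_SET_IOREGION and struct kvm_ioregion are
 * assumed to come from the headers updated by this series; the field
 * names follow the v3 kernel RFC and may not match exactly.
 */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* needs the RFC-updated linux-headers */

static int register_bar_ioregionfd(int vm_fd, uint64_t bar_addr,
                                   uint64_t bar_size, int ioregion_fd,
                                   uint64_t bar_index)
{
    struct kvm_ioregion region;

    memset(&region, 0, sizeof(region));
    region.guest_paddr = bar_addr;    /* start of the BAR's MMIO range */
    region.memory_size = bar_size;    /* this RFC only supports full-BAR regions */
    region.user_data   = bar_index;   /* echoed back so the remote end can tell BARs apart */
    region.rfd         = ioregion_fd; /* a single socket fd is used in both directions here */
    region.wfd         = ioregion_fd;
    region.flags       = 0;           /* 0 = MMIO; a PIO flag would be used for I/O BARs */

    return ioctl(vm_fd, KVM_SET_IOREGION, &region);
}
</snip>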
The multiprocess messaging protocol was extended to support inquiries
by the proxy about whether the device has any ioregionfds.
This RFC implements inquiries by the proxy about whether a BAR is an
ioregionfd or not, and about its type (memory/io).
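As a rough illustration, the proxy-side inquiry could look like the sketch
below, reusing the existing MPQemuMsg plumbing. MPQEMU_CMD_BAR_INFO is taken
from the patch titles; packing the BAR index into data.u64 and returning the
BAR description as a plain u64 are assumptions for illustration, not the wire
encoding this series actually defines.

<snip>
/*
 * Hedged sketch: the proxy asks the remote device whether a given BAR is
 * backed by an ioregionfd and whether it is memory or I/O.
 */
#include "qemu/osdep.h"
#include "hw/remote/mpqemu-link.h"
#include "hw/remote/proxy.h"

static uint64_t proxy_query_bar_info(PCIProxyDev *dev, int bar, Error **errp)
{
    MPQemuMsg msg = {
        .cmd = MPQEMU_CMD_BAR_INFO,    /* new command added by this series */
        .size = sizeof(uint64_t),
        .data.u64 = bar,               /* which BAR we are asking about (assumed encoding) */
    };

    /*
     * The remote end is assumed to answer with a u64 describing the BAR
     * (ioregionfd-backed or not, memory or I/O), which the proxy then uses
     * when creating the corresponding MemoryRegion.
     */
    return mpqemu_msg_send_and_await_reply(&msg, dev, errp);
}
</snip>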
Currently there are a few limitations in this version of ioregionfd:
- one ioregionfd per BAR, only the full BAR size is supported;
- one file descriptor per device for all of its ioregionfds;
- each remote device runs an fd handler for all of its BARs in one IOThread
  (see the sketch below);
- the proxy supports only one fd.
Some of these limitations will be dropped in a future version.
This RFC is meant to gather feedback/suggestions from the community
on the general approach.
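To make the IOThread handler limitation above more concrete, the sketch below
shows what the per-device fd handler on the remote side conceptually does:
read one command from the shared ioregionfd, dispatch it to the BAR identified
by user_data, and write back a response for reads. The struct layouts and the
info-field decoding follow my reading of the v3 kernel RFC's ioregionfd header
and are illustrative only; bar_read()/bar_write() are stand-ins for the device
model's MMIO callbacks, and the real handler lives in hw/remote/ioregionfd.c.

<snip>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct ioregionfd_cmd {      /* assumed layout, cf. linux-headers/ioregionfd.h */
    uint32_t info;           /* encodes command (read/write), access size, resp flag */
    uint32_t padding;
    uint64_t user_data;      /* token set at registration; here: the BAR index */
    uint64_t offset;         /* offset of the access within the region */
    uint64_t data;           /* value being written, for write commands */
};

struct ioregionfd_resp {     /* assumed layout */
    uint64_t data;           /* value returned for read commands */
    uint8_t  pad[24];
};

/* hypothetical per-BAR accessors provided by the remote device model */
uint64_t bar_read(uint64_t bar, uint64_t offset, unsigned size);
void bar_write(uint64_t bar, uint64_t offset, uint64_t val, unsigned size);

/* Called from the IOThread when the shared ioregionfd becomes readable. */
static void handle_ioregionfd(int fd)
{
    struct ioregionfd_cmd cmd;
    struct ioregionfd_resp resp = { 0 };

    if (read(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
        return;                                     /* nothing (complete) to serve */
    }

    bool is_write = cmd.info & 0x1;                 /* assumed command bit */
    unsigned size = 1u << ((cmd.info >> 4) & 0x3);  /* assumed size encoding */

    if (is_write) {
        bar_write(cmd.user_data, cmd.offset, cmd.data, size);
        /* a response is only needed if the response flag is set (omitted here) */
    } else {
        resp.data = bar_read(cmd.user_data, cmd.offset, size);
        (void)write(fd, &resp, sizeof(resp));       /* the vCPU blocks until this arrives */
    }
}
</snip>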
A quick performance test was done for the remote lsi device with
ioregionfd and without, for both mem BARs (1 and 2), with the help
of the fio tool:
Random R/W:

                 read IOPS   read BW      write IOPS   write BW
no ioregionfd    889         3559KiB/s    890          3561KiB/s
ioregionfd       938         3756KiB/s    939          3757KiB/s

Sequential Read and Sequential Write:

                 Sequential read           Sequential write
                 read IOPS   read BW       write IOPS   write BW
no ioregionfd    367k        1434MiB/s     76k          297MiB/s
ioregionfd       374k        1459MiB/s     77.3k        302MiB/s
Please review and send your feedback.
Thank you!
Elena
Elena Ufimtseva (8):
ioregionfd: introduce a syscall and memory API
multiprocess: place RemoteObject definition in a header file
ioregionfd: introduce memory API functions
ioregionfd: Introduce IORegionDFObject type
multiprocess: prepare ioregionfds for remote device
multiprocess: add MPQEMU_CMD_BAR_INFO
multiprocess: add ioregionfd memory region in proxy
multiprocess: handle ioregionfd commands
meson.build | 15 +-
qapi/qom.json | 32 ++-
include/exec/memory.h | 50 +++++
include/hw/remote/ioregionfd.h | 45 ++++
include/hw/remote/machine.h | 1 +
include/hw/remote/mpqemu-link.h | 2 +
include/hw/remote/proxy.h | 1 +
include/hw/remote/remote.h | 31 +++
include/sysemu/kvm.h | 15 ++
linux-headers/ioregionfd.h | 30 +++
linux-headers/linux/kvm.h | 25 +++
accel/kvm/kvm-all.c | 132 ++++++++++++
accel/stubs/kvm-stub.c | 1 +
hw/remote/ioregionfd.c | 361 ++++++++++++++++++++++++++++++++
hw/remote/message.c | 38 ++++
hw/remote/proxy.c | 66 +++++-
hw/remote/remote-obj.c | 154 ++++++++++++--
softmmu/memory.c | 207 ++++++++++++++++++
Kconfig.host | 3 +
MAINTAINERS | 3 +
hw/remote/Kconfig | 4 +
hw/remote/meson.build | 1 +
meson_options.txt | 2 +
scripts/meson-buildoptions.sh | 3 +
24 files changed, 1199 insertions(+), 23 deletions(-)
create mode 100644 include/hw/remote/ioregionfd.h
create mode 100644 include/hw/remote/remote.h
create mode 100644 linux-headers/ioregionfd.h
create mode 100644 hw/remote/ioregionfd.c
--
2.25.1
On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> This patchset is an RFC version for the ioregionfd implementation
> in QEMU. The kernel patches are to be posted with some fixes as a v4.

Hi Elena,
I will review this on Monday.

Thanks!
Stefan
On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> This patchset is an RFC version for the ioregionfd implementation
> in QEMU. The kernel patches are to be posted with some fixes as a v4.
>
> For this implementation, version 3 of the posted kernel patches was used:
> https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
>
> The future version will include support for vfio/libvfio-user.
> Please refer to the design discussion here proposed by Stefan:
> https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
>
> The vfio-user version needed some bug-fixing and it was decided to send
> this for multiprocess first.
>
> The ioregionfd is currently configured through the command line and each
> ioregionfd represents an object. This allows for easy parsing and does
> not require device/remote object command line option modifications.
>
> The following command line can be used to specify ioregionfd:
> <snip>
> '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\

Explicit configuration of ioregionfd-object is okay for early
prototyping, but what is the plan for integrating this? I guess
x-remote-object would query the remote device to find out which
ioregionfds need to be registered and the user wouldn't need to specify
ioregionfds on the command-line?

> </snip>
>
> Proxy side of ioregionfd in this version uses only one file descriptor:
> <snip>
> '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> </snip>

This raises the question of the ioregionfd file descriptor lifecycle. In
the end I think it shouldn't be specified on the command-line. Instead
the remote device should create it and pass it to QEMU over the
mpqemu/remote fd?

>
> This is done for the RFC version and my thought was that the next version
> will be for vfio-user, so I have not dedicated much effort to these command
> line options.
>
> The multiprocess messaging protocol was extended to support inquiries
> by the proxy about whether the device has any ioregionfds.
> This RFC implements inquiries by the proxy about whether a BAR is an
> ioregionfd or not, and about its type (memory/io).
>
> Currently there are a few limitations in this version of ioregionfd:
> - one ioregionfd per BAR, only the full BAR size is supported;
> - one file descriptor per device for all of its ioregionfds;
> - each remote device runs an fd handler for all of its BARs in one IOThread;
> - the proxy supports only one fd.
>
> Some of these limitations will be dropped in a future version.
> This RFC is meant to gather feedback/suggestions from the community
> on the general approach.
>
> A quick performance test was done for the remote lsi device with
> ioregionfd and without, for both mem BARs (1 and 2), with the help
> of the fio tool:
>
> Random R/W:
>
>                  read IOPS   read BW      write IOPS   write BW
> no ioregionfd    889         3559KiB/s    890          3561KiB/s
> ioregionfd       938         3756KiB/s    939          3757KiB/s

This is extremely slow, even for random I/O. How does this compare to
QEMU running the LSI device without multi-process mode?

> Sequential Read and Sequential Write:
>
>                  Sequential read           Sequential write
>                  read IOPS   read BW       write IOPS   write BW
> no ioregionfd    367k        1434MiB/s     76k          297MiB/s
> ioregionfd       374k        1459MiB/s     77.3k        302MiB/s

It's normal for read and write IOPS to differ, but the read IOPS are
very high. I wonder if caching and read-ahead are hiding the LSI
device's actual performance here.

What are the fio and QEMU command-lines?

In order to benchmark ioregionfd it's best to run a benchmark where the
bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
dispatch cost doesn't affect IOPS significantly.

I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
instead of a file or host block device. The fio block size should be 4k
to minimize the amount of time spent on I/O buffer contents and
iodepth=1 because batching multiple requests with iodepth > 0 hides the
MMIO/PIO dispatch bottleneck.

Stefan
On Mon, Feb 14, 2022 at 02:52:29PM +0000, Stefan Hajnoczi wrote:
> On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > This patchset is an RFC version for the ioregionfd implementation
> > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> >
> > For this implementation, version 3 of the posted kernel patches was used:
> > https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
> >
> > The future version will include support for vfio/libvfio-user.
> > Please refer to the design discussion here proposed by Stefan:
> > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> >
> > The vfio-user version needed some bug-fixing and it was decided to send
> > this for multiprocess first.
> >
> > The ioregionfd is currently configured through the command line and each
> > ioregionfd represents an object. This allows for easy parsing and does
> > not require device/remote object command line option modifications.
> >
> > The following command line can be used to specify ioregionfd:
> > <snip>
> > '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> > '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> > '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
>
Hi Stefan
Thank you for taking a look!
> Explicit configuration of ioregionfd-object is okay for early
> prototyping, but what is the plan for integrating this? I guess
> x-remote-object would query the remote device to find out which
> ioregionfds need to be registered and the user wouldn't need to specify
> ioregionfds on the command-line?
Yes, this can be done. For some reason I thought that the user would be
able to configure the number/size of the regions to be set up as
ioregionfds.
>
> > </snip>
> >
> > Proxy side of ioregionfd in this version uses only one file descriptor:
> > <snip>
> > '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> > </snip>
>
> This raises the question of the ioregionfd file descriptor lifecycle. In
> the end I think it shouldn't be specified on the command-line. Instead
> the remote device should create it and pass it to QEMU over the
> mpqemu/remote fd?
Yes, this will be the same as what vfio-user does.
>
> >
> > This is done for the RFC version and my thought was that the next version
> > will be for vfio-user, so I have not dedicated much effort to these command
> > line options.
> >
> > The multiprocess messaging protocol was extended to support inquiries
> > by the proxy about whether the device has any ioregionfds.
> > This RFC implements inquiries by the proxy about whether a BAR is an
> > ioregionfd or not, and about its type (memory/io).
> >
> > Currently there are a few limitations in this version of ioregionfd.
> > - one ioregionfd per bar, only full bar size is supported;
> > - one file descriptor per device for all of its ioregionfds;
> > - each remote device runs fd handler for all its BARs in one IOThread;
> > - proxy supports only one fd.
> >
> > Some of these limitations will be dropped in the future version.
> > This RFC is to acquire the feedback/suggestions from the community
> > on the general approach.
> >
> > The quick performance test was done for the remote lsi device with
> > ioregionfd and without for both mem BARs (1 and 2) with help
> > of the fio tool:
> >
> > Random R/W:
> >
> > read IOPS read BW write IOPS write BW
> > no ioregionfd 889 3559KiB/s 890 3561KiB/s
> > ioregionfd 938 3756KiB/s 939 3757KiB/s
>
> This is extremely slow, even for random I/O. How does this compare to
> QEMU running the LSI device without multi-process mode?
These tests had iodepth=256. I have changed this to 1 and tested
without multiprocess, with multiprocess, and with multiprocess with both
MMIO regions as ioregionfds:
                          read IOPS   read BW (KiB/s)   write IOPS   write BW (KiB/s)
no multiprocess           89          358               90           360
multiprocess              138         556               139          557
multiprocess ioregionfd   174         698               173          693
The fio config for randomrw:
[global]
bs=4K
iodepth=1
direct=0
ioengine=libaio
group_reporting
time_based
runtime=240
numjobs=1
name=raw-randreadwrite
rw=randrw
size=8G
[job1]
filename=/fio/randomrw
And the QEMU command line for non-multiprocess:
/usr/local/bin/qemu-system-x86_64 -name "OL7.4" -machine q35,accel=kvm -smp sockets=1,cores=2,threads=2 -m 2048 -hda /home/homedir/ol7u9boot.img -boot d -vnc :0 -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios -device lsi53c895a,id=lsi1 -drive id=drive_image1,if=none,file=/home/homedir/10gb.qcow2 -device scsi-hd,id=drive1,drive=drive_image1,bus=lsi1.0,scsi-id=0
QEMU command line for multiprocess:
remote_cmd = [ PROC_QEMU, \
'-machine', 'x-remote', \
'-device', 'lsi53c895a,id=lsi0', \
'-drive', 'id=drive_image1,file=/home/homedir/10gb.qcow2', \
'-device', 'scsi-hd,id=drive2,drive=drive_image1,bus=lsi0.0,' \
'scsi-id=0', \
'-nographic', \
'-monitor', 'unix:/home/homedir/rem-sock,server,nowait', \
'-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
'-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1,',\
'-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
]
proxy_cmd = [ PROC_QEMU, \
'-D', '/tmp/qemu-debug-log', \
'-name', 'OL7.4', \
'-machine', 'pc,accel=kvm', \
'-smp', 'sockets=1,cores=2,threads=2', \
'-m', '2048', \
'-object', 'memory-backend-memfd,id=sysmem-file,size=2G', \
'-numa', 'node,memdev=sysmem-file', \
'-hda','/home/homedir/ol7u9boot.img', \
'-boot', 'd', \
'-vnc', ':0', \
'-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
'-monitor', 'unix:/home/homedir/qemu-sock,server,nowait', \
'-netdev','tap,id=mynet0,ifname=tap0,script=no,downscript=no', '-device','e1000,netdev=mynet0,mac=52:55:00:d1:55:01',\
]
For the test without ioregionfds, the ioregionfd-object entries are
commented out.
I am doing more testing as I see some inconsistent results.
>
> > Sequential Read and Sequential Write:
> >
> > Sequential read Sequential write
> > read IOPS read BW write IOPS write BW
> >
> > no ioregionfd 367k 1434MiB/s 76k 297MiB/s
> > ioregionfd 374k 1459MiB/s 77.3k 302MiB/s
>
> It's normal for read and write IOPS to differ, but the read IOPS are
> very high. I wonder if caching and read-ahead are hiding the LSI
> device's actual performance here.
>
> What are the fio and QEMU command-lines?
>
> In order to benchmark ioregionfd it's best to run a benchmark where the
> bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
> bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
> dispatch cost doesn't affect IOPS significantly.
>
> I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
> instead of a file or host block device. The fio block size should be 4k
> to minimize the amount of time spent on I/O buffer contents and
> iodepth=1 because batching multiple requests with iodepth > 0 hides the
> MMIO/PIO dispatch bottleneck.
The queue depth in the tests above was 256; I will try what you have
suggested. The block size is 4k.
I am also looking at some other system issue that can interfere with the
test, and will be running the test on a fresh install with the settings
you mentioned above.
Thank you!
>
> Stefan