This series adds basic support for message-based DMA in qemu's vfio-user
server. This is useful for cases where the client does not provide file
descriptors for accessing system memory via memory mappings. My motivating use
case is to hook up device models as PCIe endpoints to a hardware design. This
works by bridging the PCIe transaction layer to vfio-user, and the endpoint
does not access memory directly, but sends memory request TLPs to the hardware
design in order to perform DMA.

Note that more work is needed to make message-based DMA work well: qemu
currently breaks down DMA accesses into chunks of size 8 bytes at maximum, each
of which will be handled in a separate vfio-user DMA request message. This is
quite terrible for large DMA accesses, such as when nvme reads and writes
page-sized blocks for example. Thus, I would like to improve qemu to be able to
perform larger accesses, at least for indirect memory regions. I have something
working locally, but since this will likely result in more involved surgery and
discussion, I am leaving this to be addressed in a separate patch.

Changes from v1:
* Address Stefan's review comments. In particular, enforce an allocation limit
  and don't drop the map client callbacks given that map requests can fail when
  hitting size limits.
* libvfio-user version bump now included in the series.
* Tested as well on big-endian s390x. This uncovered another byte order issue
  in vfio-user server code that I've included a fix for.

Changes from v2:
* Add a preparatory patch to make bounce buffering an AddressSpace-specific
  concept.
* The total buffer size limit parameter is now per AddressSpace and can be
  configured for PCIDevice via a property.
* Store a magic value in the first bytes of the bounce buffer struct as a best
  effort measure to detect invalid pointers in address_space_unmap (a rough
  sketch of the idea follows this cover letter).

Changes from v3:
* libvfio-user now supports twin-socket mode which uses separate sockets for
  client->server and server->client commands, respectively. This addresses the
  concurrent command bug triggered by server->client DMA access commands. See
  https://github.com/nutanix/libvfio-user/issues/279 for details.
* Add missing teardown code in do_address_space_destroy.
* Fix bounce buffer size bookkeeping race condition.
* Generate unmap notification callbacks unconditionally.
* Some cosmetic fixes.

Changes from v4:
* Fix accidentally dropped memory_region_unref, control flow restored to match
  previous code to simplify review.
* Some cosmetic fixes.

Changes from v5:
* Unregister indirect memory region in libvfio-user dma_unregister callback.

Changes from v6:
* Rebase, resolve straightforward merge conflict in system/dma-helpers.c

Changes from v7:
* Rebase (applied cleanly)
* Restore various Reviewed-by and Tested-by tags that I failed to carry forward
  (I double-checked that the patches haven't changed since the reviewed
  version)

Mattias Nissler (5):
  softmmu: Per-AddressSpace bounce buffering
  softmmu: Support concurrent bounce buffers
  Update subprojects/libvfio-user
  vfio-user: Message-based DMA support
  vfio-user: Fix config space access byte order

 hw/pci/pci.c                  |   8 ++
 hw/remote/trace-events        |   2 +
 hw/remote/vfio-user-obj.c     | 104 +++++++++++++++++++++----
 include/exec/cpu-common.h     |   2 -
 include/exec/memory.h         |  41 +++++++++-
 include/hw/pci/pci_device.h   |   3 +
 subprojects/libvfio-user.wrap |   2 +-
 system/dma-helpers.c          |   4 +-
 system/memory.c               |   8 ++
 system/physmem.c              | 141 ++++++++++++++++++----------------
 10 files changed, 226 insertions(+), 89 deletions(-)

--
2.34.1
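As a rough illustration of the magic-value idea above: the following is a
minimal, self-contained sketch, not the actual QEMU code. The struct layout,
field names, and the magic constant are assumptions chosen for illustration.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BOUNCE_BUFFER_MAGIC UINT64_C(0xb4ffe12dead10cca)  /* arbitrary value */

typedef struct {
    uint64_t magic;    /* set on allocation, clobbered on free */
    uint64_t addr;     /* guest address the buffer shadows */
    uint64_t len;
    uint8_t  data[];   /* storage handed out to the map caller */
} BounceBuffer;

static BounceBuffer *bounce_buffer_alloc(uint64_t addr, uint64_t len)
{
    BounceBuffer *b = calloc(1, sizeof(*b) + len);
    if (!b) {
        return NULL;
    }
    b->magic = BOUNCE_BUFFER_MAGIC;
    b->addr = addr;
    b->len = len;
    return b;
}

/*
 * The unmap path only receives the data pointer it previously handed out.
 * Checking the magic catches pointers that never came from a bounce buffer;
 * clobbering it afterwards catches a double unmap of the same buffer.
 */
static void bounce_buffer_free(void *host_ptr)
{
    BounceBuffer *b = (BounceBuffer *)((uint8_t *)host_ptr -
                                       offsetof(BounceBuffer, data));
    assert(b->magic == BOUNCE_BUFFER_MAGIC);
    b->magic = ~BOUNCE_BUFFER_MAGIC;
    free(b);
}

Being only a best-effort check, a pointer that happens to have the right bytes
at the right offset would still slip through, hence "best effort measure" in
the changelog above.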
On Mon, Mar 04, 2024 at 02:05:49AM -0800, Mattias Nissler wrote:
> This series adds basic support for message-based DMA in qemu's vfio-user
> server. This is useful for cases where the client does not provide file
> descriptors for accessing system memory via memory mappings. My motivating use
> case is to hook up device models as PCIe endpoints to a hardware design. This
> works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> does not access memory directly, but sends memory request TLPs to the hardware
> design in order to perform DMA.
>
> Note that more work is needed to make message-based DMA work well: qemu
> currently breaks down DMA accesses into chunks of size 8 bytes at maximum, each
> of which will be handled in a separate vfio-user DMA request message. This is
> quite terrible for large DMA accesses, such as when nvme reads and writes
> page-sized blocks for example. Thus, I would like to improve qemu to be able to
> perform larger accesses, at least for indirect memory regions. I have something
> working locally, but since this will likely result in more involved surgery and
> discussion, I am leaving this to be addressed in a separate patch.

No objection from my side memory-wise. It'll be good to get some words
from Paolo if possible.

Copy Peter Maydell due to the other relevant discussion.

https://lore.kernel.org/qemu-devel/20240228125939.56925-1-heinrich.schuchardt@canonical.com/

--
Peter Xu
Stefan, to the best of my knowledge this is fully reviewed and ready
to go in - can you kindly pick it up or advise in case there's
something I missed? Thanks!

On Mon, Mar 4, 2024 at 11:25 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Mar 04, 2024 at 02:05:49AM -0800, Mattias Nissler wrote:
> > This series adds basic support for message-based DMA in qemu's vfio-user
> > server. This is useful for cases where the client does not provide file
> > descriptors for accessing system memory via memory mappings. My motivating use
> > case is to hook up device models as PCIe endpoints to a hardware design. This
> > works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> > does not access memory directly, but sends memory request TLPs to the hardware
> > design in order to perform DMA.
> >
> > Note that more work is needed to make message-based DMA work well: qemu
> > currently breaks down DMA accesses into chunks of size 8 bytes at maximum, each
> > of which will be handled in a separate vfio-user DMA request message. This is
> > quite terrible for large DMA accesses, such as when nvme reads and writes
> > page-sized blocks for example. Thus, I would like to improve qemu to be able to
> > perform larger accesses, at least for indirect memory regions. I have something
> > working locally, but since this will likely result in more involved surgery and
> > discussion, I am leaving this to be addressed in a separate patch.
>
> No objection from my side memory-wise. It'll be good to get some words
> from Paolo if possible.
>
> Copy Peter Maydell due to the other relevant discussion.
>
> https://lore.kernel.org/qemu-devel/20240228125939.56925-1-heinrich.schuchardt@canonical.com/
>
> --
> Peter Xu
On Thu, 28 Mar 2024 at 03:54, Mattias Nissler <mnissler@rivosinc.com> wrote:
>
> Stefan, to the best of my knowledge this is fully reviewed and ready
> to go in - can you kindly pick it up or advise in case there's
> something I missed? Thanks!

This code is outside the areas that I maintain. I think it would make
sense for Jag to merge it and send a pull request as vfio-user
maintainer.

Stefan
On Mon, May 6, 2024 at 5:01 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, 28 Mar 2024 at 03:54, Mattias Nissler <mnissler@rivosinc.com> wrote:
> >
> > Stefan, to the best of my knowledge this is fully reviewed and ready
> > to go in - can you kindly pick it up or advise in case there's
> > something I missed? Thanks!
>
> This code is outside the areas that I maintain. I think it would make
> sense for Jag to merge it and send a pull request as vfio-user
> maintainer.

OK, thanks for following up, I'll check with Jag.
On Thu, Mar 28, 2024 at 08:53:36AM +0100, Mattias Nissler wrote:
> Stefan, to the best of my knowledge this is fully reviewed and ready
> to go in - can you kindly pick it up or advise in case there's
> something I missed? Thanks!

Fails cross-compile on mipsel:

https://gitlab.com/peterx/qemu/-/jobs/6787790601

IIUC it should be the same for all 32-bit systems that do not support
64-bit atomics. Perhaps the easiest fix is switching all
*bounce_buffer_size to 32-bit.

Thanks,

--
Peter Xu
On Mon, May 6, 2024 at 4:44 PM Peter Xu <peterx@redhat.com> wrote:
> On Thu, Mar 28, 2024 at 08:53:36AM +0100, Mattias Nissler wrote:
> > Stefan, to the best of my knowledge this is fully reviewed and ready
> > to go in - can you kindly pick it up or advise in case there's
> > something I missed? Thanks!
>
> Fails cross-compile on mipsel:
>
> https://gitlab.com/peterx/qemu/-/jobs/6787790601

Ah, bummer, thanks for reporting. 4GB of bounce buffer should be plenty,
so switching to 32 bit atomics seems a good idea at first glance. I'll
take a closer look tomorrow and send a respin with a fix.
On Mon, May 6, 2024 at 11:07 PM Mattias Nissler <mnissler@rivosinc.com> wrote:
>
> On Mon, May 6, 2024 at 4:44 PM Peter Xu <peterx@redhat.com> wrote:
> > On Thu, Mar 28, 2024 at 08:53:36AM +0100, Mattias Nissler wrote:
> > > Stefan, to the best of my knowledge this is fully reviewed and ready
> > > to go in - can you kindly pick it up or advise in case there's
> > > something I missed? Thanks!
> >
> > Fails cross-compile on mipsel:
> >
> > https://gitlab.com/peterx/qemu/-/jobs/6787790601
>
> Ah, bummer, thanks for reporting. 4GB of bounce buffer should be plenty,
> so switching to 32 bit atomics seems a good idea at first glance. I'll
> take a closer look tomorrow and send a respin with a fix.

To close the loop on this: I have posted v9 with patch #2 adjusted to
use uint32_t for size accounting to fix this.
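For illustration, a minimal sketch of what uint32_t size accounting with a
per-AddressSpace limit could look like; the type names and the cmpxchg loop
below are assumptions for this sketch, not the code posted in v9. The point is
that a uint32_t counter only requires 32-bit atomics, which 32-bit hosts such
as mipsel can provide.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t max_bounce_buffer_size;      /* configured limit, in bytes */
    _Atomic uint32_t bounce_buffer_size;  /* bytes currently allocated */
} BounceAccounting;

/* Try to reserve 'size' bytes; refuse rather than exceed the limit. */
static bool bounce_try_reserve(BounceAccounting *acct, uint32_t size)
{
    uint32_t old = atomic_load(&acct->bounce_buffer_size);
    do {
        if (size > acct->max_bounce_buffer_size ||
            old > acct->max_bounce_buffer_size - size) {
            return false;  /* caller falls back to the map-client callback path */
        }
        /* Retry if another thread changed the counter in the meantime. */
    } while (!atomic_compare_exchange_weak(&acct->bounce_buffer_size,
                                           &old, old + size));
    return true;
}

static void bounce_release(BounceAccounting *acct, uint32_t size)
{
    atomic_fetch_sub(&acct->bounce_buffer_size, size);
}

Doing the limit check and the increment in a single compare-and-swap loop
avoids the kind of size-bookkeeping race the changelog above mentions fixing.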
On 7/5/24 11:43, Mattias Nissler wrote:
> On Mon, May 6, 2024 at 11:07 PM Mattias Nissler <mnissler@rivosinc.com> wrote:
> >
> > On Mon, May 6, 2024 at 4:44 PM Peter Xu <peterx@redhat.com> wrote:
> > > On Thu, Mar 28, 2024 at 08:53:36AM +0100, Mattias Nissler wrote:
> > > > Stefan, to the best of my knowledge this is fully reviewed and ready
> > > > to go in - can you kindly pick it up or advise in case there's
> > > > something I missed? Thanks!
> > >
> > > Fails cross-compile on mipsel:
> > >
> > > https://gitlab.com/peterx/qemu/-/jobs/6787790601
> >
> > Ah, bummer, thanks for reporting. 4GB of bounce buffer should be plenty,
> > so switching to 32 bit atomics seems a good idea at first glance. I'll
> > take a closer look tomorrow and send a respin with a fix.
>
> To close the loop on this: I have posted v9 with patch #2 adjusted to
> use uint32_t for size accounting to fix this.

"size accounting" calls for portable size_t type. But if uint32_t
satisfies our needs, OK.
On Tue, May 7, 2024 at 2:52 PM Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
>
> On 7/5/24 11:43, Mattias Nissler wrote:
> > On Mon, May 6, 2024 at 11:07 PM Mattias Nissler <mnissler@rivosinc.com> wrote:
> > >
> > > On Mon, May 6, 2024 at 4:44 PM Peter Xu <peterx@redhat.com> wrote:
> > > > On Thu, Mar 28, 2024 at 08:53:36AM +0100, Mattias Nissler wrote:
> > > > > Stefan, to the best of my knowledge this is fully reviewed and ready
> > > > > to go in - can you kindly pick it up or advise in case there's
> > > > > something I missed? Thanks!
> > > >
> > > > Fails cross-compile on mipsel:
> > > >
> > > > https://gitlab.com/peterx/qemu/-/jobs/6787790601
> > >
> > > Ah, bummer, thanks for reporting. 4GB of bounce buffer should be plenty,
> > > so switching to 32 bit atomics seems a good idea at first glance. I'll
> > > take a closer look tomorrow and send a respin with a fix.
> >
> > To close the loop on this: I have posted v9 with patch #2 adjusted to
> > use uint32_t for size accounting to fix this.
>
> "size accounting" calls for portable size_t type. But if uint32_t
> satisfies our needs, OK.

To clarify, I'm referring to "bounce buffer size accounting", i.e. keeping
track of how much memory we've allocated for bounce buffers. I don't think
that there are practical use cases where anyone would want to spend more
than 4GB on bounce buffers, hence uint32_t seemed fine. If you prefer
size_t (at the expense of using different widths, which will ultimately be
visible in the command line parameter), I'm happy to switch to that though.