[PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf

Leon Romanovsky posted 11 patches 1 month, 1 week ago
There is a newer version of this series
Documentation/driver-api/pci/p2pdma.rst |  95 +++++++---
block/blk-mq-dma.c                      |   2 +-
drivers/dma-buf/dma-buf.c               | 235 ++++++++++++++++++++++++
drivers/iommu/dma-iommu.c               |   4 +-
drivers/pci/p2pdma.c                    | 182 +++++++++++++-----
drivers/vfio/pci/Kconfig                |   3 +
drivers/vfio/pci/Makefile               |   1 +
drivers/vfio/pci/nvgrace-gpu/main.c     |  56 ++++++
drivers/vfio/pci/vfio_pci.c             |   5 +
drivers/vfio/pci/vfio_pci_config.c      |  22 ++-
drivers/vfio/pci/vfio_pci_core.c        |  53 ++++--
drivers/vfio/pci/vfio_pci_dmabuf.c      | 315 ++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h        |  23 +++
drivers/vfio/vfio_main.c                |   2 +
include/linux/dma-buf.h                 |  18 ++
include/linux/pci-p2pdma.h              | 120 +++++++-----
include/linux/vfio.h                    |   2 +
include/linux/vfio_pci_core.h           |  42 +++++
include/uapi/linux/vfio.h               |  28 +++
kernel/dma/direct.c                     |   4 +-
mm/hmm.c                                |   2 +-
21 files changed, 1074 insertions(+), 140 deletions(-)
Posted by Leon Romanovsky 1 month, 1 week ago
Changelog:
v7:
 * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
   to reverse loop.
 * Fixed spelling errors in documentation patch.
 * Rebased on top of v6.18-rc3.
 * Added include to stddef.h to vfio.h, to keep uapi header file independent.
v6: https://patch.msgid.link/20251102-dmabuf-vfio-v6-0-d773cff0db9f@nvidia.com
 * Fixed wrong error check from pcim_p2pdma_init().
 * Documented pcim_p2pdma_provider() function.
 * Improved commit messages.
 * Added VFIO DMA-BUF selftest, not sent yet.
 * Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
 * Fixed error unwind when dma_buf_fd() fails.
 * Document latest changes to p2pmem.
 * Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
 * Moved DMA mapping logic to DMA-BUF.
 * Removed types patch to avoid dependencies between subsystems.
 * Moved vfio_pci_dma_buf_move() in err_undo block.
 * Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
 * Rebased on top of v6.18-rc1.
 * Added more validation logic to make sure that DMA-BUF length doesn't
   overflow in various scenarios.
 * Hide kernel config from the users.
 * Fixed type conversion issue. DMA ranges are exposed with u64 length,
   but DMA-BUF uses "unsigned int" as a length for SG entries.
 * Added a check so that VFIO drivers which report a BAR size different
   from the PCI one cannot use the DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
 * Split pcim_p2pdma_provider() to two functions, one that initializes
   array of providers and another to return right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
 * Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
 * Cache provider in vfio_pci_dma_buf struct instead of BAR index.
 * Removed misleading comment from pcim_p2pdma_provider().
 * Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
 * Added extra patch which adds new CONFIG, so next patches can reuse
   it.
 * Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
   into the other patch.
 * Fixed revoke calls to be aligned with true->false semantics.
 * Extended p2pdma_providers to be per-BAR and not global to whole
   device.
 * Fixed possible race between dmabuf states and revoke.
 * Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
 * Changed commit messages.
 * Reused DMA_ATTR_MMIO attribute.
 * Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
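
The v5 notes on length overflow and on u64 ranges versus "unsigned int"
SG entry lengths can be illustrated with a small userspace sketch. This
is not code from the series: every sketch_* name is hypothetical, the
4096-byte page size is an assumption, and the page-aligned cap on each
entry is one plausible choice rather than the series' confirmed behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096u
/* Largest page-aligned length that still fits in an unsigned int. */
#define SKETCH_MAX_SEG (UINT_MAX & ~(SKETCH_PAGE_SIZE - 1u))

/* Hypothetical physical range with a 64-bit length. */
struct sketch_range {
	uint64_t addr;
	uint64_t len;
};

/* Hypothetical SG-like entry whose length is only an unsigned int. */
struct sketch_sg {
	uint64_t addr;
	unsigned int len;
};

/*
 * Split each 64-bit range into entries of at most SKETCH_MAX_SEG bytes.
 * Returns the number of entries written, or -1 if a range is empty, an
 * address or the total length would wrap, or "out" is too small.
 */
static long sketch_split(const struct sketch_range *r, size_t nr,
			 struct sketch_sg *out, size_t cap)
{
	uint64_t total = 0;
	size_t n = 0;

	assert(r && out);
	for (size_t i = 0; i < nr; i++) {
		uint64_t addr = r[i].addr;
		uint64_t left = r[i].len;

		/* Reject empty ranges and any 64-bit wraparound. */
		if (left == 0 || total + left < total || addr + left < addr)
			return -1;
		total += left;

		while (left) {
			unsigned int chunk = left > SKETCH_MAX_SEG ?
				SKETCH_MAX_SEG : (unsigned int)left;

			if (n == cap)
				return -1;
			out[n].addr = addr;
			out[n].len = chunk;
			n++;
			addr += chunk;
			left -= chunk;
		}
	}
	return (long)n;
}
```

With these assumptions, a single 4 GiB range cannot be described by one
entry and is split in two, while a range whose end address would wrap is
rejected before any entry is produced.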

---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------

This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.

The series supports a use case for SPDK where an NVMe device will be
owned by SPDK through VFIO while interacting with an RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be used by iommufd as well for generic
and safe P2P mappings.

In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.

The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
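
The revoke flow described above can be modeled in userspace roughly as
follows. This is only a sketch under stated assumptions: the sketch_*
types and functions are hypothetical, locking is omitted entirely, and
the notification loop merely stands in for the dma-buf move notification
that the series relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_IMPORTERS 4

/* Hypothetical importer: tracks whether it currently holds a mapping. */
struct sketch_importer {
	bool mapped;
	unsigned int invalidations;	/* move notifications seen so far */
};

/* Hypothetical exporter state for one exported MMIO region. */
struct sketch_export {
	bool revoked;
	struct sketch_importer *importers[SKETCH_MAX_IMPORTERS];
	size_t nr_importers;
};

/* Register an importer; fails when the fixed-size table is full. */
static int sketch_attach(struct sketch_export *e, struct sketch_importer *imp)
{
	assert(e && imp);
	if (e->nr_importers == SKETCH_MAX_IMPORTERS)
		return -1;
	e->importers[e->nr_importers++] = imp;
	return 0;
}

/* Mapping is refused for as long as the export is revoked. */
static int sketch_map(const struct sketch_export *e,
		      struct sketch_importer *imp)
{
	if (e->revoked)
		return -1;
	imp->mapped = true;
	return 0;
}

/*
 * Flip the revoked state (for example on device close or PCI reset) and
 * make every importer drop its mapping. Importers must map again later,
 * which only succeeds once the export is no longer revoked.
 */
static void sketch_set_revoked(struct sketch_export *e, bool revoked)
{
	if (e->revoked == revoked)
		return;
	e->revoked = revoked;
	for (size_t i = 0; i < e->nr_importers; i++) {
		e->importers[i]->mapped = false;
		e->importers[i]->invalidations++;
	}
}
```

The point of the model is that the physical address never moves; only
the permission to use it changes, and importers learn about that change
through the notification rather than by polling.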

The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
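
As a rough illustration of the per-BAR provider split mentioned in the
changelog (one call that initializes an array of providers, another that
returns the right provider and performs the MMIO check), here is a
userspace sketch. It is not the kernel implementation: the sketch_*
names and struct layouts are hypothetical and only the shape of the two
entry points follows the description in this cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_BARS 6	/* standard PCI BARs 0..5 */

/* Hypothetical description of one BAR. */
struct sketch_bar {
	bool is_mmio;
	uint64_t len;
};

/* Hypothetical per-BAR provider; the real structure carries more state. */
struct sketch_provider {
	int bar;
};

struct sketch_dev {
	struct sketch_bar bars[SKETCH_NUM_BARS];
	struct sketch_provider providers[SKETCH_NUM_BARS];
	bool p2p_ready;
};

/* Step 1: initialize one provider slot per BAR, once per device. */
static int sketch_p2pdma_init(struct sketch_dev *d)
{
	assert(d);
	if (d->p2p_ready)
		return 0;
	for (int i = 0; i < SKETCH_NUM_BARS; i++)
		d->providers[i].bar = i;
	d->p2p_ready = true;
	return 0;
}

/*
 * Step 2: return the provider for one BAR, or NULL when the device was
 * never initialized, the index is out of range, or the BAR is not a
 * usable MMIO region.
 */
static struct sketch_provider *sketch_p2pdma_provider(struct sketch_dev *d,
						       int bar)
{
	if (!d->p2p_ready || bar < 0 || bar >= SKETCH_NUM_BARS)
		return NULL;
	if (!d->bars[bar].is_mmio || d->bars[bar].len == 0)
		return NULL;
	return &d->providers[bar];
}
```

Keeping the MMIO check inside the lookup means a caller that caches the
returned pointer, rather than a BAR index, never holds a provider for a
BAR that cannot be used for P2P.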

-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.com/
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio-v7

Thanks

---
Jason Gunthorpe (2):
      PCI/P2PDMA: Document DMABUF model
      vfio/nvgrace: Support get_dmabuf_phys

Leon Romanovsky (7):
      PCI/P2PDMA: Separate the mmap() support from the core logic
      PCI/P2PDMA: Simplify bus address mapping API
      PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
      PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
      dma-buf: provide phys_vec to scatter-gather mapping routine
      vfio/pci: Enable peer-to-peer DMA transactions by default
      vfio/pci: Add dma-buf export support for MMIO regions

Vivek Kasireddy (2):
      vfio: Export vfio device get and put registration helpers
      vfio/pci: Share the core device pointer while invoking feature functions

---
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
change-id: 20251016-dmabuf-vfio-6cef732adf5a

Best regards,
--  
Leon Romanovsky <leonro@nvidia.com>

Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Alex Williamson 1 month, 1 week ago
On Thu,  6 Nov 2025 16:16:45 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> Changelog:
> v7:
>  * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
>    to reverse loop.
>  * Fixed spelling errors in documentation patch.
>  * Rebased on top of v6.18-rc3.
>  * Added include to stddef.h to vfio.h, to keep uapi header file independent.

I think we're winding down on review comments.  It'd be great to get
p2pdma and dma-buf acks on this series.  Otherwise it's been posted
enough that we'll assume no objections.  Thanks,

Alex
Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Christian König 1 month, 1 week ago
On 11/10/25 21:42, Alex Williamson wrote:
> On Thu,  6 Nov 2025 16:16:45 +0200
> Leon Romanovsky <leon@kernel.org> wrote:
> 
>> Changelog:
>> v7:
>>  * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
>>    to reverse loop.
>>  * Fixed spelling errors in documentation patch.
>>  * Rebased on top of v6.18-rc3.
>>  * Added include to stddef.h to vfio.h, to keep uapi header file independent.
> 
> I think we're winding down on review comments.  It'd be great to get
> p2pdma and dma-buf acks on this series.  Otherwise it's been posted
> enough that we'll assume no objections.  Thanks,

Already have it on my TODO list to take a closer look, but no idea when that will be.

This patch set is on place 4 or 5 on a rather long list of stuff to review/finish.

Christian.

> 
> Alex
Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Alex Williamson 1 month ago
On Tue, 11 Nov 2025 09:54:22 +0100
Christian König <christian.koenig@amd.com> wrote:

> On 11/10/25 21:42, Alex Williamson wrote:
> > On Thu,  6 Nov 2025 16:16:45 +0200
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> >> Changelog:
> >> v7:
> >>  * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
> >>    to reverse loop.
> >>  * Fixed spelling errors in documentation patch.
> >>  * Rebased on top of v6.18-rc3.
> >>  * Added include to stddef.h to vfio.h, to keep uapi header file independent.  
> > 
> > I think we're winding down on review comments.  It'd be great to get
> > p2pdma and dma-buf acks on this series.  Otherwise it's been posted
> > enough that we'll assume no objections.  Thanks,  
> 
> Already have it on my TODO list to take a closer look, but no idea when that will be.
> 
> This patch set is on place 4 or 5 on a rather long list of stuff to review/finish.

Hi Christian,

Gentle nudge.  Leon posted v8[1] last week, which is not drawing any
new comments.  Do you foresee having time for review, such that I
should still hold off merging for v6.19 a bit longer?  Thanks,

Alex


[1] https://lore.kernel.org/all/20251111-dmabuf-vfio-v8-0-fd9aa5df478f@nvidia.com/
Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Jason Gunthorpe 1 month ago
On Mon, Nov 17, 2025 at 08:36:20AM -0700, Alex Williamson wrote:
> On Tue, 11 Nov 2025 09:54:22 +0100
> Christian König <christian.koenig@amd.com> wrote:
> 
> > On 11/10/25 21:42, Alex Williamson wrote:
> > > On Thu,  6 Nov 2025 16:16:45 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >   
> > >> Changelog:
> > >> v7:
> > >>  * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
> > >>    to reverse loop.
> > >>  * Fixed spelling errors in documentation patch.
> > >>  * Rebased on top of v6.18-rc3.
> > >>  * Added include to stddef.h to vfio.h, to keep uapi header file independent.  
> > > 
> > > I think we're winding down on review comments.  It'd be great to get
> > > p2pdma and dma-buf acks on this series.  Otherwise it's been posted
> > > enough that we'll assume no objections.  Thanks,  
> > 
> > Already have it on my TODO list to take a closer look, but no idea when that will be.
> > 
> > This patch set is on place 4 or 5 on a rather long list of stuff to review/finish.
> 
> Hi Christian,
> 
> Gentle nudge.  Leon posted v8[1] last week, which is not drawing any
> new comments.  Do you foresee having time for review, such that I
> should still hold off merging for v6.19 a bit longer?  Thanks,

I really want this merged this cycle, along with the iommufd part,
which means it needs to go into your tree by very early next week on a
shared branch so I can do the iommufd part on top.

It is the last blocking kernel piece to conclude the viommu support
roll out into qemu for iommufd which quite a lot of people have been
working on for years now.

IMHO there is nothing profound in the dmabuf patch, it was written by
the expert in the new DMA API operation, and doesn't form any
troublesome API contracts. It is also the same basic code as from the
v1 in July just moved into dmabuf .c files instead of vfio .c files at
Christoph's request.

My hope is DRM folks will pick up the baton and continue to improve
this to move other drivers away from dma_map_resource(). Simona told
me people have wanted DMA API improvements for ages, now we have them,
now is the time!

Any remarks after the fact can be addressed incrementally.

If there are no concrete technical remarks please take it. 6 months is
long enough to wait for feedback.

Thanks,
Jason
Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Christian König 1 month ago
On 11/17/25 18:16, Jason Gunthorpe wrote:
> On Mon, Nov 17, 2025 at 08:36:20AM -0700, Alex Williamson wrote:
>> On Tue, 11 Nov 2025 09:54:22 +0100
>> Christian König <christian.koenig@amd.com> wrote:
>>
>>> On 11/10/25 21:42, Alex Williamson wrote:
>>>> On Thu,  6 Nov 2025 16:16:45 +0200
>>>> Leon Romanovsky <leon@kernel.org> wrote:
>>>>   
>>>>> Changelog:
>>>>> v7:
>>>>>  * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
>>>>>    to reverse loop.
>>>>>  * Fixed spelling errors in documentation patch.
>>>>>  * Rebased on top of v6.18-rc3.
>>>>>  * Added include to stddef.h to vfio.h, to keep uapi header file independent.  
>>>>
>>>> I think we're winding down on review comments.  It'd be great to get
>>>> p2pdma and dma-buf acks on this series.  Otherwise it's been posted
>>>> enough that we'll assume no objections.  Thanks,  
>>>
>>> Already have it on my TODO list to take a closer look, but no idea when that will be.
>>>
>>> This patch set is on place 4 or 5 on a rather long list of stuff to review/finish.
>>
>> Hi Christian,
>>
>> Gentle nudge.  Leon posted v8[1] last week, which is not drawing any
>> new comments.  Do you foresee having time for review, such that I
>> should still hold off merging for v6.19 a bit longer?  Thanks,
> 
> I really want this merged this cycle, along with the iommufd part,
> which means it needs to go into your tree by very early next week on a
> shared branch so I can do the iommufd part on top.
> 
> It is the last blocking kernel piece to conclude the viommu support
> roll out into qemu for iommufd which quite a lot of people have been
> working on for years now.
> 
> IMHO there is nothing profound in the dmabuf patch, it was written by
> the expert in the new DMA API operation, and doesn't form any
> troublesome API contracts. It is also the same basic code as from the
> v1 in July just moved into dmabuf .c files instead of vfio .c files at
> Christoph's request.

As long as it is only an internal API between iommu and vfio which also respects the standard DMA-buf semantics to either pin buffers or provide a move_notify interface then feel free to go ahead with it.

Skimming over it my only concern is patch #6 which adds the helper to the common DMA-buf code and that in turn would need an in-depth review which I currently don't have time for.

So if we could keep those inside the VFIO driver for now I think that should be good to go.

Regards,
Christian.


> My hope is DRM folks will pick up the baton and continue to improve
> this to move other drivers away from dma_map_resource(). Simona told
> me people have wanted DMA API improvements for ages, now we have them,
> now is the time!
> 
> Any remarks after the fact can be addressed incrementally.
> 
> If there are no concrete technical remarks please take it. 6 months is
> long enough to wait for feedback.
> 
> Thanks,
> Jason

Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Jason Gunthorpe 1 month ago
On Tue, Nov 18, 2025 at 03:37:41PM +0100, Christian König wrote:

> Skimming over it my only concern is patch #6 which adds the helper
> to the common DMA-buf code and that in turn would need an in-depth
> review which I currently don't have time for.

I think you should trust Leon on the implementation. He knows what he
is doing here when it comes to the DMA API, since he made all the
patches so far to use it.

Please consider just reviewing the exported function signature:

+struct sg_table *dma_buf_map(struct dma_buf_attachment *attach,
+			     struct p2pdma_provider *provider,
+			     struct dma_buf_phys_vec *phys_vec,
+			     size_t nr_ranges, size_t size,
+			     enum dma_data_direction dir)

If issues are discovered inside the implementation later on then Leon
will be available to fix them.

The code is intended to implement that basic function signature which
can be thought of as dma_map_resource() done correctly for PCI
devices.

> So if we could keep those inside the VFIO driver for now I think
> that should be good to go.

That was several versions ago. Christoph is very strongly against
this, he wants to see the new DMA API used by wrapper functions in
subsystems related to how the subsystem's data structures work rather
than proliferate into drivers. I agree with this, so we need to go in
this direction.

Other options, like putting the code in the DMA API area, are also not
going to be agreed to because we really don't want this weird DMABUF use
of no-struct page scatterlist to leak out beyond DMABUF.

So, this is the start of a DMA mapping helper API for DMABUF related
data structures, it introduces a simplified mapping entry point for
drivers that only use MMIO.

As I said I expect this API surface to progress as other DRM drivers
are updated (hopefully DRM community will take on this), but there is
nothing wrong with starting by having a basic entry point for a narrow
use case.

Thanks,
Jason