[PATCH 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf

Leon Romanovsky posted 10 patches 2 months, 2 weeks ago
There is a newer version of this series
block/blk-mq-dma.c                 |   7 +-
drivers/iommu/dma-iommu.c          |   4 +-
drivers/pci/p2pdma.c               | 144 +++++++++----
drivers/vfio/pci/Kconfig           |  20 ++
drivers/vfio/pci/Makefile          |   2 +
drivers/vfio/pci/vfio_pci_config.c |  22 +-
drivers/vfio/pci/vfio_pci_core.c   |  59 ++++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
drivers/vfio/vfio_main.c           |   2 +
include/linux/dma-buf.h            |   1 +
include/linux/pci-p2pdma.h         | 114 +++++-----
include/linux/types.h              |   5 +
include/linux/vfio.h               |   2 +
include/linux/vfio_pci_core.h      |   4 +
include/uapi/linux/vfio.h          |  19 ++
kernel/dma/direct.c                |   4 +-
mm/hmm.c                           |   2 +-
18 files changed, 631 insertions(+), 124 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
[PATCH 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Leon Romanovsky 2 months, 2 weeks ago
From: Leon Romanovsky <leonro@nvidia.com>

---------------------------------------------------------------------------
Based on blk and DMA patches which will be sent during coming merge window.
---------------------------------------------------------------------------

This series extends the VFIO PCI subsystem to support exporting MMIO regions
from PCI device BARs as dma-buf objects, enabling safe sharing of non-struct
page memory with controlled lifetime management. This allows RDMA and other
subsystems to import dma-buf FDs and build them into memory regions for PCI
P2P operations.

The series supports a use case for SPDK where a NVMe device will be owned
by SPDK through VFIO but interacting with a RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.

In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.

The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.

The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.

-----------------------------------------------------------------------
This is based on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.com/
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio

Thanks

Leon Romanovsky (8):
  PCI/P2PDMA: Remove redundant bus_offset from map state
  PCI/P2PDMA: Introduce p2pdma_provider structure for cleaner
    abstraction
  PCI/P2PDMA: Simplify bus address mapping API
  PCI/P2PDMA: Refactor to separate core P2P functionality from memory
    allocation
  PCI/P2PDMA: Export pci_p2pdma_map_type() function
  types: move phys_vec definition to common header
  vfio/pci: Enable peer-to-peer DMA transactions by default
  vfio/pci: Add dma-buf export support for MMIO regions

Vivek Kasireddy (2):
  vfio: Export vfio device get and put registration helpers
  vfio/pci: Share the core device pointer while invoking feature
    functions

 block/blk-mq-dma.c                 |   7 +-
 drivers/iommu/dma-iommu.c          |   4 +-
 drivers/pci/p2pdma.c               | 144 +++++++++----
 drivers/vfio/pci/Kconfig           |  20 ++
 drivers/vfio/pci/Makefile          |   2 +
 drivers/vfio/pci/vfio_pci_config.c |  22 +-
 drivers/vfio/pci/vfio_pci_core.c   |  59 ++++--
 drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
 drivers/vfio/vfio_main.c           |   2 +
 include/linux/dma-buf.h            |   1 +
 include/linux/pci-p2pdma.h         | 114 +++++-----
 include/linux/types.h              |   5 +
 include/linux/vfio.h               |   2 +
 include/linux/vfio_pci_core.h      |   4 +
 include/uapi/linux/vfio.h          |  19 ++
 kernel/dma/direct.c                |   4 +-
 mm/hmm.c                           |   2 +-
 18 files changed, 631 insertions(+), 124 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c

-- 
2.50.1
Re: [PATCH 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Alex Williamson 2 months, 1 week ago
On Wed, 23 Jul 2025 16:00:01 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> From: Leon Romanovsky <leonro@nvidia.com>
> 
> ---------------------------------------------------------------------------
> Based on blk and DMA patches which will be sent during coming merge window.
> ---------------------------------------------------------------------------
> 
> This series extends the VFIO PCI subsystem to support exporting MMIO regions
> from PCI device BARs as dma-buf objects, enabling safe sharing of non-struct
> page memory with controlled lifetime management. This allows RDMA and other
> subsystems to import dma-buf FDs and build them into memory regions for PCI
> P2P operations.
> 
> The series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
> 
> However, as a general mechanism, it can support many other scenarios with
> VFIO. This dmabuf approach can be usable by iommufd as well for generic
> and safe P2P mappings.

I think this will eventually enable DMA mapping of device MMIO through
an IOMMUFD IOAS for the VM P2P use cases, right?  How do we get from
what appears to be a point-to-point mapping between two devices to a
shared IOVA between multiple devices?  I'm guessing we need IOMMUFD to
support something like IOMMU_IOAS_MAP_FILE for dma-buf, but I can't
connect all the dots.  Thanks,

Alex
Re: [PATCH 00/10] vfio/pci: Allow MMIO regions to be exported through dma-buf
Posted by Jason Gunthorpe 2 months, 1 week ago
On Wed, Jul 30, 2025 at 01:58:46PM -0600, Alex Williamson wrote:
> On Wed, 23 Jul 2025 16:00:01 +0300
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > ---------------------------------------------------------------------------
> > Based on blk and DMA patches which will be sent during coming merge window.
> > ---------------------------------------------------------------------------
> > 
> > This series extends the VFIO PCI subsystem to support exporting MMIO regions
> > from PCI device BARs as dma-buf objects, enabling safe sharing of non-struct
> > page memory with controlled lifetime management. This allows RDMA and other
> > subsystems to import dma-buf FDs and build them into memory regions for PCI
> > P2P operations.
> > 
> > The series supports a use case for SPDK where a NVMe device will be owned
> > by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> > may directly access the NVMe CMB or directly manipulate the NVMe device's
> > doorbell using PCI P2P.
> > 
> > However, as a general mechanism, it can support many other scenarios with
> > VFIO. This dmabuf approach can be usable by iommufd as well for generic
> > and safe P2P mappings.
> 
> I think this will eventually enable DMA mapping of device MMIO through
> an IOMMUFD IOAS for the VM P2P use cases, right?  

This is the plan

> How do we get from
> what appears to be a point-to-point mapping between two devices to a
> shared IOVA between multiple devices?

You have it right below, it is a point to point mapping between the
vfio device and the iommufd.

> I'm guessing we need IOMMUFD to support something like
> IOMMU_IOAS_MAP_FILE for dma-buf, 

1) The dma phys series which needs more work
2) This series to get basic 'movable' DMABUF support in VFIO
3) Add 'revokable' as a DMABUF concept and implement it with mlx5 and
   vfio
4) Add some way to get the phys_addr list from the DMABUF
5) IOMMU_IOAS_MAP_FILE using a revokable attachment and the phys_addr
   list. When VFIO does FLR the iommufd can remove the IOPTEs and then
   put them back when FLR is done.

It is not so much more code, but I think every step will take a lot of
work to get agreements.

Then we reuse all of the above with some tweaks for the CC problems
too.

Jason