Changelog:
v6:
* Fixed wrong error check from pcim_p2pdma_init().
* Documented pcim_p2pdma_provider() function.
* Improved commit messages.
* Added VFIO DMA-BUF selftest.
* Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
* Fixed error unwind when dma_buf_fd() fails.
* Documented latest changes to p2pmem.
* Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
* Moved DMA mapping logic to DMA-BUF.
* Removed types patch to avoid dependencies between subsystems.
* Moved vfio_pci_dma_buf_move() into the err_undo block.
* Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
* Rebased on top of v6.18-rc1.
* Added more validation logic to make sure that DMA-BUF length doesn't
overflow in various scenarios.
* Hid the kernel config from users.
* Fixed a type conversion issue: DMA ranges are exposed with a u64 length,
  but DMA-BUF uses "unsigned int" as the length of SG entries.
* Added a check so that VFIO drivers which report a BAR size different from
  the PCI BAR size cannot use the DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
* Split pcim_p2pdma_provider() into two functions: one that initializes the
  array of providers and another that returns the right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
* Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
* Cache provider in vfio_pci_dma_buf struct instead of BAR index.
* Removed misleading comment from pcim_p2pdma_provider().
* Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
* Added an extra patch which adds a new CONFIG, so the next patches can
  reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to the whole
  device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Restored support for multiple DMA ranges per DMA-BUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
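
For reference, here is a minimal sketch of what a kernel-side importer of such
an FD could look like, using only the existing dynamic dma-buf attachment API
(dma_buf_get(), dma_buf_dynamic_attach(), dma_buf_map_attachment()); the
example_* names and error handling are illustrative assumptions, not code from
this series:

#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/dma-resv.h>
#include <linux/err.h>

/* Called by the exporter when it moves or revokes the buffer. */
static void example_move_notify(struct dma_buf_attachment *attach)
{
	/* Tear down the old mapping here; re-map later if access returns. */
}

static const struct dma_buf_attach_ops example_attach_ops = {
	.allow_peer2peer = true,	/* MMIO-only exporters require P2P */
	.move_notify = example_move_notify,
};

static struct sg_table *example_import(struct device *dev, int fd)
{
	struct dma_buf *dmabuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	if (IS_ERR(dmabuf))
		return ERR_CAST(dmabuf);

	attach = dma_buf_dynamic_attach(dmabuf, dev, &example_attach_ops, NULL);
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return ERR_CAST(attach);
	}

	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);
	return sgt;
}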
The series supports an SPDK use case where an NVMe device is owned by SPDK
through VFIO while interacting with an RDMA device. The RDMA device may
directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can also be used by iommufd for generic and
safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
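
As a rough illustration of the revoke side: an exporter can force importers
off their mappings with a move notification taken under the reservation
lock. This is only a sketch built from the generic dma-buf API
(dma_resv_lock()/dma_buf_move_notify()), not the actual vfio_pci_dmabuf.c
code:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/*
 * Illustrative only: force importers to drop their mappings.  Each importer
 * gets its ->move_notify() callback and must not touch the old mapping
 * afterwards; it may try to re-map once access is restored.
 */
static void example_dmabuf_revoke(struct dma_buf *dmabuf)
{
	dma_resv_lock(dmabuf->resv, NULL);
	dma_buf_move_notify(dmabuf);
	dma_resv_unlock(dmabuf->resv);
}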
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series was originally based on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.com/
but has been heavily rewritten on top of the DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio-v6
Thanks
---
Jason Gunthorpe (2):
PCI/P2PDMA: Document DMABUF model
vfio/nvgrace: Support get_dmabuf_phys
Leon Romanovsky (7):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
dma-buf: provide phys_vec to scatter-gather mapping routine
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature functions
Documentation/driver-api/pci/p2pdma.rst | 95 +++++++---
block/blk-mq-dma.c | 2 +-
drivers/dma-buf/dma-buf.c | 235 ++++++++++++++++++++++++
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 182 +++++++++++++-----
drivers/vfio/pci/Kconfig | 3 +
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/nvgrace-gpu/main.c | 56 ++++++
drivers/vfio/pci/vfio_pci.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 22 ++-
drivers/vfio/pci/vfio_pci_core.c | 56 ++++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 315 ++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 +++
drivers/vfio/vfio_main.c | 2 +
include/linux/dma-buf.h | 18 ++
include/linux/pci-p2pdma.h | 120 +++++++-----
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 42 +++++
include/uapi/linux/vfio.h | 27 +++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
21 files changed, 1077 insertions(+), 139 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20251016-dmabuf-vfio-6cef732adf5a
Best regards,
--
Leon Romanovsky <leonro@nvidia.com>
On Sun, Nov 02, 2025 at 10:00:48AM +0200, Leon Romanovsky wrote:

<...>

> ---
> Jason Gunthorpe (2):
>       PCI/P2PDMA: Document DMABUF model
>       vfio/nvgrace: Support get_dmabuf_phys
>
> Leon Romanovsky (7):
>       PCI/P2PDMA: Separate the mmap() support from the core logic
>       PCI/P2PDMA: Simplify bus address mapping API
>       PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
>       PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
>       dma-buf: provide phys_vec to scatter-gather mapping routine
>       vfio/pci: Enable peer-to-peer DMA transactions by default
>       vfio/pci: Add dma-buf export support for MMIO regions
>
> Vivek Kasireddy (2):
>       vfio: Export vfio device get and put registration helpers
>       vfio/pci: Share the core device pointer while invoking feature functions

Hi,

Can we get Acked-by for p2pdma and DMABUF parts?

Thanks

<...>
On Sun, Nov 02, 2025 at 10:00:48AM +0200, Leon Romanovsky wrote:
> Changelog:
> v6:
> * Fixed wrong error check from pcim_p2pdma_init().
> * Documented pcim_p2pdma_provider() function.
> * Improved commit messages.
> * Added VFIO DMA-BUF selftest.
> * Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
> * Fixed error unwind when dma_buf_fd() fails.
> * Document latest changes to p2pmem.
> * Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
> * Moved DMA mapping logic to DMA-BUF.
> * Removed types patch to avoid dependencies between subsystems.
> * Moved vfio_pci_dma_buf_move() in err_undo block.
> * Added nvgrace patch.
Thanks Leon. Attaching a toy program which sanity tests the dma-buf export UAPI
by feeding the allocated dma-buf into a dma-buf importer (libibverbs + CX-7).
Tested-by: Alex Mastro <amastro@fb.com>
$ cc -Og -Wall -Wextra $(pkg-config --cflags --libs libibverbs) test_dmabuf.c -o test_dmabuf
$ ./test_dmabuf 0000:05:00.0 3 4 0 0x1000
opening 0000:05:00.0 via /dev/vfio/56
allocating dma_buf bar_idx=4, bar_offset=0x0, size=0x1000
allocated dma_buf fd=6
discovered 4 ibv devices: mlx5_0 mlx5_1 mlx5_2 mlx5_3
opened ibv device 3: mlx5_3
registered dma_buf
unregistered dma_buf
closed dma_buf fd
---
#include <errno.h>
#include <fcntl.h>
#include <infiniband/verbs.h>
#include <libgen.h>
#include <linux/limits.h>
#include <linux/types.h>
#include <linux/vfio.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#define ensure(cond)                                                     \
	do {                                                             \
		if (!(cond)) {                                           \
			fprintf(stderr,                                  \
				"%s:%d Condition failed: '%s' (errno=%d: %s)\n", \
				__FILE__, __LINE__, #cond, errno,        \
				strerror(errno));                        \
			exit(EXIT_FAILURE);                              \
		}                                                        \
	} while (0)
#ifndef VFIO_DEVICE_FEATURE_DMA_BUF
#define VFIO_DEVICE_FEATURE_DMA_BUF 11

struct vfio_region_dma_range {
	__u64 offset;
	__u64 length;
};

struct vfio_device_feature_dma_buf {
	__u32 region_index;
	__u32 open_flags;
	__u32 flags;
	__u32 nr_ranges;
	struct vfio_region_dma_range dma_ranges[];
};
#endif
static uint32_t group_for_bdf(const char *bdf)
{
	char path[PATH_MAX];
	char link[PATH_MAX];
	int ret;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/iommu_group",
		 bdf);
	ret = readlink(path, link, sizeof(link) - 1);
	ensure(ret > 0);
	link[ret] = '\0';

	const char *filename = basename(link);
	ensure(filename);
	return strtoul(filename, NULL, 0);
}
int main(int argc, char **argv)
{
	int ret;

	if (argc != 6) {
		printf("usage: %s <vfio_bdf> <ibv_device_idx> <bar_idx> <bar_offset> <size>\n",
		       argv[0]);
		printf("example: %s 0000:05:00.0 3 2 0x20000 0x1000\n",
		       argv[0]);
		return 1;
	}

	const char *bdf = argv[1];
	uint32_t ibv_idx = strtoul(argv[2], NULL, 0);
	uint32_t bar_idx = strtoul(argv[3], NULL, 0);
	uint64_t bar_offs = strtoull(argv[4], NULL, 0);
	uint64_t dmabuf_len = strtoull(argv[5], NULL, 0);

	uint32_t group_num = group_for_bdf(bdf);
	char group_path[PATH_MAX];
	snprintf(group_path, sizeof(group_path), "/dev/vfio/%u", group_num);

	int container_fd = open("/dev/vfio/vfio", O_RDWR);
	ensure(container_fd >= 0);

	printf("opening %s via %s\n", bdf, group_path);
	int group_fd = open(group_path, O_RDWR);
	ensure(group_fd >= 0);

	ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container_fd);
	ensure(!ret);

	ret = ioctl(container_fd, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
	ensure(!ret);

	int device_fd = ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, bdf);
	ensure(device_fd >= 0);

	uint8_t buf[sizeof(struct vfio_device_feature) +
		    sizeof(struct vfio_device_feature_dma_buf) +
		    sizeof(struct vfio_region_dma_range)]
		__attribute__((aligned(32)));

	struct vfio_device_feature *ft = (struct vfio_device_feature *)buf;
	*ft = (struct vfio_device_feature){
		.argsz = sizeof(buf),
		.flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF,
	};

	struct vfio_device_feature_dma_buf *ft_dma_buf =
		(struct vfio_device_feature_dma_buf *)ft->data;
	*ft_dma_buf = (struct vfio_device_feature_dma_buf){
		.region_index = bar_idx,
		.open_flags = O_RDWR,
		.nr_ranges = 1,
	};
	ft_dma_buf->dma_ranges[0] = (struct vfio_region_dma_range){
		.length = dmabuf_len,
		.offset = bar_offs,
	};

	printf("allocating dma_buf bar_idx=%u, bar_offset=0x%lx, size=0x%lx\n",
	       bar_idx, bar_offs, dmabuf_len);
	int dmabuf_fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, buf);
	ensure(dmabuf_fd >= 0);
	printf("allocated dma_buf fd=%d\n", dmabuf_fd);

	int num;
	struct ibv_device **devs = ibv_get_device_list(&num);
	ensure(devs && num > 0);

	printf("discovered %d ibv devices:", num);
	for (int i = 0; i < num; i++) {
		printf(" %s", ibv_get_device_name(devs[i]));
	}
	printf("\n");

	ensure(ibv_idx < (uint32_t)num);
	struct ibv_context *ctx = ibv_open_device(devs[ibv_idx]);
	ensure(ctx);
	printf("opened ibv device %d: %s\n", ibv_idx,
	       ibv_get_device_name(devs[ibv_idx]));

	struct ibv_pd *pd = ibv_alloc_pd(ctx);
	ensure(pd);

	uint64_t offset = 0;
	uint64_t iova = 0;
	int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
		     IBV_ACCESS_REMOTE_WRITE;
	struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd, offset, dmabuf_len, iova,
					      dmabuf_fd, access);
	ensure(mr);
	printf("registered dma_buf\n");

	ret = ibv_dereg_mr(mr);
	ensure(!ret);
	printf("unregistered dma_buf\n");

	ret = close(dmabuf_fd);
	ensure(!ret);
	printf("closed dma_buf fd\n");

	return 0;
}
---
On Mon, Nov 03, 2025 at 12:07:12PM -0800, Alex Mastro wrote:
> On Sun, Nov 02, 2025 at 10:00:48AM +0200, Leon Romanovsky wrote:
> > Changelog:
> > v6:

<...>

> Thanks Leon. Attaching a toy program which sanity tests the dma-buf export UAPI
> by feeding the allocated dma-buf into an dma-buf importer (libibverbs + CX-7).
>
> Tested-by: Alex Mastro <amastro@fb.com>

Thanks a lot.
On Mon, Nov 03, 2025 at 12:07:12PM -0800, Alex Mastro wrote:
> On Sun, Nov 02, 2025 at 10:00:48AM +0200, Leon Romanovsky wrote:
> > Changelog:
> > v6:
> > * Fixed wrong error check from pcim_p2pdma_init().
> > * Documented pcim_p2pdma_provider() function.
> > * Improved commit messages.
> > * Added VFIO DMA-BUF selftest.
> > * Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
> > * Fixed error unwind when dma_buf_fd() fails.
> > * Document latest changes to p2pmem.
> > * Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
> > * Moved DMA mapping logic to DMA-BUF.
> > * Removed types patch to avoid dependencies between subsystems.
> > * Moved vfio_pci_dma_buf_move() in err_undo block.
> > * Added nvgrace patch.
>
> Thanks Leon. Attaching a toy program which sanity tests the dma-buf export UAPI
> by feeding the allocated dma-buf into an dma-buf importer (libibverbs + CX-7).
Oh! Here is my toy program to do the same with iommufd as the importer:
#define _GNU_SOURCE
#define __user
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include "include/uapi/linux/vfio.h"
#include "include/uapi/linux/iommufd.h"
#include <string.h>
#include <sys/mman.h>
#include <errno.h>
int main(int argc, const char *argv[])
{
	int vfio_dev_fd, iommufd_fd, ret;

	// Open the per-device VFIO file (e.g., /dev/vfio/devices/vfio3)
	vfio_dev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
	if (vfio_dev_fd < 0) {
		perror("Failed to open VFIO per-device file");
		return 1;
	}

	// Open /dev/iommu for iommufd
	iommufd_fd = open("/dev/iommu", O_RDWR);
	if (iommufd_fd < 0) {
		perror("Failed to open /dev/iommu");
		close(vfio_dev_fd);
		return 1;
	}

	// Bind device FD to iommufd
	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.flags = 0,
		.iommufd = iommufd_fd,
	};

	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
	if (ret < 0) {
		perror("VFIO_DEVICE_BIND_IOMMUFD failed");
		close(vfio_dev_fd);
		close(iommufd_fd);
		return 1;
	}

	// Allocate an IOAS (I/O address space)
	struct iommu_ioas_alloc alloc_data = {
		.size = sizeof(alloc_data),
		.flags = 0,
	};

	ret = ioctl(iommufd_fd, IOMMU_IOAS_ALLOC, &alloc_data);
	if (ret < 0) {
		perror("IOMMU_IOAS_ALLOC failed");
		close(vfio_dev_fd);
		close(iommufd_fd);
		return 1;
	}

	// Attach the device to the IOAS
	struct vfio_device_attach_iommufd_pt attach_data = {
		.argsz = sizeof(attach_data),
		.flags = 0,
		.pt_id = alloc_data.out_ioas_id,
	};

	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
	if (ret < 0) {
		perror("VFIO_DEVICE_ATTACH_IOMMUFD_PT failed");
		close(vfio_dev_fd);
		close(iommufd_fd);
		return 1;
	}

#if 0
	int mapfd = memfd_create("test", MFD_CLOEXEC);
	if (mapfd == -1) {
		perror("memfd_create failed");
		return 1;
	}
	ftruncate(mapfd, 4096);
#else
	struct dmabuf_arg {
		struct vfio_device_feature hdr;
		struct vfio_device_feature_dma_buf dma_buf;
		struct vfio_region_dma_range range;
	} dma_buf_feature = {
		.hdr = { .argsz = sizeof(dma_buf_feature),
			 .flags = VFIO_DEVICE_FEATURE_GET |
				  VFIO_DEVICE_FEATURE_DMA_BUF },
		.dma_buf = { .region_index = VFIO_PCI_BAR0_REGION_INDEX,
			     .open_flags = O_CLOEXEC,
			     .nr_ranges = 1 },
		.range = { .length = 4096 },
	};

	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_FEATURE, &dma_buf_feature);
	if (ret < 0) {
		perror("VFIO_DEVICE_FEATURE_GET failed");
		return 1;
	}
	int mapfd = ret;
#endif

	struct iommu_ioas_map_file map_file = {
		.size = sizeof(map_file),
		.flags = IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_READABLE,
		.ioas_id = alloc_data.out_ioas_id,
		.fd = mapfd,
		.start = 0,
		.length = 4096,
	};

	ret = ioctl(iommufd_fd, IOMMU_IOAS_MAP_FILE, &map_file);
	if (ret < 0) {
		perror("IOMMU_IOAS_MAP_FILE failed");
		return 1;
	}

	printf("Successfully attached device to IOAS ID: %u\n",
	       alloc_data.out_ioas_id);

	close(vfio_dev_fd);
	close(iommufd_fd);
	return 0;
}
On Sun, Nov 02, 2025 at 10:00:48AM +0200, Leon Romanovsky wrote:
> Changelog:
> v6:

<...>

I have verified this v6 using Jason's iommufd dmabuf branch:
https://github.com/jgunthorpe/linux/commits/iommufd_dmabuf/

by drafting a QEMU patch on top of Shameer's vSMMU v5 series:
https://github.com/nicolinc/qemu/commits/wip/iommufd_dmabuf/

With that, I see GPU BAR memory being correctly fetched in QEMU:

vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 0", offset: 0x0, size: 0x1000000
vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 2", offset: 0x0, size: 0x44f00000
vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 4", offset: 0x0, size: 0x17a0000000

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Hello Nicolin,

On 11/4/25 20:19, Nicolin Chen wrote:

<...>

> I have verified this v6 using Jason's iommufd dmabuf branch:
> https://github.com/jgunthorpe/linux/commits/iommufd_dmabuf/
>
> by drafting a QEMU patch on top of Shameer's vSMMU v5 series:
> https://github.com/nicolinc/qemu/commits/wip/iommufd_dmabuf/
>
> with that, I see GPU BAR memory be correctly fetched in the QEMU:
> vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 0", offset: 0x0, size: 0x1000000
> vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 2", offset: 0x0, size: 0x44f00000
> vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 4", offset: 0x0, size: 0x17a0000000
>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Do you plan to provide P2P support with IOMMUFD for QEMU?

Thanks,

C.
On Tue, Nov 04, 2025 at 11:19:43AM -0800, Nicolin Chen wrote:
> On Sun, Nov 02, 2025 at 10:00:48AM +0200, Leon Romanovsky wrote:
> > Changelog:
> > v6:

<...>

> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Thanks a lot.
On Tue, Nov 04, 2025 at 11:19:43AM -0800, Nicolin Chen wrote:

<...>

> with that, I see GPU BAR memory be correctly fetched in the QEMU:
> vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 0", offset: 0x0, size: 0x1000000
> vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 2", offset: 0x0, size: 0x44f00000
> vfio_region_dmabuf Device 0009:01:00.0, region "0009:01:00.0 BAR 4", offset: 0x0, size: 0x17a0000000

Great thanks! This means we finally have a solution to that follow_pfn
lifetime problem in type 1! What a long journey :)

For those following along, this same flow will be used with KVM to allow it
to map VFIO as well. Confidential Compute will require this because some
arches can't put confidential MMIO (or RAM) into a VMA.

Jason