[PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default

Leon Romanovsky posted 11 patches 3 months ago
There is a newer version of this series
Posted by Leon Romanovsky 3 months ago
From: Leon Romanovsky <leonro@nvidia.com>

Make sure that all VFIO PCI devices have peer-to-peer capabilities
enabled, so we are able to export their MMIO memory through DMABUF.

VFIO has always supported P2P mappings with itself. VFIO type 1
insecurely reads PFNs directly out of a VMA's PTEs and programs them
into the IOMMU allowing any two VFIO devices to perform P2P to each
other.

All existing VMMs use this capability to export P2P into a VM where
the VM could set up any kind of DMA it likes. Projects like DPDK/SPDK
are also known to make use of this, though less frequently.

As a first step to more properly integrating VFIO with the P2P
subsystem, unconditionally enable P2P support for VFIO PCI devices. The
struct p2pdma_provider will act as a handle to the P2P subsystem to
do things like DMA mapping.

While real PCI devices have to support P2P (they can't even tell if an
IOVA is P2P or not) there may be fake PCI devices that may trigger
some kind of catastrophic system failure. To date VFIO has never
tripped up on such a case, but if one is discovered the plan is to add
a PCI quirk and have pcim_p2pdma_init() fail. This will fully block
the broken device throughout any users of the P2P subsystem in the
kernel.

Thus P2P through DMABUF will follow the historical VFIO model and be
unconditionally enabled by vfio-pci.

Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ca9a95716a85..142b84b3f225 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -28,6 +28,7 @@
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
 #include <linux/iommufd.h>
+#include <linux/pci-p2pdma.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -2081,6 +2082,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
 		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	int ret;
 
 	vdev->pdev = to_pci_dev(core_vdev->dev);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
@@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
+	ret = pcim_p2pdma_init(vdev->pdev);
+	if (ret && ret != -EOPNOTSUPP)
+		return ret;
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
 

-- 
2.51.1
RE: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default
Posted by Tian, Kevin 2 months, 3 weeks ago
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, November 11, 2025 5:58 PM
> 
> From: Leon Romanovsky <leonro@nvidia.com>

not required with only your own s-o-b

> @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device
> *core_vdev)
>  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
>  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
>  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> +	ret = pcim_p2pdma_init(vdev->pdev);
> +	if (ret && ret != -EOPNOTSUPP)
> +		return ret;

Reading the commit msg seems -EOPNOTSUPP is only returned for fake
PCI devices, otherwise it implies regression. better add a comment for it?

otherwise,

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Re: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default
Posted by Keith Busch 2 months, 3 weeks ago
On Tue, Nov 18, 2025 at 07:18:36AM +0000, Tian, Kevin wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Tuesday, November 11, 2025 5:58 PM
> > 
> > From: Leon Romanovsky <leonro@nvidia.com>
> 
> not required with only your own s-o-b

That's automatically appended when the sender and signer don't match.
It's not uncommon for developers to send from a kernel.org email but
sign off with a corporate account, or the other way around.
RE: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default
Posted by Tian, Kevin 2 months, 3 weeks ago
> From: Keith Busch <kbusch@kernel.org>
> Sent: Wednesday, November 19, 2025 4:19 AM
> 
> On Tue, Nov 18, 2025 at 07:18:36AM +0000, Tian, Kevin wrote:
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Tuesday, November 11, 2025 5:58 PM
> > >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > not required with only your own s-o-b
> 
> That's automatically appended when the sender and signer don't match.
> It's not uncommon for developers to send from a kernel.org email but
> sign off with a corporate account, or the other way around.

Good to know.
Re: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default
Posted by Leon Romanovsky 2 months, 3 weeks ago
On Wed, Nov 19, 2025 at 12:02:02AM +0000, Tian, Kevin wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > Sent: Wednesday, November 19, 2025 4:19 AM
> > 
> > On Tue, Nov 18, 2025 at 07:18:36AM +0000, Tian, Kevin wrote:
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Tuesday, November 11, 2025 5:58 PM
> > > >
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > not required with only your own s-o-b
> > 
> > That's automatically appended when the sender and signer don't match.
> > It's not uncommon for developers to send from a kernel.org email but
> > sign off with a corporate account, or the other way around.
> 
> Good to know.

Yes, in addition, I keep code authorship separate from my open-source
activity. The code belongs to my employer, which is why the corporate
address is used as the author, but all emails and communications come from
my kernel.org account.

Thanks
Re: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default
Posted by Alex Williamson 2 months, 3 weeks ago
On Tue, 18 Nov 2025 07:18:36 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Tuesday, November 11, 2025 5:58 PM
> > 
> > From: Leon Romanovsky <leonro@nvidia.com>  
> 
> not required with only your own s-o-b
> 
> > @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device
> > *core_vdev)
> >  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
> >  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> > +	ret = pcim_p2pdma_init(vdev->pdev);
> > +	if (ret && ret != -EOPNOTSUPP)
> > +		return ret;  
> 
> Reading the commit msg seems -EOPNOTSUPP is only returned for fake
> PCI devices, otherwise it implies regression. better add a comment for it?

I think the commit log is saying that if a device comes along that
can't support this, we'd quirk the init path to return -EOPNOTSUPP for
that particular device here.  This path is currently used when
!CONFIG_PCI_P2PDMA to make this error non-fatal to the device init.
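
The policy described above can be illustrated with a small user-space
model. This is only a sketch and not kernel code: model_p2pdma_init(),
model_init_dev() and their boolean parameters are hypothetical stand-ins
for pcim_p2pdma_init() and vfio_pci_core_init_dev(), and the -ENOMEM
return is an assumed example of one of the "other types of failures".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for pcim_p2pdma_init(); NOT the kernel function.
 * - p2pdma_built_in == false models the !CONFIG_PCI_P2PDMA case, which
 *   per the discussion returns -EOPNOTSUPP.
 * - out_of_memory == true models an assumed allocation failure.
 */
int model_p2pdma_init(bool p2pdma_built_in, bool out_of_memory)
{
	if (!p2pdma_built_in)
		return -EOPNOTSUPP;
	if (out_of_memory)
		return -ENOMEM;
	return 0;
}

/* Hypothetical model of the check added to vfio_pci_core_init_dev(). */
int model_init_dev(bool p2pdma_built_in, bool out_of_memory)
{
	int ret;

	ret = model_p2pdma_init(p2pdma_built_in, out_of_memory);
	/*
	 * -EOPNOTSUPP means P2P is simply unavailable, so device init
	 * continues without it. Any other error aborts the init.
	 */
	if (ret && ret != -EOPNOTSUPP)
		return ret;

	return 0;
}

/* Small self-check covering the three cases discussed in the thread. */
void model_self_check(void)
{
	assert(model_init_dev(false, false) == 0);	/* P2P not built in */
	assert(model_init_dev(true, false) == 0);	/* P2P set up fine */
	assert(model_init_dev(true, true) == -ENOMEM);	/* fatal failure */
}
```

In this model only the -EOPNOTSUPP case is swallowed; every other
non-zero return value propagates to the caller unchanged.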

I don't see a regression if such a device comes along and while we
could survive other types of failures by disabling p2pdma here, I think
all such cases are sufficiently rare out of memory cases to consider them
catastrophic.  Thanks,

Alex
RE: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default
Posted by Tian, Kevin 2 months, 3 weeks ago
> From: Alex Williamson <alex@shazbot.org>
> Sent: Wednesday, November 19, 2025 4:11 AM
> 
> On Tue, 18 Nov 2025 07:18:36 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Tuesday, November 11, 2025 5:58 PM
> > >
> > > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > not required with only your own s-o-b
> >
> > > @@ -2090,6 +2092,9 @@ int vfio_pci_core_init_dev(struct vfio_device
> > > *core_vdev)
> > >  	INIT_LIST_HEAD(&vdev->dummy_resources_list);
> > >  	INIT_LIST_HEAD(&vdev->ioeventfds_list);
> > >  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
> > > +	ret = pcim_p2pdma_init(vdev->pdev);
> > > +	if (ret && ret != -EOPNOTSUPP)
> > > +		return ret;
> >
> > Reading the commit msg seems -EOPNOTSUPP is only returned for fake
> > PCI devices, otherwise it implies regression. better add a comment for it?
> 
> I think the commit log is saying that if a device comes along that
> can't support this, we'd quirk the init path to return -EOPNOTSUPP for
> that particular device here.  This path is currently used when
> !CONFIG_PCI_P2PDMA to make this error non-fatal to the device init.
> 
> I don't see a regression if such a device comes along and while we
> could survive other types of failures by disabling p2pdma here, I think
> all such cases are sufficiently rare out of memory cases to consider them
> catastrophic.  Thanks,
> 

ah yes. I read it inaccurately.