From: Yonatan Maman <Ymaman@Nvidia.com>

Add support for P2P for MLX5 NIC devices with automatic fallback to
standard DMA when P2P mapping fails.

The change introduces P2P DMA requests by default using the
HMM_PFN_ALLOW_P2P flag. If P2P mapping fails with -EFAULT error, the
operation is retried without the P2P flag, ensuring a fallback to
standard DMA flow (using host memory).

Signed-off-by: Yonatan Maman <Ymaman@Nvidia.com>
Signed-off-by: Gal Shalom <GalShalom@Nvidia.com>
---
drivers/infiniband/hw/mlx5/odp.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index f6abd64f07f7..6a0171117f48 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -715,6 +715,10 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 	if (odp->umem.writable && !downgrade)
 		access_mask |= HMM_PFN_WRITE;
+	/*
+	 * try fault with HMM_PFN_ALLOW_P2P flag
+	 */
+	access_mask |= HMM_PFN_ALLOW_P2P;
 	np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, fault);
 	if (np < 0)
 		return np;
@@ -724,6 +728,18 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 	 * ib_umem_odp_map_dma_and_lock already checks this.
 	 */
 	ret = mlx5r_umr_update_xlt(mr, start_idx, np, page_shift, xlt_flags);
+	if (ret == -EFAULT) {
+		/*
+		 * Indicate P2P Mapping Error, retry with no HMM_PFN_ALLOW_P2P
+		 */
+		mutex_unlock(&odp->umem_mutex);
+		access_mask &= ~(HMM_PFN_ALLOW_P2P);
+		np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, fault);
+		if (np < 0)
+			return np;
+		ret = mlx5r_umr_update_xlt(mr, start_idx, np, page_shift, xlt_flags);
+	}
+
 	mutex_unlock(&odp->umem_mutex);
 	if (ret < 0) {
--
2.34.1
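Editorial sketch: the retry flow described in the commit message can be
modelled in userspace C as below. This is only an illustration under
stated assumptions, not the driver code. Every sketch_* name and the
SKETCH_* flag values are hypothetical, and the mapping step and the
translation-table update are collapsed into one function, whereas in the
patch the -EFAULT comes from mlx5r_umr_update_xlt() after a successful
ib_umem_odp_map_dma_and_lock(). Locking is omitted entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for HMM_PFN_ALLOW_P2P / HMM_PFN_WRITE.
 * The bit values are illustrative only. */
#define SKETCH_ALLOW_P2P (1u << 0)
#define SKETCH_WRITE     (1u << 1)

/* Hypothetical model of the range being faulted in. */
struct sketch_range {
	bool p2p_reachable;      /* can the NIC reach the pages via P2P? */
	int npages;              /* pages a successful attempt maps */
	unsigned int last_mask;  /* access mask of the most recent attempt */
	int attempts;            /* number of mapping attempts so far */
};

/* One mapping attempt: fails with -EFAULT when P2P was requested but the
 * pages are not reachable that way, otherwise returns the page count. */
static int sketch_map(struct sketch_range *r, unsigned int access_mask)
{
	assert(r->npages >= 0);
	r->attempts++;
	r->last_mask = access_mask;
	if ((access_mask & SKETCH_ALLOW_P2P) && !r->p2p_reachable)
		return -EFAULT;
	return r->npages;
}

/* Request P2P first; on -EFAULT clear the flag and retry once so the
 * request falls back to the standard DMA flow. */
static int sketch_fault(struct sketch_range *r, unsigned int access_mask)
{
	int ret = sketch_map(r, access_mask | SKETCH_ALLOW_P2P);

	if (ret == -EFAULT) {
		access_mask &= ~SKETCH_ALLOW_P2P;
		ret = sketch_map(r, access_mask);
	}
	return ret;
}
```

With a range that is not P2P reachable, sketch_fault() makes two attempts
and the second one carries only the caller's original flags. Any error
other than -EFAULT is returned to the caller unchanged.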
On Fri, Jul 18, 2025 at 02:51:11PM +0300, Yonatan Maman wrote:
> From: Yonatan Maman <Ymaman@Nvidia.com>
>
> Add support for P2P for MLX5 NIC devices with automatic fallback to
> standard DMA when P2P mapping fails.

That's not how the P2P API works. You need to check the P2P availability
higher up.
On Mon, Jul 21, 2025 at 12:03:41AM -0700, Christoph Hellwig wrote:
> On Fri, Jul 18, 2025 at 02:51:11PM +0300, Yonatan Maman wrote:
> > From: Yonatan Maman <Ymaman@Nvidia.com>
> >
> > Add support for P2P for MLX5 NIC devices with automatic fallback to
> > standard DMA when P2P mapping fails.
>
> That's not how the P2P API works. You need to check the P2P availability
> higher up.

How do you mean?

This looks OKish to me, for ODP and HMM it has to check the P2P
availability on a page by page basis because every single page can be
a different origin device.

There isn't really a higher up here...

Jason
On Wed, Jul 23, 2025 at 12:55:22AM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 21, 2025 at 12:03:41AM -0700, Christoph Hellwig wrote:
> > On Fri, Jul 18, 2025 at 02:51:11PM +0300, Yonatan Maman wrote:
> > > From: Yonatan Maman <Ymaman@Nvidia.com>
> > >
> > > Add support for P2P for MLX5 NIC devices with automatic fallback to
> > > standard DMA when P2P mapping fails.
> >
> > That's not how the P2P API works. You need to check the P2P availability
> > higher up.
>
> How do you mean?
>
> This looks OKish to me, for ODP and HMM it has to check the P2P
> availability on a page by page basis because every single page can be
> a different origin device.
>
> There isn't really a higher up here...

The DMA API expects the caller to already check for connectability,
why can't HMM do that like everyone else?
On Thu, Jul 24, 2025 at 12:30:34AM -0700, Christoph Hellwig wrote:
> On Wed, Jul 23, 2025 at 12:55:22AM -0300, Jason Gunthorpe wrote:
> > On Mon, Jul 21, 2025 at 12:03:41AM -0700, Christoph Hellwig wrote:
> > > On Fri, Jul 18, 2025 at 02:51:11PM +0300, Yonatan Maman wrote:
> > > > From: Yonatan Maman <Ymaman@Nvidia.com>
> > > >
> > > > Add support for P2P for MLX5 NIC devices with automatic fallback to
> > > > standard DMA when P2P mapping fails.
> > >
> > > That's not how the P2P API works. You need to check the P2P availability
> > > higher up.
> >
> > How do you mean?
> >
> > This looks OKish to me, for ODP and HMM it has to check the P2P
> > availability on a page by page basis because every single page can be
> > a different origin device.
> >
> > There isn't really a higher up here...
>
> The DMA API expects the caller to already check for connectability,
> why can't HMM do that like everyone else?

It does, this doesn't change anything about how the DMA API works.

All this series does, and you stated it perfectly, is to allow HMM to
return the single PCI P2P alias of the device private page.

HMM already blindly returns normal P2P pages in a VMA, it should also
blindly return the P2P alias pages too.

Once the P2P is returned the existing code in hmm_dma_map_pfn() calls
pci_p2pdma_state() to find out if it is compatible or not.

Lifting the pci_p2pdma_state() from hmm_dma_map_pfn() and into
hmm_range_fault() is perhaps possible and may be reasonable, but not
really related to this series.

Jason
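Editorial sketch: the page-by-page check discussed in this thread can be
pictured with the userspace C model below. It is only an illustration of
the idea that each page in a range may have a different origin device, so
compatibility has to be decided per page at mapping time. All sketch_*
names and enum values are hypothetical. They are not the kernel's types
and do not reproduce the signature or return values of pci_p2pdma_state().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical origin of a page returned by a range fault. */
enum sketch_origin {
	SKETCH_HOST_RAM,         /* ordinary host memory */
	SKETCH_P2P_REACHABLE,    /* peer device memory the mapper can reach */
	SKETCH_P2P_UNREACHABLE   /* peer device memory it cannot reach */
};

/* Hypothetical outcome of the per-page compatibility check. */
enum sketch_map_type {
	SKETCH_MAP_HOST,
	SKETCH_MAP_P2P,
	SKETCH_MAP_NOT_SUPPORTED
};

/* Decide how a single page would be mapped, based on its origin alone. */
static enum sketch_map_type sketch_classify(enum sketch_origin origin)
{
	switch (origin) {
	case SKETCH_HOST_RAM:
		return SKETCH_MAP_HOST;
	case SKETCH_P2P_REACHABLE:
		return SKETCH_MAP_P2P;
	default:
		return SKETCH_MAP_NOT_SUPPORTED;
	}
}

/* Walk the range one page at a time and return the index of the first
 * page that cannot be mapped, or n when every page is mappable. */
static size_t sketch_first_unmappable(const enum sketch_origin *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sketch_classify(pages[i]) == SKETCH_MAP_NOT_SUPPORTED)
			return i;
	}
	return n;
}
```

In this model a single unreachable page makes the whole range fail at
that index, which corresponds to the case where the patch retries the
fault without the P2P flag.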