From: Dragos Tatulea <dtatulea@nvidia.com>

Declare netmem TX support in netdev.

As required, use the netmem-aware DMA unmapping APIs
for unmapping netmems in the TX completion path.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 ++-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index e837c21d3d21..6501252359b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -362,7 +362,8 @@ mlx5e_tx_dma_unmap(struct device *pdev, struct mlx5e_sq_dma *dma)
 		dma_unmap_single(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
 		break;
 	case MLX5E_DMA_MAP_PAGE:
-		dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
+		netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
+					    DMA_TO_DEVICE, 0);
 		break;
 	default:
 		WARN_ONCE(true, "mlx5e_tx_dma_unmap unknown DMA type!\n");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b4df62b58292..24559cbcbfc2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5741,6 +5741,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags |= IFF_UNICAST_FLT;
 
+	netdev->netmem_tx = true;
+
 	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
 	mlx5e_set_xdp_feature(netdev);
 	mlx5e_set_netdev_dev_addr(netdev);
--
2.34.1
On 06/16, Mark Bloch wrote:
> From: Dragos Tatulea <dtatulea@nvidia.com>
>
> Declare netmem TX support in netdev.
>
[...]
>  	case MLX5E_DMA_MAP_PAGE:
> -		dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> +		netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
> +					    DMA_TO_DEVICE, 0);

For this to work, the dma->addr needs to be 0, so the callers of the
dma_map() need to be adjusted as well, or am I missing something?
There is netmem_dma_unmap_addr_set to handle that, but I don't see
anybody calling it. Do we need to add the following (untested)?

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 55a8629f0792..fb6465210aed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -210,7 +210,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
 			goto dma_unmap_wqe_err;
 
-		dseg->addr = cpu_to_be64(dma_addr);
+		dseg->addr = 0;
+		if (!netmem_is_net_iov(skb_frag_netmem(frag)))
+			dseg->addr = cpu_to_be64(dma_addr);
 		dseg->lkey = sq->mkey_be;
 		dseg->byte_count = cpu_to_be32(fsz);
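For context on why the recorded address must be 0 for net_iov frags: the
netmem-aware unmap helper simply skips the unmap when it is handed a zero
address. A paraphrased sketch of the helper (see include/net/netmem.h for
the authoritative definition; this is a reading aid, not verbatim kernel
code):

	/* net_iov (devmem) frags are mapped by the dmabuf binding, not by the
	 * driver, so the driver records 0 as the unmap address for them and
	 * the helper then becomes a no-op for those frags.
	 */
	static inline void netmem_dma_unmap_page_attrs(struct device *dev,
						       dma_addr_t dma, size_t size,
						       enum dma_data_direction dir,
						       unsigned long attrs)
	{
		if (!dma)
			return;

		dma_unmap_page_attrs(dev, dma, size, dir, attrs);
	}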
On Wed, Jun 18, 2025 at 03:16:15PM -0700, Stanislav Fomichev wrote:
> On 06/16, Mark Bloch wrote:
[...]
> > +		netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
> > +					    DMA_TO_DEVICE, 0);
>
> For this to work, the dma->addr needs to be 0, so the callers of the
> dma_map() need to be adjusted as well, or am I missing something?
> There is netmem_dma_unmap_addr_set to handle that, but I don't see
> anybody calling it. Do we need to add the following (untested)?
>
Hmmmm... yes. I figured that skb_frag_dma_map() would do the work
but I was wrong, it is not enough.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 55a8629f0792..fb6465210aed 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -210,7 +210,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>  		if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
>  			goto dma_unmap_wqe_err;
>
> -		dseg->addr = cpu_to_be64(dma_addr);
> +		dseg->addr = 0;
> +		if (!netmem_is_net_iov(skb_frag_netmem(frag)))
> +			dseg->addr = cpu_to_be64(dma_addr);
AFAIU we still want to pass the computed dma_address to the data segment
to the HW. We only need to make sure in mlx5e_dma_push() to set dma_addr
to 0, to avoid calling netmem_dma_unmap_page_attrs() with dma->addr 0.
Like in the snippet below. Do you agree?

We will send a fix patch once the above question is answered. Also, is
there a way to test this with more confidence? The ncdevmem tx test
passed just fine.

Thanks,
Dragos

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 55a8629f0792..ecee2e4f678b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -214,6 +214,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		dseg->lkey = sq->mkey_be;
 		dseg->byte_count = cpu_to_be32(fsz);
 
+		if (!netmem_is_net_iov(skb_frag_netmem(frag)))
+			dma_addr = 0;
+
 		mlx5e_dma_push(sq, dma_addr, fsz, MLX5E_DMA_MAP_PAGE);
 		num_dma++;
 		dseg++;
On Thu, Jun 19, 2025 at 12:20 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Wed, Jun 18, 2025 at 03:16:15PM -0700, Stanislav Fomichev wrote:
[...]
> > For this to work, the dma->addr needs to be 0, so the callers of the
> > dma_map() need to be adjusted as well, or am I missing something?
> > There is netmem_dma_unmap_addr_set to handle that, but I don't see
> > anybody calling it. Do we need to add the following (untested)?
> >
> Hmmmm... yes. I figured that skb_frag_dma_map() would do the work
> but I was wrong, it is not enough.
>
[...]
> AFAIU we still want to pass the computed dma_address to the data segment
> to the HW. We only need to make sure in mlx5e_dma_push() to set dma_addr
> to 0,

yes

> to avoid calling netmem_dma_unmap_page_attrs() with dma->addr 0.
> Like in the snippet below. Do you agree?
>

the opposite. You want netmem_dma_unmap_page_attrs() to be called with
dma->addr == 0, so that it will skip the dma unmapping.

> We will send a fix patch once the above question is answered. Also, is
> there a way to test this with more confidence? The ncdevmem tx test
> passed just fine.
>

You have to test ncdevmem tx on a platform with iommu enabled. Only in
this case the netmem_dma_unmap_page_attrs() may cause a problem, and
even then it's not a sure thing. It depends on the type of iommu and
type of dmabuf i think.

> Thanks,
> Dragos
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 55a8629f0792..ecee2e4f678b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -214,6 +214,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>  		dseg->lkey = sq->mkey_be;
>  		dseg->byte_count = cpu_to_be32(fsz);
>
> +		if (!netmem_is_net_iov(skb_frag_netmem(frag)))
> +			dma_addr = 0;
> +
>  		mlx5e_dma_push(sq, dma_addr, fsz, MLX5E_DMA_MAP_PAGE);
>  		num_dma++;

If you can find a way to do this via netmem_dma_unmap_addr_set, I
think that would be better, so you're not relying on a manual
netmem_is_net_iov check.

The way you'd do that is you'd pass skb_frag_netmem(frag) to
mlx5e_dma_push, and then replace the `dma->addr = addr` with
netmem_dma_unmap_addr_set. But up to you.

If you decide to do a net_iov check and dma_addr = 0, add a comment please.

--
Thanks,
Mina
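A rough illustration of the suggestion above. This is a sketch only: the
extra netmem parameter on mlx5e_dma_push() is an assumed refactoring (the
MLX5E_DMA_MAP_SINGLE call site for the linear part would need separate
handling), and it further assumes dma->addr can serve as the unmap-address
slot that netmem_dma_unmap_addr_set() expects; it is not the fix that was
eventually posted.

	static inline void mlx5e_dma_push(struct mlx5e_txqsq *sq, netmem_ref netmem,
					  dma_addr_t addr, u32 size,
					  enum mlx5e_dma_map_type map_type)
	{
		struct mlx5e_sq_dma *dma = mlx5e_dma_get(sq, sq->dma_fifo_pc++);

		/* Records 0 for net_iov (devmem) frags so the netmem-aware
		 * unmap helper skips them in the completion path; regular
		 * pages keep the real DMA address.
		 */
		netmem_dma_unmap_addr_set(netmem, dma, addr, addr);
		dma->size = size;
		dma->type = map_type;
	}

	/* The frag call site in mlx5e_txwqe_build_dsegs() would then pass the frag's netmem: */
	mlx5e_dma_push(sq, skb_frag_netmem(frag), dma_addr, fsz, MLX5E_DMA_MAP_PAGE);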
On Thu, Jun 19, 2025 at 08:32:48AM -0700, Mina Almasry wrote:
> On Thu, Jun 19, 2025 at 12:20 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
[...]
> > AFAIU we still want to pass the computed dma_address to the data segment
> > to the HW. We only need to make sure in mlx5e_dma_push() to set dma_addr
> > to 0,
>
> yes
>
> > to avoid calling netmem_dma_unmap_page_attrs() with dma->addr 0.
> > Like in the snippet below. Do you agree?
> >
> the opposite. You want netmem_dma_unmap_page_attrs() to be called with
> dma->addr == 0, so that it will skip the dma unmapping.
>
Yes sorry, that's what I meant to say.

> > We will send a fix patch once the above question is answered. Also, is
> > there a way to test this with more confidence? The ncdevmem tx test
> > passed just fine.
> >
> You have to test ncdevmem tx on a platform with iommu enabled. Only in
> this case the netmem_dma_unmap_page_attrs() may cause a problem, and
> even then it's not a sure thing. It depends on the type of iommu and
> type of dmabuf i think.
>
Is it worth adding a WARN_ON_ONCE(netmem_is_net_iov())
in netmem_dma_unmap_page_attrs() after the addr check to catch these kinds
of misuse?

[...]
> If you can find a way to do this via netmem_dma_unmap_addr_set, I
> think that would be better, so you're not relying on a manual
> netmem_is_net_iov check.
>
> The way you'd do that is you'd pass skb_frag_netmem(frag) to
> mlx5e_dma_push, and then replace the `dma->addr = addr` with
> netmem_dma_unmap_addr_set. But up to you.
>
Thanks for the suggestion. This would require some additional
refactoring. I need to play with this to see if it requires a
lot of rewiring or not.

> If you decide to do a net_iov check and dma_addr = 0, add a comment please.
>
Ack.

Thanks,
Dragos
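For completeness, a sketch of the simpler fallback Mina refers to (a
net_iov check plus a comment), with the condition oriented so that only
net_iov frags get a zero address recorded in the DMA fifo; the exact
placement inside mlx5e_txwqe_build_dsegs() is assumed:

	dseg->addr = cpu_to_be64(dma_addr);
	dseg->lkey = sq->mkey_be;
	dseg->byte_count = cpu_to_be32(fsz);

	/* net_iov (devmem TX) frags must not be DMA-unmapped by the driver:
	 * record a zero address so that netmem_dma_unmap_page_attrs() skips
	 * them at completion time.
	 */
	if (netmem_is_net_iov(skb_frag_netmem(frag)))
		dma_addr = 0;

	mlx5e_dma_push(sq, dma_addr, fsz, MLX5E_DMA_MAP_PAGE);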
> > If you can find a way to do this via netmem_dma_unmap_addr_set, I
> > think that would be better, so you're not relying on a manual
> > netmem_is_net_iov check.
> >
> > The way you'd do that is you'd pass skb_frag_netmem(frag) to
> > mlx5e_dma_push, and then replace the `dma->addr = addr` with
> > netmem_dma_unmap_addr_set. But up to you.
> >
> Thanks for the suggestion. This would require some additional
> refactoring. I need to play with this to see if it requires a
> lot of rewiring or not.
>
Got around to this. Found a way to use netmem_dma_unmap_addr_set()
with a small refactoring that makes sense. We'll send a patch soon.

Thanks,
Dragos
On Thu, Jun 19, 2025 at 9:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > You have to test ncdevmem tx on a platform with iommu enabled. Only in
> > this case the netmem_dma_unmap_page_attrs() may cause a problem, and
> > even then it's not a sure thing. It depends on the type of iommu and
> > type of dmabuf i think.
> >
> Is it worth adding a WARN_ON_ONCE(netmem_is_net_iov())
> in netmem_dma_unmap_page_attrs() after the addr check to catch these kinds
> of misuse?
>
I would say it's worth it, but it's the same challenge you point to in
your reply: netmem_dma_unmap_page_attrs currently doesn't take in a
netmem, and it may be a big refactor not worth it if its callers also
don't have a reference to the netmem readily available to pass it.

--
Thanks,
Mina
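A sketch of the kind of check being discussed, assuming a hypothetical
netmem-taking variant of the unmap helper; this function does not exist in
the kernel and only illustrates where such a WARN_ON_ONCE could live if the
refactor were done:

	static inline void netmem_dma_unmap_page_attrs_netmem(struct device *dev,
							       netmem_ref netmem,
							       dma_addr_t dma,
							       size_t size,
							       enum dma_data_direction dir,
							       unsigned long attrs)
	{
		if (!dma)
			return;

		/* A net_iov frag should never reach the actual unmap: drivers
		 * are expected to have recorded a zero address for it.
		 */
		WARN_ON_ONCE(netmem_is_net_iov(netmem));

		dma_unmap_page_attrs(dev, dma, size, dir, attrs);
	}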