[PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems

Mark Bloch posted 12 patches 3 months, 1 week ago
[PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Mark Bloch 3 months, 1 week ago
From: Dragos Tatulea <dtatulea@nvidia.com>

Declare netmem TX support in netdev.

As required, use the netmem aware dma unmapping APIs
for unmapping netmems in tx completion path.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index e837c21d3d21..6501252359b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -362,7 +362,8 @@ mlx5e_tx_dma_unmap(struct device *pdev, struct mlx5e_sq_dma *dma)
 		dma_unmap_single(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
 		break;
 	case MLX5E_DMA_MAP_PAGE:
-		dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
+		netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
+					    DMA_TO_DEVICE, 0);
 		break;
 	default:
 		WARN_ONCE(true, "mlx5e_tx_dma_unmap unknown DMA type!\n");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b4df62b58292..24559cbcbfc2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5741,6 +5741,8 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
+	netdev->netmem_tx = true;
+
 	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
 	mlx5e_set_xdp_feature(netdev);
 	mlx5e_set_netdev_dev_addr(netdev);
-- 
2.34.1
Re: [PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Stanislav Fomichev 3 months, 1 week ago
On 06/16, Mark Bloch wrote:
> From: Dragos Tatulea <dtatulea@nvidia.com>
> 
> Declare netmem TX support in netdev.
> 
> As required, use the netmem aware dma unmapping APIs
> for unmapping netmems in tx completion path.
> 
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> index e837c21d3d21..6501252359b0 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> @@ -362,7 +362,8 @@ mlx5e_tx_dma_unmap(struct device *pdev, struct mlx5e_sq_dma *dma)
>  		dma_unmap_single(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
>  		break;
>  	case MLX5E_DMA_MAP_PAGE:
> -		dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> +		netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
> +					    DMA_TO_DEVICE, 0);

For this to work, the dma->addr needs to be 0, so the callers of the
dma_map() need to be adjusted as well, or am I missing something?
There is netmem_dma_unmap_addr_set to handle that, but I don't see
anybody calling it. Do we need to add the following (untested)?

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 55a8629f0792..fb6465210aed 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -210,7 +210,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
 			goto dma_unmap_wqe_err;
 
-		dseg->addr       = cpu_to_be64(dma_addr);
+		dseg->addr = 0;
+		if (!netmem_is_net_iov(skb_frag_netmem(frag)))
+			dseg->addr = cpu_to_be64(dma_addr);
 		dseg->lkey       = sq->mkey_be;
 		dseg->byte_count = cpu_to_be32(fsz);
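
For reference, the behaviour in question can be modelled in plain userspace C. This is only a sketch: the `model_*` names are hypothetical stand-ins, not the kernel helpers themselves. It assumes, as the discussion above implies, that the netmem-aware unmap does nothing when handed a zero address, and that the address-set helper stores zero for net_iov-backed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_dma_addr_t;	/* stand-in for dma_addr_t */

static int model_unmap_calls;		/* counts "real" unmaps performed */

/* Stand-in for the plain page unmap: it only records that it ran. */
static void model_dma_unmap_page(model_dma_addr_t addr, size_t size)
{
	(void)addr;
	(void)size;
	model_unmap_calls++;
}

/* Models the netmem-aware unmap: a zero address means "nothing to unmap". */
static void model_netmem_dma_unmap_page(model_dma_addr_t addr, size_t size)
{
	if (!addr)
		return;
	model_dma_unmap_page(addr, size);
}

/* Models the address-set helper: remember 0 for net_iov-backed memory. */
static void model_netmem_unmap_addr_set(bool is_net_iov,
					model_dma_addr_t *slot,
					model_dma_addr_t addr)
{
	*slot = is_net_iov ? 0 : addr;
}
```

In this model, a net_iov frag whose address went through `model_netmem_unmap_addr_set()` is skipped at completion time, whereas one whose raw address was stored directly still reaches `model_dma_unmap_page()`. The second case mirrors the situation where nobody calls the set helper.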
Re: [PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Dragos Tatulea 3 months, 1 week ago
On Wed, Jun 18, 2025 at 03:16:15PM -0700, Stanislav Fomichev wrote:
> On 06/16, Mark Bloch wrote:
> > From: Dragos Tatulea <dtatulea@nvidia.com>
> > 
> > Declare netmem TX support in netdev.
> > 
> > As required, use the netmem aware dma unmapping APIs
> > for unmapping netmems in tx completion path.
> > 
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > Reviewed-by: Mina Almasry <almasrymina@google.com>
> > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 ++-
> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > index e837c21d3d21..6501252359b0 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > @@ -362,7 +362,8 @@ mlx5e_tx_dma_unmap(struct device *pdev, struct mlx5e_sq_dma *dma)
> >  		dma_unmap_single(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> >  		break;
> >  	case MLX5E_DMA_MAP_PAGE:
> > -		dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> > +		netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
> > +					    DMA_TO_DEVICE, 0);
> 
> For this to work, the dma->addr needs to be 0, so the callers of the
> dma_map() need to be adjusted as well, or am I missing something?
> There is netmem_dma_unmap_addr_set to handle that, but I don't see
> anybody calling it. Do we need to add the following (untested)?
>
Hmmmm... yes. I figured that skb_frag_dma_map() would do the work
but I was wrong, it is not enough.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 55a8629f0792..fb6465210aed 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -210,7 +210,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>  		if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
>  			goto dma_unmap_wqe_err;
>  
> -		dseg->addr       = cpu_to_be64(dma_addr);
> +		dseg->addr = 0;
> +		if (!netmem_is_net_iov(skb_frag_netmem(frag)))
> +			dseg->addr = cpu_to_be64(dma_addr);
AFAIU we still want to pass the computed dma_address to the data segment
to the HW. We only need to make sure in mlx5e_dma_push() to set dma_addr
to 0, to avoid calling netmem_dma_unmap_page_attrs() with dma->addr 0.
Like in the snippet below. Do you agree?

We will send a fix patch once the above question is answered. Also, is
there a way to test this with more confidence? The ncdevmem tx test
passed just fine.

Thanks,
Dragos

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 55a8629f0792..ecee2e4f678b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -214,6 +214,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
                dseg->lkey       = sq->mkey_be;
                dseg->byte_count = cpu_to_be32(fsz);
 
+               if (netmem_is_net_iov(skb_frag_netmem(frag)))
+                       dma_addr = 0;
+
                mlx5e_dma_push(sq, dma_addr, fsz, MLX5E_DMA_MAP_PAGE);
                num_dma++;
                dseg++;
Re: [PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Mina Almasry 3 months, 1 week ago
On Thu, Jun 19, 2025 at 12:20 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Wed, Jun 18, 2025 at 03:16:15PM -0700, Stanislav Fomichev wrote:
> > On 06/16, Mark Bloch wrote:
> > > From: Dragos Tatulea <dtatulea@nvidia.com>
> > >
> > > Declare netmem TX support in netdev.
> > >
> > > As required, use the netmem aware dma unmapping APIs
> > > for unmapping netmems in tx completion path.
> > >
> > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > > Reviewed-by: Mina Almasry <almasrymina@google.com>
> > > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 ++-
> > >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
> > >  2 files changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > index e837c21d3d21..6501252359b0 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > @@ -362,7 +362,8 @@ mlx5e_tx_dma_unmap(struct device *pdev, struct mlx5e_sq_dma *dma)
> > >             dma_unmap_single(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> > >             break;
> > >     case MLX5E_DMA_MAP_PAGE:
> > > -           dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> > > +           netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
> > > +                                       DMA_TO_DEVICE, 0);
> >
> > For this to work, the dma->addr needs to be 0, so the callers of the
> > dma_map() need to be adjusted as well, or am I missing something?
> > There is netmem_dma_unmap_addr_set to handle that, but I don't see
> > anybody calling it. Do we need to add the following (untested)?
> >
> Hmmmm... yes. I figured that skb_frag_dma_map() would do the work
> but I was wrong, it is not enough.
>
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > index 55a8629f0792..fb6465210aed 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > @@ -210,7 +210,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> >               if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
> >                       goto dma_unmap_wqe_err;
> >
> > -             dseg->addr       = cpu_to_be64(dma_addr);
> > +             dseg->addr = 0;
> > +             if (!netmem_is_net_iov(skb_frag_netmem(frag)))
> > +                     dseg->addr = cpu_to_be64(dma_addr);
> AFAIU we still want to pass the computed dma_address to the data segment
> to the HW. We only need to make sure in mlx5e_dma_push() to set dma_addr
> to 0,

yes

> to avoid calling netmem_dma_unmap_page_attrs() with dma->addr 0.
> Like in the snippet below. Do you agree?
>

the opposite. You want netmem_dma_unmap_page_attrs() to be called with
dma->addr == 0, so that it will skip the dma unmapping.

> We will send a fix patch once the above question is answered. Also, is
> there a way to test this with more confidence? The ncdevmem tx test
> passed just fine.
>

You have to test ncdevmem tx on a platform with iommu enabled. Only in
this case the netmem_dma_unmap_page_attrs() may cause a problem, and
even then it's not a sure thing. It depends on the type of iommu and
type of dmabuf, I think.

> Thanks,
> Dragos
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 55a8629f0792..ecee2e4f678b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -214,6 +214,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>                 dseg->lkey       = sq->mkey_be;
>                 dseg->byte_count = cpu_to_be32(fsz);
>
> +               if (netmem_is_net_iov(skb_frag_netmem(frag)))
> +                       dma_addr = 0;
> +
>                 mlx5e_dma_push(sq, dma_addr, fsz, MLX5E_DMA_MAP_PAGE);
>                 num_dma++;

If you can find a way to do this via netmem_dma_unmap_addr_set, I
think that would be better, so you're not relying on a manual
netmem_is_net_iov check.

The way you'd do that is you'd pass skb_frag_netmem(frag) to
mlx5e_dma_push, and then replace the `dma->addr = addr` with
netmem_dma_unmap_addr_set. But up to you.

If you decide to do a net_iov check and dma_addr = 0, add a comment please.

-- 
Thanks,
Mina
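
A rough userspace model of the approach suggested above may help make it concrete. Everything named `sketch_*` is hypothetical and invented for illustration; it is not the driver's code. It assumes the push helper receives the frag's netmem handle and decides by itself which address to remember for the completion path, while the hardware data segment always gets the real mapped address. Endianness conversion is left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical netmem handle: only the "is it a net_iov?" bit is modelled. */
struct sketch_netmem {
	bool is_net_iov;
};

/* Per-fragment bookkeeping consulted when the TX completion arrives. */
struct sketch_sq_dma {
	uint64_t addr;		/* address remembered for the later unmap */
	uint32_t size;
};

/* Data segment as seen by the hardware (byte order ignored here). */
struct sketch_dseg {
	uint64_t addr;
	uint32_t byte_count;
};

/* Push helper that takes the netmem, so callers need no manual check. */
static void sketch_dma_push_netmem(struct sketch_sq_dma *dma,
				   struct sketch_netmem netmem,
				   uint64_t addr, uint32_t size)
{
	/* Same effect as an address-set helper: 0 for net_iov memory. */
	dma->addr = netmem.is_net_iov ? 0 : addr;
	dma->size = size;
}

/* Builds one fragment: the device sees dma_addr in both cases. */
static void sketch_build_frag(struct sketch_dseg *dseg,
			      struct sketch_sq_dma *dma,
			      struct sketch_netmem netmem,
			      uint64_t dma_addr, uint32_t fsz)
{
	dseg->addr = dma_addr;
	dseg->byte_count = fsz;
	sketch_dma_push_netmem(dma, netmem, dma_addr, fsz);
}
```

The design point is that the net_iov decision lives in one place, next to the assignment of the unmap address, instead of being repeated at each call site.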
Re: [PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Dragos Tatulea 3 months, 1 week ago
On Thu, Jun 19, 2025 at 08:32:48AM -0700, Mina Almasry wrote:
> On Thu, Jun 19, 2025 at 12:20 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > On Wed, Jun 18, 2025 at 03:16:15PM -0700, Stanislav Fomichev wrote:
> > > On 06/16, Mark Bloch wrote:
> > > > From: Dragos Tatulea <dtatulea@nvidia.com>
> > > >
> > > > Declare netmem TX support in netdev.
> > > >
> > > > As required, use the netmem aware dma unmapping APIs
> > > > for unmapping netmems in tx completion path.
> > > >
> > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > > > Reviewed-by: Mina Almasry <almasrymina@google.com>
> > > > Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> > > > ---
> > > >  drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 3 ++-
> > > >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
> > > >  2 files changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > > index e837c21d3d21..6501252359b0 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
> > > > @@ -362,7 +362,8 @@ mlx5e_tx_dma_unmap(struct device *pdev, struct mlx5e_sq_dma *dma)
> > > >             dma_unmap_single(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> > > >             break;
> > > >     case MLX5E_DMA_MAP_PAGE:
> > > > -           dma_unmap_page(pdev, dma->addr, dma->size, DMA_TO_DEVICE);
> > > > +           netmem_dma_unmap_page_attrs(pdev, dma->addr, dma->size,
> > > > +                                       DMA_TO_DEVICE, 0);
> > >
> > > For this to work, the dma->addr needs to be 0, so the callers of the
> > > dma_map() need to be adjusted as well, or am I missing something?
> > > There is netmem_dma_unmap_addr_set to handle that, but I don't see
> > > anybody calling it. Do we need to add the following (untested)?
> > >
> > Hmmmm... yes. I figured that skb_frag_dma_map() would do the work
> > but I was wrong, it is not enough.
> >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > > index 55a8629f0792..fb6465210aed 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > > @@ -210,7 +210,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> > >               if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
> > >                       goto dma_unmap_wqe_err;
> > >
> > > -             dseg->addr       = cpu_to_be64(dma_addr);
> > > +             dseg->addr = 0;
> > > +             if (!netmem_is_net_iov(skb_frag_netmem(frag)))
> > > +                     dseg->addr = cpu_to_be64(dma_addr);
> > AFAIU we still want to pass the computed dma_address to the data segment
> > to the HW. We only need to make sure in mlx5e_dma_push() to set dma_addr
> > to 0,
> 
> yes
> 
> > to avoid calling netmem_dma_unmap_page_attrs() with dma->addr 0.
> > Like in the snippet below. Do you agree?
> >
> 
> the opposite. You want netmem_dma_unmap_page_attrs() to be called with
> dma->addr == 0, so that it will skip the dma unmapping.
>
Yes sorry, that's what I meant to say.

> > We will send a fix patch once the above question is answered. Also, is
> > there a way to test this with more confidence? The ncdevmem tx test
> > passed just fine.
> >
> 
> You have to test ncdevmem tx on a platform with iommu enabled. Only in
> this case the netmem_dma_unmap_page_attrs() may cause a problem, and
> even then it's not a sure thing. It depends on the type of iommu and
> type of dmabuf, I think.
> 
Is it worth adding a WARN_ON_ONCE(netmem_is_net_iov())
in netmem_dma_unmap_page_attrs() after addr check to catch these kinds
of misuse?

> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > index 55a8629f0792..ecee2e4f678b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> > @@ -214,6 +214,9 @@ mlx5e_txwqe_build_dsegs(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> >                 dseg->lkey       = sq->mkey_be;
> >                 dseg->byte_count = cpu_to_be32(fsz);
> >
> > +               if (netmem_is_net_iov(skb_frag_netmem(frag)))
> > +                       dma_addr = 0;
> > +
> >                 mlx5e_dma_push(sq, dma_addr, fsz, MLX5E_DMA_MAP_PAGE);
> >                 num_dma++;
> 
> If you can find a way to do this via netmem_dma_unmap_addr_set, I
> think that would be better, so you're not relying on a manual
> netmem_is_net_iov check.
> 
> The way you'd do that is you'd pass skb_frag_netmem(frag) to
> mlx5e_dma_push, and then replace the `dma->addr = addr` with
> netmem_dma_unmap_addr_set. But up to you.
>
Thanks for the suggestion. This would require some additional
refactoring. I need to play with this to see if it requires a
lot of rewiring or not.

> If you decide to do a net_iov check and dma_addr = 0, add a comment please.
> 
Ack.

Thanks,
Dragos
Re: [PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Dragos Tatulea 3 months ago
> > If you can find a way to do this via netmem_dma_unmap_addr_set, I
> > think that would be better, so you're not relying on a manual
> > netmem_is_net_iov check.
> > 
> > The way you'd do that is you'd pass skb_frag_netmem(frag) to
> > mlx5e_dma_push, and then replace the `dma->addr = addr` with
> > netmem_dma_unmap_addr_set. But up to you.
> >
> Thanks for the suggestion. This would require some additional
> refactoring. I need to play with this to see if it requires a
> lot of rewiring or not.
>
Got around to this. Found a way to use netmem_dma_unmap_addr_set()
with a small refactoring that makes sense. We'll send a patch soon.

Thanks,
Dragos
Re: [PATCH net-next v6 12/12] net/mlx5e: Add TX support for netmems
Posted by Mina Almasry 3 months, 1 week ago
On Thu, Jun 19, 2025 at 9:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > You have to test ncdevmem tx on a platform with iommu enabled. Only in
> > this case the netmem_dma_unmap_page_attrs() may cause a problem, and
> > even then it's not a sure thing. It depends on the type of iommu and
> > type of dmabuf, I think.
> >
> Is it worth adding a WARN_ON_ONCE(netmem_is_net_iov())
> in netmem_dma_unmap_page_attrs() after addr check to catch these kinds
> of misuse?
>

I would say it's worth it, but it's the same challenge you point to in
your reply: netmem_dma_unmap_page_attrs currently doesn't take in a
netmem, and it may be a big refactor that is not worth it if its callers also
don't have a reference to the netmem readily available to pass it.

-- 
Thanks,
Mina
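
To illustrate the trade-off, here is a small userspace model of what such a check could look like if the unmap helper did receive the netmem. The `check_*` names are hypothetical, the flag stands in for WARN_ON_ONCE(), and skipping the unmap after warning is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int check_warned;	/* latched once, standing in for WARN_ON_ONCE() */
static int check_unmaps;	/* counts unmaps that were actually performed */

/*
 * Model of an unmap helper that also knows whether the memory is a net_iov.
 * The extra parameter is exactly what the current callers would have to
 * start passing, which is the refactoring cost mentioned above.
 */
static void check_unmap_page(bool is_net_iov, uint64_t addr, size_t size)
{
	(void)size;

	if (!addr)
		return;		/* address was zeroed at map time: nothing to do */

	if (is_net_iov) {
		/* Misuse: a net_iov reached unmap with a non-zero address. */
		check_warned = 1;
		return;		/* sketch-only choice: do not unmap */
	}

	check_unmaps++;
}
```

A driver that stored the raw address for a net_iov frag would trip the flag on the first completion, independent of the IOMMU or dmabuf type.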