[RFC net-next v3 4/7] net/mlx5e: add op for getting netdev DMA device

Dragos Tatulea posted 7 patches 1 month, 2 weeks ago
There is a newer version of this series
[RFC net-next v3 4/7] net/mlx5e: add op for getting netdev DMA device
Posted by Dragos Tatulea 1 month, 2 weeks ago
For zero-copy (devmem, io_uring), the netdev DMA device used
is the parent device of the net device. However that is not
always accurate for mlx5 devices:
- SFs: The parent device is an auxdev.
- Multi-PF netdevs: The DMA device should be determined by
  the queue.

This change implements the DMA device queue API that returns the DMA
device appropriately for all cases.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 21bb88c5d3dc..0e48065a46eb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5625,12 +5625,36 @@ static int mlx5e_queue_start(struct net_device *dev, void *newq,
 	return 0;
 }
 
+static struct device *mlx5e_queue_get_dma_dev(struct net_device *dev,
+					      int queue_index)
+{
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_channels *channels;
+	struct device *pdev = NULL;
+	struct mlx5e_channel *ch;
+
+	channels = &priv->channels;
+
+	mutex_lock(&priv->state_lock);
+
+	if (queue_index >= channels->num)
+		goto out;
+
+	ch = channels->c[queue_index];
+	pdev = ch->pdev;
+out:
+	mutex_unlock(&priv->state_lock);
+
+	return pdev;
+}
+
 static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
 	.ndo_queue_mem_size	=	sizeof(struct mlx5_qmgmt_data),
 	.ndo_queue_mem_alloc	=	mlx5e_queue_mem_alloc,
 	.ndo_queue_mem_free	=	mlx5e_queue_mem_free,
 	.ndo_queue_start	=	mlx5e_queue_start,
 	.ndo_queue_stop		=	mlx5e_queue_stop,
+	.ndo_queue_get_dma_dev	=	mlx5e_queue_get_dma_dev,
 };
 
 static void mlx5e_build_nic_netdev(struct net_device *netdev)
-- 
2.50.1
Re: [RFC net-next v3 4/7] net/mlx5e: add op for getting netdev DMA device
Posted by Mina Almasry 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> For zero-copy (devmem, io_uring), the netdev DMA device used
> is the parent device of the net device. However that is not
> always accurate for mlx5 devices:
> - SFs: The parent device is an auxdev.
> - Multi-PF netdevs: The DMA device should be determined by
>   the queue.
>
> This change implements the DMA device queue API that returns the DMA
> device appropriately for all cases.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 21bb88c5d3dc..0e48065a46eb 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -5625,12 +5625,36 @@ static int mlx5e_queue_start(struct net_device *dev, void *newq,
>         return 0;
>  }
>
> +static struct device *mlx5e_queue_get_dma_dev(struct net_device *dev,
> +                                             int queue_index)
> +{
> +       struct mlx5e_priv *priv = netdev_priv(dev);
> +       struct mlx5e_channels *channels;
> +       struct device *pdev = NULL;
> +       struct mlx5e_channel *ch;
> +
> +       channels = &priv->channels;
> +
> +       mutex_lock(&priv->state_lock);
> +
> +       if (queue_index >= channels->num)
> +               goto out;
> +
> +       ch = channels->c[queue_index];
> +       pdev = ch->pdev;

This code assumes priv is initialized, and probably that the device is
up/running/registered. At first I thought that was fine, but now that
I look at the code more closely, netdev_nl_bind_rx_doit checks if the
device is present but doesn't seem to check that the device is
registered.

I wonder if we should have a generic check in netdev_nl_bind_rx_doit
for NETDEV_REGISTERED, and if not, does this code handle unregistered
netdev correctly (like netdev_priv and priv->channels are valid even
for unregistered mlx5 devices)?


-- 
Thanks,
Mina
Re: [RFC net-next v3 4/7] net/mlx5e: add op for getting netdev DMA device
Posted by Dragos Tatulea 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 10:37:15AM -0700, Mina Almasry wrote:
> On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > For zero-copy (devmem, io_uring), the netdev DMA device used
> > is the parent device of the net device. However that is not
> > always accurate for mlx5 devices:
> > - SFs: The parent device is an auxdev.
> > - Multi-PF netdevs: The DMA device should be determined by
> >   the queue.
> >
> > This change implements the DMA device queue API that returns the DMA
> > device appropriately for all cases.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > ---
> >  .../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 21bb88c5d3dc..0e48065a46eb 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -5625,12 +5625,36 @@ static int mlx5e_queue_start(struct net_device *dev, void *newq,
> >         return 0;
> >  }
> >
> > +static struct device *mlx5e_queue_get_dma_dev(struct net_device *dev,
> > +                                             int queue_index)
> > +{
> > +       struct mlx5e_priv *priv = netdev_priv(dev);
> > +       struct mlx5e_channels *channels;
> > +       struct device *pdev = NULL;
> > +       struct mlx5e_channel *ch;
> > +
> > +       channels = &priv->channels;
> > +
> > +       mutex_lock(&priv->state_lock);
> > +
> > +       if (queue_index >= channels->num)
> > +               goto out;
> > +
> > +       ch = channels->c[queue_index];
> > +       pdev = ch->pdev;
> 
> This code assumes priv is initialized, and probably that the device is
> up/running/registered. At first I thought that was fine, but now that
> I look at the code more closely, netdev_nl_bind_rx_doit checks if the
> device is present but doesn't seem to check that the device is
> registered.
> 
> I wonder if we should have a generic check in netdev_nl_bind_rx_doit
> for NETDEV_REGISTERED, and if not, does this code handle unregistered
> netdev correctly (like netdev_priv and priv->channels are valid even
> for unregistered mlx5 devices)?
>
netdev_get_by_index_lock() returns non-NULL only when the device is in
state NETDEV_REGISTERED or NETREG_UNINITIALIZED. So I think  that this
check should suffice.

Thanks,
Dragos
Re: [RFC net-next v3 4/7] net/mlx5e: add op for getting netdev DMA device
Posted by Mina Almasry 1 month, 2 weeks ago
On Mon, Aug 18, 2025 at 10:40 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Fri, Aug 15, 2025 at 10:37:15AM -0700, Mina Almasry wrote:
> > On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > For zero-copy (devmem, io_uring), the netdev DMA device used
> > > is the parent device of the net device. However that is not
> > > always accurate for mlx5 devices:
> > > - SFs: The parent device is an auxdev.
> > > - Multi-PF netdevs: The DMA device should be determined by
> > >   the queue.
> > >
> > > This change implements the DMA device queue API that returns the DMA
> > > device appropriately for all cases.
> > >
> > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > ---
> > >  .../net/ethernet/mellanox/mlx5/core/en_main.c | 24 +++++++++++++++++++
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > index 21bb88c5d3dc..0e48065a46eb 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > > @@ -5625,12 +5625,36 @@ static int mlx5e_queue_start(struct net_device *dev, void *newq,
> > >         return 0;
> > >  }
> > >
> > > +static struct device *mlx5e_queue_get_dma_dev(struct net_device *dev,
> > > +                                             int queue_index)
> > > +{
> > > +       struct mlx5e_priv *priv = netdev_priv(dev);
> > > +       struct mlx5e_channels *channels;
> > > +       struct device *pdev = NULL;
> > > +       struct mlx5e_channel *ch;
> > > +
> > > +       channels = &priv->channels;
> > > +
> > > +       mutex_lock(&priv->state_lock);
> > > +
> > > +       if (queue_index >= channels->num)
> > > +               goto out;
> > > +
> > > +       ch = channels->c[queue_index];
> > > +       pdev = ch->pdev;
> >
> > This code assumes priv is initialized, and probably that the device is
> > up/running/registered. At first I thought that was fine, but now that
> > I look at the code more closely, netdev_nl_bind_rx_doit checks if the
> > device is present but doesn't seem to check that the device is
> > registered.
> >
> > I wonder if we should have a generic check in netdev_nl_bind_rx_doit
> > for NETDEV_REGISTERED, and if not, does this code handle unregistered
> > netdev correctly (like netdev_priv and priv->channels are valid even
> > for unregistered mlx5 devices)?
> >
> netdev_get_by_index_lock() returns non-NULL only when the device is in
> state NETDEV_REGISTERED or NETREG_UNINITIALIZED. So I think  that this
> check should suffice.
>

Ack, thanks for checking. For the patch:

Reviewed-by: Mina Almasry <almasrymina@google.com>



-- 
Thanks,
Mina