[RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev

Dragos Tatulea posted 7 patches 1 month, 2 weeks ago
There is a newer version of this series
[RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Dragos Tatulea 1 month, 2 weeks ago
For zerocopy (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
devices.

This patch introduces the a queue based interface for allowing drivers
to expose a different DMA device for zerocopy.

[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 include/net/netdev_queues.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 6e835972abd1..d4d8c42b809f 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -127,6 +127,10 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  * @ndo_queue_stop:	Stop the RX queue at the specified index. The stopped
  *			queue's memory is written at the specified address.
  *
+ * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used
+ *			   for this queue. When such device is not available,
+ *			   the function will return NULL.
+ *
  * Note that @ndo_queue_mem_alloc and @ndo_queue_mem_free may be called while
  * the interface is closed. @ndo_queue_start and @ndo_queue_stop will only
  * be called for an interface which is open.
@@ -144,6 +148,8 @@ struct netdev_queue_mgmt_ops {
 	int			(*ndo_queue_stop)(struct net_device *dev,
 						  void *per_queue_mem,
 						  int idx);
+	struct device *		(*ndo_queue_get_dma_dev)(struct net_device *dev,
+							 int idx);
 };
 
 /**
@@ -321,4 +327,18 @@ static inline void netif_subqueue_sent(const struct net_device *dev,
 					 get_desc, start_thrs);		\
 	})
 
+static inline struct device *
+netdev_queue_get_dma_dev(struct net_device *dev, int idx)
+{
+	const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops;
+	struct device *dma_dev;
+
+	if (queue_ops && queue_ops->ndo_queue_get_dma_dev)
+		dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx);
+	else
+		dma_dev = dev->dev.parent;
+
+	return dma_dev && dma_dev->dma_mask ? dma_dev : NULL;
+}
+
 #endif
-- 
2.50.1
Re: [RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Pavel Begunkov 1 month, 2 weeks ago
On 8/15/25 12:03, Dragos Tatulea wrote:
> For zerocopy (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> - Scalable Function netdevs [1] have the DMA device in the grandparent.
> - For Multi-PF netdevs [2] queues can be associated to different DMA
> devices.
> 
> This patch introduces the a queue based interface for allowing drivers
> to expose a different DMA device for zerocopy.
> 
> [1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
> [2] Documentation/networking/multi-pf-netdev.rst
> 
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---
>   include/net/netdev_queues.h | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)

Assuming it'll be moved and uninlined

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>

-- 
Pavel Begunkov
Re: [RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Mina Almasry 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> For zerocopy (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> - Scalable Function netdevs [1] have the DMA device in the grandparent.
> - For Multi-PF netdevs [2] queues can be associated to different DMA
> devices.
>
> This patch introduces the a queue based interface for allowing drivers
> to expose a different DMA device for zerocopy.
>
> [1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
> [2] Documentation/networking/multi-pf-netdev.rst
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---
>  include/net/netdev_queues.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index 6e835972abd1..d4d8c42b809f 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -127,6 +127,10 @@ void netdev_stat_queue_sum(struct net_device *netdev,
>   * @ndo_queue_stop:    Stop the RX queue at the specified index. The stopped
>   *                     queue's memory is written at the specified address.
>   *
> + * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used

I'm wondering a bit why this dma-dev issue exists for memory providers
but not for the dma-dev used by the page_pool itself.

I'm guessing because the pp uses the dev in page_pool_params->dev
while I implemented the memory provider stuff to completely ignore
pp_params->dev and use its own device (sorry...).

We may want to extend your work so that the pp also ignores
pp_params.dev and uses the device returned by the queue API if it's
provided by the driver. But there is no upside to doing things this
way except for some consistency, so I think I'm complicating things
for no reason.

I think this looks good to me. With the helper moved to a .c file as
Jakub requested I can Reviewed-by.

-- 
Thanks,
Mina
Re: [RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Dragos Tatulea 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 11:11:01AM -0700, Mina Almasry wrote:
> On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > For zerocopy (io_uring, devmem), there is an assumption that the
> > parent device can do DMA. However that is not always the case:
> > - Scalable Function netdevs [1] have the DMA device in the grandparent.
> > - For Multi-PF netdevs [2] queues can be associated to different DMA
> > devices.
> >
> > This patch introduces the a queue based interface for allowing drivers
> > to expose a different DMA device for zerocopy.
> >
> > [1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
> > [2] Documentation/networking/multi-pf-netdev.rst
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > ---
> >  include/net/netdev_queues.h | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> > index 6e835972abd1..d4d8c42b809f 100644
> > --- a/include/net/netdev_queues.h
> > +++ b/include/net/netdev_queues.h
> > @@ -127,6 +127,10 @@ void netdev_stat_queue_sum(struct net_device *netdev,
> >   * @ndo_queue_stop:    Stop the RX queue at the specified index. The stopped
> >   *                     queue's memory is written at the specified address.
> >   *
> > + * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used
> 
> I'm wondering a bit why this dma-dev issue exists for memory providers
> but not for the dma-dev used by the page_pool itself.
> 
> I'm guessing because the pp uses the dev in page_pool_params->dev
> while I implemented the memory provider stuff to completely ignore
> pp_params->dev and use its own device (sorry...).
>
AFAIU rightly so. The page_pool is created at a later stage.

> We may want to extend your work so that the pp also ignores
> pp_params.dev and uses the device returned by the queue API if it's
> provided by the driver. But there is no upside to doing things this
> way except for some consistency, so I think I'm complicating things
> for no reason.
>
The drivers are setting the pp_params.dev though, so no need for a queue
API call. Or maybe I misunderstood your point?

> I think this looks good to me. With the helper moved to a .c file as
> Jakub requested I can Reviewed-by.
>
Will do.

Thanks,
Dragos

Re: [RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Jakub Kicinski 1 month, 2 weeks ago
On Fri, 15 Aug 2025 14:03:42 +0300 Dragos Tatulea wrote:
> +static inline struct device *
> +netdev_queue_get_dma_dev(struct net_device *dev, int idx)
> +{
> +	const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops;
> +	struct device *dma_dev;
> +
> +	if (queue_ops && queue_ops->ndo_queue_get_dma_dev)
> +		dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx);
> +	else
> +		dma_dev = dev->dev.parent;
> +
> +	return dma_dev && dma_dev->dma_mask ? dma_dev : NULL;
> +}

This really does not have to live in the header file.
Re: [RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Dragos Tatulea 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 10:16:27AM -0700, Jakub Kicinski wrote:
> On Fri, 15 Aug 2025 14:03:42 +0300 Dragos Tatulea wrote:
> > +static inline struct device *
> > +netdev_queue_get_dma_dev(struct net_device *dev, int idx)
> > +{
> > +	const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops;
> > +	struct device *dma_dev;
> > +
> > +	if (queue_ops && queue_ops->ndo_queue_get_dma_dev)
> > +		dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx);
> > +	else
> > +		dma_dev = dev->dev.parent;
> > +
> > +	return dma_dev && dma_dev->dma_mask ? dma_dev : NULL;
> > +}
> 
> This really does not have to live in the header file.
Alright, but where? It somewhat fits in the existing net/core/dev.c or
net/core/netdev_rx_queue.c. But neither is great.

Maybe it is time to create a net/core/netdev_queues.c?

Thanks,
Dragos
Re: [RFC net-next v3 1/7] queue_api: add support for fetching per queue DMA dev
Posted by Jakub Kicinski 1 month, 2 weeks ago
On Tue, 19 Aug 2025 14:41:08 +0000 Dragos Tatulea wrote:
> On Fri, Aug 15, 2025 at 10:16:27AM -0700, Jakub Kicinski wrote:
> > On Fri, 15 Aug 2025 14:03:42 +0300 Dragos Tatulea wrote:  
> > > +static inline struct device *
> > > +netdev_queue_get_dma_dev(struct net_device *dev, int idx)
> > > +{
> > > +	const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops;
> > > +	struct device *dma_dev;
> > > +
> > > +	if (queue_ops && queue_ops->ndo_queue_get_dma_dev)
> > > +		dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx);
> > > +	else
> > > +		dma_dev = dev->dev.parent;
> > > +
> > > +	return dma_dev && dma_dev->dma_mask ? dma_dev : NULL;
> > > +}  
> > 
> > This really does not have to live in the header file.  
> Alright, but where? It somewhat fits in the existing net/core/dev.c or
> net/core/netdev_rx_queue.c. But neither is great.
> 
> Maybe it is time to create a net/core/netdev_queues.c?

Sure, net/core/netdev_queues.c SGTM