For zerocopy (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
- Scalable Function netdevs [1] have the DMA device in the grandparent.
- For Multi-PF netdevs [2] queues can be associated to different DMA
devices.
This patch introduces the a queue based interface for allowing drivers
to expose a different DMA device for zerocopy.
[1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
[2] Documentation/networking/multi-pf-netdev.rst
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
include/net/netdev_queues.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 6e835972abd1..d4d8c42b809f 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -127,6 +127,10 @@ void netdev_stat_queue_sum(struct net_device *netdev,
* @ndo_queue_stop: Stop the RX queue at the specified index. The stopped
* queue's memory is written at the specified address.
*
+ * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used
+ * for this queue. When such device is not available,
+ * the function will return NULL.
+ *
* Note that @ndo_queue_mem_alloc and @ndo_queue_mem_free may be called while
* the interface is closed. @ndo_queue_start and @ndo_queue_stop will only
* be called for an interface which is open.
@@ -144,6 +148,8 @@ struct netdev_queue_mgmt_ops {
int (*ndo_queue_stop)(struct net_device *dev,
void *per_queue_mem,
int idx);
+ struct device * (*ndo_queue_get_dma_dev)(struct net_device *dev,
+ int idx);
};
/**
@@ -321,4 +327,18 @@ static inline void netif_subqueue_sent(const struct net_device *dev,
get_desc, start_thrs); \
})
+static inline struct device *
+netdev_queue_get_dma_dev(struct net_device *dev, int idx)
+{
+ const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops;
+ struct device *dma_dev;
+
+ if (queue_ops && queue_ops->ndo_queue_get_dma_dev)
+ dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx);
+ else
+ dma_dev = dev->dev.parent;
+
+ return dma_dev && dma_dev->dma_mask ? dma_dev : NULL;
+}
+
#endif
--
2.50.1
On 8/15/25 12:03, Dragos Tatulea wrote: > For zerocopy (io_uring, devmem), there is an assumption that the > parent device can do DMA. However that is not always the case: > - Scalable Function netdevs [1] have the DMA device in the grandparent. > - For Multi-PF netdevs [2] queues can be associated to different DMA > devices. > > This patch introduces the a queue based interface for allowing drivers > to expose a different DMA device for zerocopy. > > [1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst > [2] Documentation/networking/multi-pf-netdev.rst > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> > --- > include/net/netdev_queues.h | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) Assuming it'll be moved and uninlined Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> -- Pavel Begunkov
On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote: > > For zerocopy (io_uring, devmem), there is an assumption that the > parent device can do DMA. However that is not always the case: > - Scalable Function netdevs [1] have the DMA device in the grandparent. > - For Multi-PF netdevs [2] queues can be associated to different DMA > devices. > > This patch introduces the a queue based interface for allowing drivers > to expose a different DMA device for zerocopy. > > [1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst > [2] Documentation/networking/multi-pf-netdev.rst > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> > --- > include/net/netdev_queues.h | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h > index 6e835972abd1..d4d8c42b809f 100644 > --- a/include/net/netdev_queues.h > +++ b/include/net/netdev_queues.h > @@ -127,6 +127,10 @@ void netdev_stat_queue_sum(struct net_device *netdev, > * @ndo_queue_stop: Stop the RX queue at the specified index. The stopped > * queue's memory is written at the specified address. > * > + * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used I'm wondering a bit why this dma-dev issue exists for memory providers but not for the dma-dev used by the page_pool itself. I'm guessing because the pp uses the dev in page_pool_params->dev while I implemented the memory provider stuff to completely ignore pp_params->dev and use its own device (sorry...). We may want to extend your work so that the pp also ignores pp_params.dev and uses the device returned by the queue API if it's provided by the driver. But there is no upside to doing things this way except for some consistency, so I think I'm complicating things for no reason. I think this looks good to me. With the helper moved to a .c file as Jakub requested I can Reviewed-by. -- Thanks, Mina
On Fri, Aug 15, 2025 at 11:11:01AM -0700, Mina Almasry wrote: > On Fri, Aug 15, 2025 at 4:07 AM Dragos Tatulea <dtatulea@nvidia.com> wrote: > > > > For zerocopy (io_uring, devmem), there is an assumption that the > > parent device can do DMA. However that is not always the case: > > - Scalable Function netdevs [1] have the DMA device in the grandparent. > > - For Multi-PF netdevs [2] queues can be associated to different DMA > > devices. > > > > This patch introduces the a queue based interface for allowing drivers > > to expose a different DMA device for zerocopy. > > > > [1] Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst > > [2] Documentation/networking/multi-pf-netdev.rst > > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> > > --- > > include/net/netdev_queues.h | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h > > index 6e835972abd1..d4d8c42b809f 100644 > > --- a/include/net/netdev_queues.h > > +++ b/include/net/netdev_queues.h > > @@ -127,6 +127,10 @@ void netdev_stat_queue_sum(struct net_device *netdev, > > * @ndo_queue_stop: Stop the RX queue at the specified index. The stopped > > * queue's memory is written at the specified address. > > * > > + * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used > > I'm wondering a bit why this dma-dev issue exists for memory providers > but not for the dma-dev used by the page_pool itself. > > I'm guessing because the pp uses the dev in page_pool_params->dev > while I implemented the memory provider stuff to completely ignore > pp_params->dev and use its own device (sorry...). > AFAIU rightly so. The page_pool is created at a later stage. > We may want to extend your work so that the pp also ignores > pp_params.dev and uses the device returned by the queue API if it's > provided by the driver. But there is no upside to doing things this > way except for some consistency, so I think I'm complicating things > for no reason. > The drivers are setting the pp_params.dev though, so no need for a queue API call. Or maybe I misunderstood your point? > I think this looks good to me. With the helper moved to a .c file as > Jakub requested I can Reviewed-by. > Will do. Thanks, Dragos
On Fri, 15 Aug 2025 14:03:42 +0300 Dragos Tatulea wrote: > +static inline struct device * > +netdev_queue_get_dma_dev(struct net_device *dev, int idx) > +{ > + const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops; > + struct device *dma_dev; > + > + if (queue_ops && queue_ops->ndo_queue_get_dma_dev) > + dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx); > + else > + dma_dev = dev->dev.parent; > + > + return dma_dev && dma_dev->dma_mask ? dma_dev : NULL; > +} This really does not have to live in the header file.
On Fri, Aug 15, 2025 at 10:16:27AM -0700, Jakub Kicinski wrote: > On Fri, 15 Aug 2025 14:03:42 +0300 Dragos Tatulea wrote: > > +static inline struct device * > > +netdev_queue_get_dma_dev(struct net_device *dev, int idx) > > +{ > > + const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops; > > + struct device *dma_dev; > > + > > + if (queue_ops && queue_ops->ndo_queue_get_dma_dev) > > + dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx); > > + else > > + dma_dev = dev->dev.parent; > > + > > + return dma_dev && dma_dev->dma_mask ? dma_dev : NULL; > > +} > > This really does not have to live in the header file. Alright, but where? It somewhat fits in the existing net/core/dev.c or net/core/netdev_rx_queue.c. But neither is great. Maybe it is time to create a net/core/netdev_queues.c? Thanks, Dragos
On Tue, 19 Aug 2025 14:41:08 +0000 Dragos Tatulea wrote: > On Fri, Aug 15, 2025 at 10:16:27AM -0700, Jakub Kicinski wrote: > > On Fri, 15 Aug 2025 14:03:42 +0300 Dragos Tatulea wrote: > > > +static inline struct device * > > > +netdev_queue_get_dma_dev(struct net_device *dev, int idx) > > > +{ > > > + const struct netdev_queue_mgmt_ops *queue_ops = dev->queue_mgmt_ops; > > > + struct device *dma_dev; > > > + > > > + if (queue_ops && queue_ops->ndo_queue_get_dma_dev) > > > + dma_dev = queue_ops->ndo_queue_get_dma_dev(dev, idx); > > > + else > > > + dma_dev = dev->dev.parent; > > > + > > > + return dma_dev && dma_dev->dma_mask ? dma_dev : NULL; > > > +} > > > > This really does not have to live in the header file. > Alright, but where? It somewhat fits in the existing net/core/dev.c or > net/core/netdev_rx_queue.c. But neither is great. > > Maybe it is time to create a net/core/netdev_queues.c? Sure, net/core/netdev_queues.c SGTM
© 2016 - 2025 Red Hat, Inc.