[RFC net-next v3 1/4] net: protect queue -> napi linking with netdev_lock()

Joe Damato posted 4 patches 10 months, 3 weeks ago
Posted by Joe Damato 10 months, 3 weeks ago
From: Jakub Kicinski <kuba@kernel.org>

netdev netlink is the only reader of netdev_{,rx_}queue->napi,
and it already holds netdev->lock. Switch protection of the
writes to netdev->lock as well.

Add netif_queue_set_napi_locked() for API completeness,
but the expectation is that most current drivers won't have
to worry about locking any more. Today they jump thru hoops
to take rtnl_lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
---
 v2:
   - Added in v2 from Jakub.

 include/linux/netdevice.h     |  9 +++++++--
 include/net/netdev_rx_queue.h |  2 +-
 net/core/dev.c                | 16 +++++++++++++---
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8da4c61f97b9..4709d16bada5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -691,7 +691,7 @@ struct netdev_queue {
  * slow- / control-path part
  */
 	/* NAPI instance for the queue
-	 * Readers and writers must hold RTNL
+	 * Readers and writers must hold netdev->lock
 	 */
 	struct napi_struct	*napi;
 
@@ -2467,7 +2467,8 @@ struct net_device {
 	 * Partially protects (writers must hold both @lock and rtnl_lock):
 	 *	@up
 	 *
-	 * Also protects some fields in struct napi_struct.
+	 * Also protects some fields in:
+	 *	struct napi_struct, struct netdev_queue, struct netdev_rx_queue
 	 *
 	 * Ordering: take after rtnl_lock.
 	 */
@@ -2694,6 +2695,10 @@ static inline void *netdev_priv(const struct net_device *dev)
 void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 			  enum netdev_queue_type type,
 			  struct napi_struct *napi);
+void netif_queue_set_napi_locked(struct net_device *dev,
+				 unsigned int queue_index,
+				 enum netdev_queue_type type,
+				 struct napi_struct *napi);
 
 static inline void netdev_lock(struct net_device *dev)
 {
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index 596836abf7bf..9fcac0b43b71 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -23,7 +23,7 @@ struct netdev_rx_queue {
 	struct xsk_buff_pool            *pool;
 #endif
 	/* NAPI instance for the queue
-	 * Readers and writers must hold RTNL
+	 * Readers and writers must hold netdev->lock
 	 */
 	struct napi_struct		*napi;
 	struct pp_memory_provider_params mp_params;
diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..ab361fd9efd9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6842,14 +6842,24 @@ EXPORT_SYMBOL(dev_set_threaded);
  */
 void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 			  enum netdev_queue_type type, struct napi_struct *napi)
+{
+	netdev_lock(dev);
+	netif_queue_set_napi_locked(dev, queue_index, type, napi);
+	netdev_unlock(dev);
+}
+EXPORT_SYMBOL(netif_queue_set_napi);
+
+void netif_queue_set_napi_locked(struct net_device *dev,
+				 unsigned int queue_index,
+				 enum netdev_queue_type type,
+				 struct napi_struct *napi)
 {
 	struct netdev_rx_queue *rxq;
 	struct netdev_queue *txq;
 
 	if (WARN_ON_ONCE(napi && !napi->dev))
 		return;
-	if (dev->reg_state >= NETREG_REGISTERED)
-		ASSERT_RTNL();
+	netdev_assert_locked_or_invisible(dev);
 
 	switch (type) {
 	case NETDEV_QUEUE_TYPE_RX:
@@ -6864,7 +6874,7 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 		return;
 	}
 }
-EXPORT_SYMBOL(netif_queue_set_napi);
+EXPORT_SYMBOL(netif_queue_set_napi_locked);
 
 static void napi_restore_config(struct napi_struct *n)
 {
-- 
2.25.1
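The locking pattern this patch introduces can be modeled in user space. The patch pairs a plain setter, which takes netdev->lock, with a _locked variant, which only asserts that the lock is held or that the device is not yet registered. The sketch below is a single-threaded illustration under stated assumptions, not kernel code. Every model_* name is hypothetical, and the lock is reduced to a "held" flag, so the sketch shows the assertion discipline rather than real mutual exclusion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects (NOT kernel types). */
struct model_napi {
	int id;
};

#define MODEL_NUM_RXQ 4

struct model_netdev {
	bool lock_held;   /* models "netdev->lock is held by the caller" */
	bool registered;  /* models reg_state >= NETREG_REGISTERED */
	struct model_napi *rxq_napi[MODEL_NUM_RXQ];
};

static void model_lock(struct model_netdev *dev)
{
	assert(!dev->lock_held);	/* like a mutex, not recursive */
	dev->lock_held = true;
}

static void model_unlock(struct model_netdev *dev)
{
	assert(dev->lock_held);
	dev->lock_held = false;
}

/* Models netdev_assert_locked_or_invisible(): before registration
 * nobody else can see the device, so the lock is not yet required. */
static void model_assert_locked_or_invisible(const struct model_netdev *dev)
{
	if (dev->registered)
		assert(dev->lock_held);
}

/* _locked variant: the caller must already hold the lock. */
static void model_queue_set_napi_locked(struct model_netdev *dev,
					unsigned int idx,
					struct model_napi *napi)
{
	model_assert_locked_or_invisible(dev);
	if (idx >= MODEL_NUM_RXQ)
		return;		/* bounds check exists only in this model */
	dev->rxq_napi[idx] = napi;
}

/* Plain variant: takes and drops the lock around the _locked call. */
static void model_queue_set_napi(struct model_netdev *dev, unsigned int idx,
				 struct model_napi *napi)
{
	model_lock(dev);
	model_queue_set_napi_locked(dev, idx, napi);
	model_unlock(dev);
}
```

In this model, a caller that already holds the lock, or is still setting up an unregistered device, uses the _locked variant. Any other caller uses the plain wrapper. Calling the wrapper while already holding the lock trips the non-recursive check in model_lock(). That loosely mirrors why a separate _locked entry point is needed at all, since netdev->lock is a mutex.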
Re: [RFC net-next v3 1/4] net: protect queue -> napi linking with netdev_lock()
Posted by Jakub Kicinski 10 months, 2 weeks ago
On Tue, 21 Jan 2025 19:10:41 +0000 Joe Damato wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> 
> netdev netlink is the only reader of netdev_{,rx_}queue->napi,
> and it already holds netdev->lock. Switch protection of the
> writes to netdev->lock as well.
> 
> Add netif_queue_set_napi_locked() for API completeness,
> but the expectation is that most current drivers won't have
> to worry about locking any more. Today they jump thru hoops
> to take rtnl_lock.

I started having second thoughts about this patch, sorry to say.
NAPI objects were easy to protect with the lock because there's
a clear registration and unregistration API. Queues OTOH are made
visible by the netif_set_real_num_queues() call, which is tricky 
to protect with the instance lock. Queues are made visible, then
we configure them.

My thinking has changed a bit: I think we should aim to protect all
ndos and ethtool ops with the instance lock. Stanislav and Saeed
seem to be working on that:
https://lore.kernel.org/all/Z5LhKdNMO5CvAvZf@mini-arch/
so hopefully that doesn't cause too much of a delay.
But you may need to rework this series further :(
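The ordering concern in this reply can be sketched with a minimal, single-threaded model in which all names are hypothetical. Publishing the queue count stands in for netif_set_real_num_queues(), and linking NAPI instances happens as a separate, later step. A reader that runs between the two steps sees visible queues with no NAPI attached. In this model, holding a lock only on the reader side would not close that window unless the writer held the same lock across both steps. Doing that around netif_set_real_num_queues() is the part described as tricky.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model objects (NOT kernel types). */
struct model_napi {
	int id;
};

#define MODEL_MAX_RXQ 4

struct model_dev {
	unsigned int real_num_rx_queues;            /* published queue count */
	struct model_napi *rxq_napi[MODEL_MAX_RXQ]; /* per-queue NAPI link */
};

/* Step 1: queues become visible (stand-in for
 * netif_set_real_num_queues()); no NAPI is linked yet. */
static void model_set_real_num_queues(struct model_dev *dev, unsigned int n)
{
	if (n > MODEL_MAX_RXQ)
		n = MODEL_MAX_RXQ;	/* clamp to the model's array size */
	dev->real_num_rx_queues = n;
}

/* Step 2: the driver links NAPI instances afterwards. */
static void model_link_napi(struct model_dev *dev, unsigned int idx,
			    struct model_napi *napi)
{
	if (idx < MODEL_MAX_RXQ)
		dev->rxq_napi[idx] = napi;
}

/* A reader (think: netdev netlink dumping queues) counts visible
 * queues that do not have a NAPI instance linked yet. */
static unsigned int model_count_unlinked(const struct model_dev *dev)
{
	unsigned int i, missing = 0;

	for (i = 0; i < dev->real_num_rx_queues; i++)
		if (!dev->rxq_napi[i])
			missing++;
	return missing;
}
```

Running step 1, then the reader, then step 2 shows the window: the reader reports every published queue as unlinked until the driver finishes configuring them.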
Re: [RFC net-next v3 1/4] net: protect queue -> napi linking with netdev_lock()
Posted by Joe Damato 10 months, 2 weeks ago
On Mon, Jan 27, 2025 at 01:37:56PM -0800, Jakub Kicinski wrote:
> I started having second thoughts about this patch, sorry to say.
> NAPI objects were easy to protect with the lock because there's
> a clear registration and unregistration API. Queues OTOH are made
> visible by the netif_set_real_num_queues() call, which is tricky 
> to protect with the instance lock. Queues are made visible, then
> we configure them.
> 
> My thinking has changed a bit: I think we should aim to protect all
> ndos and ethtool ops with the instance lock. Stanislav and Saeed
> seem to be working on that:
> https://lore.kernel.org/all/Z5LhKdNMO5CvAvZf@mini-arch/
> so hopefully that doesn't cause too much of a delay.
> But you may need to rework this series further :(

OK, I'll wait for something to emerge from that work before
re-spinning this.