From nobody Mon Apr 6 10:44:18 2026 Received: from mail-dy1-f181.google.com (mail-dy1-f181.google.com [74.125.82.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D826C2D47E6 for ; Fri, 20 Mar 2026 01:25:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773969910; cv=none; b=sxIzT+z/7NJTjwkvMNP3CB8iKK6ndEPub9l8oXSE6qoUY6/qTai+jQIiapCOylExepxAbQpgY71urDVbNfQdlAzfVnI01DxuLWWsPbdKxl99vchdUZhSermPYI6AJN9wM3n0HxBqeqtVjbg/NNES6ogwwkxxGk3xkNhHw2dOzlo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773969910; c=relaxed/simple; bh=XclYFxLz8YWyIYAfWaCWDvbF9YZKt7pn6pNrTl1z7TY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fF/SqR5hQR9B2EkFDMVy7bxzh1i6m2Z+pEqsB3vSZHw6kaNKZGgayJONZZG266HFGCaaZbsmeVncInglGDHfdyVzXMx5dSMewaumfTrUeFBfIp+7Grrl2q9V6BgPtE4YdVq94MTl23qqwc9637wBLfp6c4cHZjRqDv7vBNs9PbU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=fomichev.me; spf=pass smtp.mailfrom=gmail.com; arc=none smtp.client-ip=74.125.82.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=fomichev.me Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-dy1-f181.google.com with SMTP id 5a478bee46e88-2b4520f6b32so2342839eec.0 for ; Thu, 19 Mar 2026 18:25:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773969907; x=1774574707; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=0WCep2sMd+XcNQUXoONoloTYC004nnvk8vp16Ef+qqc=; b=dyAZklrM5H95SN/RCYwKAlCT24Z8KRaZGx6iyOLjWZELZrP60LabG4dsLz8yiBto+n mtg55Cq9qa/liqY7afFli5962U0Zkvd0zOHIvUkUwuUcyYCUs6DyJOiABM7gCxqop4+L 37hEwuscJiQCIC94MS4ZXSFPwtbxjmkLbo5F9QAewe9nTPlACVKrMctggCQgKgplyTAR zdSSLwuumiUWaqX4oq2FDciDemuZvuN8QarNCeSMn+t1qXj7XVnWCe1ePqOUUhHCNW75 ZN7HUFiQxVEirY1xOCAfi0kFXBVTmc15z1e6uo4JaM3I+fI/Vx74LKBYM2p9vGVsL6I5 lnxw== X-Forwarded-Encrypted: i=1; AJvYcCWz+JnzG+WdK4zMeSxpHDFuy2/uDLgUn2mpJ5C4SQsSN9rnIkseHaD+5/bdnhgERVDFIFPYF2B+/UvuQ2E=@vger.kernel.org X-Gm-Message-State: AOJu0YyiTIQhvPpm8hTVr+bK9vgmi0YIHUGsXjrq+7Gh9JxK7ZfKVmFb pT9dCUcQA9fS441G2XAE9+RnzdSkugCNQhZaNU2bWknkU2JP/mlOClQ= X-Gm-Gg: ATEYQzyrhM8FVh3ZMDYKastUYQ7V5UHeisA+Hsdvv4aTmLPYapbf88AG80HeTHF+gX2 DZbcguW8P0XWH8Pm9YbHYbrU8yyDe0vmNDbsksna02j1JwglkgkVt9kpKRyDz3yA0MAc5uWIJ3K 04/PlC2QlshztZjDILgVHhOgiVRtb3jfZXKKghS0Td3VuO0S0QiGV0ATje1VTP4VsUkUGPsepuK 5Y9ddIHSnQnIcGFHsGoDjOUyui9L287RGJH1xp63eBc8mnGcMTpiC5dLj1MFNwMU49Qbw3WVS9y VELJ3C0J3nKBFq9N6PMcJCUPN1U31t6jDtNL26f0GQd9QqeTUHCtYlN9Ts2r2FyE2QYrAn5izHZ 6GwxieoJaGNqeSYKdc1J6z7f3++GClKb80g3hZkCQnZy54BFd+lzgSNiebFg7Yjmfc/9amd16qV HaOx9AAz5fFjJ4qcdoL1AMSgY157dsa03IyDlDgEuF7Hs0InHRaMhfPcl1KttsNiBc+78pW3Vnu AAKgdYTybmYddA4/g== X-Received: by 2002:a05:7300:e208:b0:2b0:52cc:fe69 with SMTP id 5a478bee46e88-2c109567323mr694475eec.5.1773969906707; Thu, 19 Mar 2026 18:25:06 -0700 (PDT) Received: from localhost (c-76-102-12-149.hsd1.ca.comcast.net. [76.102.12.149]) by smtp.gmail.com with ESMTPSA id 5a478bee46e88-2c10b31ebd5sm1142605eec.27.2026.03.19.18.25.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Mar 2026 18:25:06 -0700 (PDT) From: Stanislav Fomichev To: netdev@vger.kernel.org Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, horms@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, andrew+netdev@lunn.ch, michael.chan@broadcom.com, pavan.chebbi@broadcom.com, anthony.l.nguyen@intel.com, przemyslaw.kitszel@intel.com, saeedm@nvidia.com, tariqt@nvidia.com, mbloch@nvidia.com, alexanderduyck@fb.com, kernel-team@meta.com, johannes@sipsolutions.net, sd@queasysnail.net, jianbol@nvidia.com, dtatulea@nvidia.com, sdf@fomichev.me, mohsin.bashr@gmail.com, jacob.e.keller@intel.com, willemb@google.com, skhawaja@google.com, bestswngs@gmail.com, aleksandr.loktionov@intel.com, kees@kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org, linux-kselftest@vger.kernel.org, leon@kernel.org Subject: [PATCH net-next v3 03/13] net: introduce ndo_set_rx_mode_async and dev_rx_mode_work Date: Thu, 19 Mar 2026 18:24:51 -0700 Message-ID: <20260320012501.2033548-4-sdf@fomichev.me> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260320012501.2033548-1-sdf@fomichev.me> References: <20260320012501.2033548-1-sdf@fomichev.me> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add ndo_set_rx_mode_async callback that drivers can implement instead of the legacy ndo_set_rx_mode. The legacy callback runs under the netif_addr_lock spinlock with BHs disabled, preventing drivers from sleeping. The async variant runs from a work queue with rtnl_lock and netdev_lock_ops held, in fully sleepable context. When __dev_set_rx_mode() sees ndo_set_rx_mode_async, it schedules dev_rx_mode_work instead of calling the driver inline. The work function takes two snapshots of each address list (uc/mc) under the addr_lock, then drops the lock and calls the driver with the work copies. After the driver returns, it reconciles the snapshots back to the real lists under the lock. Reviewed-by: Aleksandr Loktionov Signed-off-by: Stanislav Fomichev --- Documentation/networking/netdevices.rst | 8 +++ include/linux/netdevice.h | 20 ++++++ net/core/dev.c | 95 +++++++++++++++++++++++-- 3 files changed, 116 insertions(+), 7 deletions(-) diff --git a/Documentation/networking/netdevices.rst b/Documentation/networ= king/netdevices.rst index 35704d115312..dc83d78d3b27 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -289,6 +289,14 @@ struct net_device synchronization rules ndo_set_rx_mode: Synchronization: netif_addr_lock spinlock. Context: BHs disabled + Notes: Deprecated in favor of sleepable ndo_set_rx_mode_async. + +ndo_set_rx_mode_async: + Synchronization: rtnl_lock() semaphore. In addition, netdev instance + lock if the driver implements queue management or shaper API. + Context: process (from a work queue) + Notes: Sleepable version of ndo_set_rx_mode. Receives snapshots + of the unicast and multicast address lists. =20 ndo_setup_tc: ``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` are running under NFT locks diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 469b7cdb3237..b05bdd67b807 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1117,6 +1117,16 @@ struct netdev_net_notifier { * This function is called device changes address list filtering. * If driver handles unicast address filtering, it should set * IFF_UNICAST_FLT in its priv_flags. + * Cannot sleep, called with netif_addr_lock_bh held. + * Deprecated in favor of sleepable ndo_set_rx_mode_async. + * + * void (*ndo_set_rx_mode_async)(struct net_device *dev, + * struct netdev_hw_addr_list *uc, + * struct netdev_hw_addr_list *mc); + * Sleepable version of ndo_set_rx_mode. Called from a work queue + * with rtnl_lock and netdev_lock_ops(dev) held. The uc/mc parameters + * are snapshots of the address lists - iterate with + * netdev_hw_addr_list_for_each(ha, uc). * * int (*ndo_set_mac_address)(struct net_device *dev, void *addr); * This function is called when the Media Access Control address @@ -1437,6 +1447,9 @@ struct net_device_ops { void (*ndo_change_rx_flags)(struct net_device *dev, int flags); void (*ndo_set_rx_mode)(struct net_device *dev); + void (*ndo_set_rx_mode_async)(struct net_device *dev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc); int (*ndo_set_mac_address)(struct net_device *dev, void *addr); int (*ndo_validate_addr)(struct net_device *dev); @@ -1903,6 +1916,7 @@ enum netdev_reg_state { * has been enabled due to the need to listen to * additional unicast addresses in a device that * does not implement ndo_set_rx_mode() + * @rx_mode_work: Work queue entry for ndo_set_rx_mode_async() * @uc: unicast mac addresses * @mc: multicast mac addresses * @dev_addrs: list of device hw addresses @@ -2293,6 +2307,7 @@ struct net_device { unsigned int promiscuity; unsigned int allmulti; bool uc_promisc; + struct work_struct rx_mode_work; #ifdef CONFIG_LOCKDEP unsigned char nested_level; #endif @@ -4661,6 +4676,11 @@ static inline bool netif_device_present(const struct= net_device *dev) return test_bit(__LINK_STATE_PRESENT, &dev->state); } =20 +static inline bool netif_up_and_present(const struct net_device *dev) +{ + return (dev->flags & IFF_UP) && netif_device_present(dev); +} + void netif_device_detach(struct net_device *dev); =20 void netif_device_attach(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 200d44883fc1..fedc423306fc 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2381,6 +2381,8 @@ static void netstamp_clear(struct work_struct *work) static DECLARE_WORK(netstamp_work, netstamp_clear); #endif =20 +static struct workqueue_struct *rx_mode_wq; + void net_enable_timestamp(void) { #ifdef CONFIG_JUMP_LABEL @@ -9669,22 +9671,84 @@ int netif_set_allmulti(struct net_device *dev, int = inc, bool notify) return 0; } =20 -/* - * Upload unicast and multicast address lists to device and - * configure RX filtering. When the device doesn't support unicast - * filtering it is put in promiscuous mode while unicast addresses - * are present. +static void dev_rx_mode_work(struct work_struct *work) +{ + struct net_device *dev =3D container_of(work, struct net_device, + rx_mode_work); + struct netdev_hw_addr_list uc_snap, mc_snap, uc_ref, mc_ref; + const struct net_device_ops *ops =3D dev->netdev_ops; + int err; + + __hw_addr_init(&uc_snap); + __hw_addr_init(&mc_snap); + __hw_addr_init(&uc_ref); + __hw_addr_init(&mc_ref); + + rtnl_lock(); + netdev_lock_ops(dev); + + if (!netif_up_and_present(dev)) + goto out; + + if (ops->ndo_set_rx_mode_async) { + netif_addr_lock_bh(dev); + + err =3D __hw_addr_list_snapshot(&uc_snap, &dev->uc, + dev->addr_len); + if (!err) + err =3D __hw_addr_list_snapshot(&uc_ref, &dev->uc, + dev->addr_len); + if (!err) + err =3D __hw_addr_list_snapshot(&mc_snap, &dev->mc, + dev->addr_len); + if (!err) + err =3D __hw_addr_list_snapshot(&mc_ref, &dev->mc, + dev->addr_len); + netif_addr_unlock_bh(dev); + + if (err) { + netdev_WARN(dev, "failed to sync uc/mc addresses\n"); + __hw_addr_flush(&uc_snap); + __hw_addr_flush(&uc_ref); + __hw_addr_flush(&mc_snap); + goto out; + } + + ops->ndo_set_rx_mode_async(dev, &uc_snap, &mc_snap); + + netif_addr_lock_bh(dev); + __hw_addr_list_reconcile(&dev->uc, &uc_snap, + &uc_ref, dev->addr_len); + __hw_addr_list_reconcile(&dev->mc, &mc_snap, + &mc_ref, dev->addr_len); + netif_addr_unlock_bh(dev); + } + +out: + netdev_unlock_ops(dev); + rtnl_unlock(); +} + +/** + * __dev_set_rx_mode() - upload unicast and multicast address lists to dev= ice + * and configure RX filtering. + * @dev: device + * + * When the device doesn't support unicast filtering it is put in promiscu= ous + * mode while unicast addresses are present. */ void __dev_set_rx_mode(struct net_device *dev) { const struct net_device_ops *ops =3D dev->netdev_ops; =20 /* dev_open will call this function so the list will stay sane. */ - if (!(dev->flags&IFF_UP)) + if (!netif_up_and_present(dev)) return; =20 - if (!netif_device_present(dev)) + if (ops->ndo_set_rx_mode_async) { + queue_work(rx_mode_wq, &dev->rx_mode_work); return; + } =20 if (!(dev->priv_flags & IFF_UNICAST_FLT)) { /* Unicast addresses changes may only happen under the rtnl, @@ -11708,6 +11772,16 @@ void netdev_run_todo(void) =20 __rtnl_unlock(); =20 + /* Make sure all pending rx_mode work completes before returning. + * + * rx_mode_wq may be NULL during early boot: + * core_initcall(netlink_proto_init) vs subsys_initcall(net_dev_init). + * + * Check current_work() to avoid flushing from the wq. + */ + if (rx_mode_wq && !current_work()) + flush_workqueue(rx_mode_wq); + /* Wait for rcu callbacks to finish before next phase */ if (!list_empty(&list)) rcu_barrier(); @@ -12099,6 +12173,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv= , const char *name, #endif =20 mutex_init(&dev->lock); + INIT_WORK(&dev->rx_mode_work, dev_rx_mode_work); =20 dev->priv_flags =3D IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); @@ -12203,6 +12278,8 @@ void free_netdev(struct net_device *dev) =20 kfree(rcu_dereference_protected(dev->ingress_queue, 1)); =20 + cancel_work_sync(&dev->rx_mode_work); + /* Flush device addresses */ dev_addr_flush(dev); =20 @@ -13296,6 +13373,10 @@ static int __init net_dev_init(void) if (register_pernet_device(&default_device_ops)) goto out; =20 + rx_mode_wq =3D alloc_ordered_workqueue("rx_mode_wq", 0); + if (!rx_mode_wq) + goto out; + open_softirq(NET_TX_SOFTIRQ, net_tx_action); open_softirq(NET_RX_SOFTIRQ, net_rx_action); =20 --=20 2.53.0