When removing a nexthop from a group, remove_nh_grp_entry() publishes
the new group via rcu_assign_pointer() then immediately frees the
removed entry's percpu stats with free_percpu(). However, the
synchronize_net() grace period in the caller remove_nexthop_from_groups()
runs after the free. RCU readers that entered before the publish still
see the old group and can dereference the freed stats via
nh_grp_entry_stats_inc() -> get_cpu_ptr(nhge->stats), causing a
use-after-free on percpu memory.

Fix by deferring the free_percpu() until after synchronize_net() in the
caller. Removed entries are chained via nh_list onto a local deferred
free list. After the grace period completes and all RCU readers have
finished, the percpu stats are safely freed.

Fixes: f4676ea74b85 ("net: nexthop: Add nexthop group entry stats")
Cc: stable@vger.kernel.org
Signed-off-by: Mehul Rao <mehulrao@gmail.com>
---
net/ipv4/nexthop.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
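
The ordering described in the commit message (unpublish, chain onto a
local list, wait for one grace period, then free) can be sketched in
plain userspace C. This is only an illustration under stated
assumptions: struct entry, fake_synchronize(), remove_entry(),
free_deferred() and remove_all() are hypothetical stand-ins, not kernel
APIs, and fake_synchronize() merely records that the wait happened
rather than implementing RCU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct entry {
	long *stats;		/* stands in for the percpu stats pointer */
	struct entry *next;	/* link reused for the deferred-free list */
};

static int grace_periods_elapsed;	/* number of simulated grace periods */
static int frees_before_grace;		/* frees that ran before any grace period */

/* Stand-in for synchronize_net(): only records that the wait happened. */
static void fake_synchronize(void)
{
	grace_periods_elapsed++;
}

/* Removal step: do not free the stats here, only chain the entry. */
static void remove_entry(struct entry *e, struct entry **deferred)
{
	e->next = *deferred;
	*deferred = e;
}

/* Free the stats of every deferred entry; returns how many were freed. */
static int free_deferred(struct entry **deferred)
{
	int n = 0;

	while (*deferred) {
		struct entry *e = *deferred;

		*deferred = e->next;
		if (!grace_periods_elapsed)
			frees_before_grace++;	/* would be the use-after-free window */
		free(e->stats);
		e->stats = NULL;
		n++;
	}
	return n;
}

/* Updater: remove every entry, wait once, then free in one batch. */
static int remove_all(struct entry *entries, size_t count)
{
	struct entry *deferred = NULL;
	size_t i;

	for (i = 0; i < count; i++)
		remove_entry(&entries[i], &deferred);

	fake_synchronize();
	return free_deferred(&deferred);
}
```

Calling remove_all() on several entries performs a single simulated
grace period and frees every stats buffer only afterwards, which mirrors
why the patch batches the frees behind the existing synchronize_net()
call instead of waiting once per entry.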
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 1aa2b05ee8de..c942f1282236 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -2002,7 +2002,8 @@ static void nh_hthr_group_rebalance(struct nh_group *nhg)
 }
 
 static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
-				struct nl_info *nlinfo)
+				struct nl_info *nlinfo,
+				struct list_head *deferred_free)
 {
 	struct nh_grp_entry *nhges, *new_nhges;
 	struct nexthop *nhp = nhge->nh_parent;
@@ -2062,8 +2063,8 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
 	rcu_assign_pointer(nhp->nh_grp, newg);
 
 	list_del(&nhge->nh_list);
-	free_percpu(nhge->stats);
 	nexthop_put(nhge->nh);
+	list_add(&nhge->nh_list, deferred_free);
 
 	/* Removal of a NH from a resilient group is notified through
 	 * bucket notifications.
@@ -2083,6 +2084,7 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
 				       struct nl_info *nlinfo)
 {
 	struct nh_grp_entry *nhge, *tmp;
+	LIST_HEAD(deferred_free);
 
 	/* If there is nothing to do, let's avoid the costly call to
 	 * synchronize_net()
@@ -2091,10 +2093,16 @@ static void remove_nexthop_from_groups(struct net *net, struct nexthop *nh,
 		return;
 
 	list_for_each_entry_safe(nhge, tmp, &nh->grp_list, nh_list)
-		remove_nh_grp_entry(net, nhge, nlinfo);
+		remove_nh_grp_entry(net, nhge, nlinfo, &deferred_free);
 
 	/* make sure all see the newly published array before releasing rtnl */
 	synchronize_net();
+
+	/* Now safe to free percpu stats — all RCU readers have finished */
+	list_for_each_entry_safe(nhge, tmp, &deferred_free, nh_list) {
+		list_del(&nhge->nh_list);
+		free_percpu(nhge->stats);
+	}
 }
 
 static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo)
--
2.53.0
On Fri, Mar 06, 2026 at 06:38:20PM -0500, Mehul Rao wrote:
> When removing a nexthop from a group, remove_nh_grp_entry() publishes
> the new group via rcu_assign_pointer() then immediately frees the
> removed entry's percpu stats with free_percpu(). However, the
> synchronize_net() grace period in the caller remove_nexthop_from_groups()
> runs after the free. RCU readers that entered before the publish still
> see the old group and can dereference the freed stats via
> nh_grp_entry_stats_inc() -> get_cpu_ptr(nhge->stats), causing a
> use-after-free on percpu memory.
>
> Fix by deferring the free_percpu() until after synchronize_net() in the
> caller. Removed entries are chained via nh_list onto a local deferred
> free list. After the grace period completes and all RCU readers have
> finished, the percpu stats are safely freed.
>
> Fixes: f4676ea74b85 ("net: nexthop: Add nexthop group entry stats")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mehul Rao <mehulrao@gmail.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Do you have a reproducer? I would like to understand why we don't see it
in the torture tests (e.g., ipv4_torture) in fib_nexthops.sh.

On Sat, Mar 7, 2026 at 12:39 AM Mehul Rao <mehulrao@gmail.com> wrote:
>
> When removing a nexthop from a group, remove_nh_grp_entry() publishes
> the new group via rcu_assign_pointer() then immediately frees the
> removed entry's percpu stats with free_percpu(). However, the
> synchronize_net() grace period in the caller remove_nexthop_from_groups()
> runs after the free. RCU readers that entered before the publish still
> see the old group and can dereference the freed stats via
> nh_grp_entry_stats_inc() -> get_cpu_ptr(nhge->stats), causing a
> use-after-free on percpu memory.
>
> Fix by deferring the free_percpu() until after synchronize_net() in the
> caller. Removed entries are chained via nh_list onto a local deferred
> free list. After the grace period completes and all RCU readers have
> finished, the percpu stats are safely freed.
>
> Fixes: f4676ea74b85 ("net: nexthop: Add nexthop group entry stats")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mehul Rao <mehulrao@gmail.com>

SGTM, thanks!

Reviewed-by: Eric Dumazet <edumazet@google.com>