fib6_metric_set() may be called concurrently from softirq context without
holding the FIB table lock. A typical path is:
  ndisc_router_discovery()
    spin_unlock_bh(&table->tb6_lock)          <- lock released
    fib6_metric_set(rt, RTAX_HOPLIMIT, ...)   <- lockless call
When two CPUs process Router Advertisement packets for the same router
simultaneously, they can both arrive at fib6_metric_set() with the same
fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
  if (f6i->fib6_metrics == &dst_default_metrics) {  /* both CPUs: true */
          struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
          refcount_set(&p->refcnt, 1);
          f6i->fib6_metrics = p;  /* CPU1 overwrites CPU0's p -> p0 leaked */
  }
The dst_metrics allocated by the losing CPU has refcnt=1 but is no longer
referenced from anywhere, producing a kmemleak report:
  unreferenced object 0xff1100025aca1400 (size 96):
    comm "softirq", pid 0, jiffies 4299271239
    backtrace:
      kmalloc_trace+0x28a/0x380
      fib6_metric_set+0xcd/0x180
      ndisc_router_discovery+0x12dc/0x24b0
      icmpv6_rcv+0xc16/0x1360
Fix this by:
- Setting val in p->metrics before the pointer is published via cmpxchg(),
  so the metric value is ready before other CPUs can see the new object.
- Replacing the plain pointer store with cmpxchg(), and freeing our
  allocation if another CPU won the race.
- Annotating the lockless accesses with READ_ONCE()/WRITE_ONCE() (the
  fib6_metrics pointer loads and the metrics[] store in the non-default
  metrics path) to prevent compiler-induced data races.
Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
Reported-by: Fei Liu <feliu@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
Changes in v2:
- Set val in p->metrics before publishing it via cmpxchg() (Eric Dumazet)
- Add READ_ONCE()/WRITE_ONCE() for the metrics[] store (Jiayuan Chen)
- Link to v1: https://lore.kernel.org/r/20260326-b4-fib6_metric_set-kmemleak-v1-1-c89fc1b312c0@gmail.com
---
net/ipv6/ip6_fib.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index dd26657b6a4a..2a7cc33fbcef 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -730,17 +730,24 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
 	if (!f6i)
 		return;
 
-	if (f6i->fib6_metrics == &dst_default_metrics) {
+	if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
+		struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
 		struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
 
 		if (!p)
 			return;
 
+		p->metrics[metric - 1] = val;
 		refcount_set(&p->refcnt, 1);
-		f6i->fib6_metrics = p;
+		if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
+			kfree(p);
+		else
+			return;
 	}
 
-	f6i->fib6_metrics->metrics[metric - 1] = val;
+	struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
+
+	WRITE_ONCE(m->metrics[metric - 1], val);
 }
 
 /*
---
base-commit: c4ea7d8907cf72b259bf70bd8c2e791e1c4ff70f
change-id: 20260326-b4-fib6_metric_set-kmemleak-7aa51978284a
Best regards,
--
Hangbin Liu <liuhangbin@gmail.com>
On Fri, 27 Mar 2026 10:24:47 +0800 Hangbin Liu wrote:
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -730,17 +730,24 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> if (!f6i)
> return;
>
> - if (f6i->fib6_metrics == &dst_default_metrics) {
> + if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> + struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
Why does this exist? To cast away the const?
> struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>
> if (!p)
> return;
>
> + p->metrics[metric - 1] = val;
> refcount_set(&p->refcnt, 1);
> - f6i->fib6_metrics = p;
> + if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> + kfree(p);
> + else
> + return;
> }
>
> - f6i->fib6_metrics->metrics[metric - 1] = val;
> + struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
No variable declarations in the middle of a function please.
> + WRITE_ONCE(m->metrics[metric - 1], val);
> }
On Mon, Mar 30, 2026 at 05:46:28PM -0700, Jakub Kicinski wrote:
> On Fri, 27 Mar 2026 10:24:47 +0800 Hangbin Liu wrote:
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -730,17 +730,24 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> > if (!f6i)
> > return;
> >
> > - if (f6i->fib6_metrics == &dst_default_metrics) {
> > + if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> > + struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
>
> Why does this exist? To cast away the const?
Yes, because cmpxchg() doesn't accept a const-qualified type.
>
> > struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> >
> > if (!p)
> > return;
> >
> > + p->metrics[metric - 1] = val;
> > refcount_set(&p->refcnt, 1);
> > - f6i->fib6_metrics = p;
> > + if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> > + kfree(p);
> > + else
> > + return;
> > }
> >
> > - f6i->fib6_metrics->metrics[metric - 1] = val;
> > + struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
>
> No variable declarations in the middle of a function please.
Oh, I thought it was OK now that the kernel supports C99...
I will fix it.
Thanks
Hangbin
On 3/27/26 10:24 AM, Hangbin Liu wrote:
> fib6_metric_set() may be called concurrently from softirq context without
> holding the FIB table lock. A typical path is:
>
> ndisc_router_discovery()
> spin_unlock_bh(&table->tb6_lock) <- lock released
> fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call
>
> When two CPUs process Router Advertisement packets for the same router
> simultaneously, they can both arrive at fib6_metric_set() with the same
> fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
>
> if (f6i->fib6_metrics == &dst_default_metrics) { /* both CPUs: true */
> struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> refcount_set(&p->refcnt, 1);
> f6i->fib6_metrics = p; /* CPU1 overwrites CPU0's p -> p0 leaked */
> }
>
> The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
> to it anywhere in memory, producing a kmemleak report:
>
> unreferenced object 0xff1100025aca1400 (size 96):
> comm "softirq", pid 0, jiffies 4299271239
> backtrace:
> kmalloc_trace+0x28a/0x380
> fib6_metric_set+0xcd/0x180
> ndisc_router_discovery+0x12dc/0x24b0
> icmpv6_rcv+0xc16/0x1360
>
> Fix this by:
> - Set val for p->metrics before published via cmpxchg() so the metrics
> value is ready before the pointer becomes visible to other CPUs.
> - Replace the plain pointer store with cmpxchg() and free the allocation
> safely when competition failed.
> - Add READ_ONCE()/WRITE_ONCE() for metrics[] setting in the non-default
> metrics path to prevent compiler-based data races.
>
> Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> Reported-by: Fei Liu <feliu@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
https://sashiko.dev/#/patchset/20260327-b4-fib6_metric_set-kmemleak-v2-1-366b2c78b5c2%40gmail.com
The concern about the reader paths (e.g., ip_dst_init_metrics, fib6_pmtu)
lacking READ_ONCE() annotations is valid: if the compiler reloads
from->fib6_metrics after inlining, it could produce an inconsistent
pointer/flags combination in dst->_metrics, potentially leading to a
refcount_dec on the read-only dst_default_metrics.

However, this issue predates this patch: the plain store
f6i->fib6_metrics = p in the original code has the same read-side race.
This patch focuses on fixing the writer-side data race that causes the
kmemleak report, and it does so correctly.
BTW, please consider moving the declaration of m to the top of the
function if you send a next version.
On Sat, Mar 28, 2026 at 07:22:48PM +0800, Jiayuan Chen wrote:
> https://sashiko.dev/#/patchset/20260327-b4-fib6_metric_set-kmemleak-v2-1-366b2c78b5c2%40gmail.com
>
> The concern about reader paths (e.g., ip_dst_init_metrics, fib6_pmtu)
> lacking READ_ONCE() annotations is valid: if the compiler reloads
> from->fib6_metrics after inlining, it could produce an inconsistent
> pointer/flags combination in dst->_metrics, potentially leading to a
> refcount_dec on the read-only dst_default_metrics.

Thanks, I will fix the reader paths in a separate patch, so that anything
I might have missed there doesn't slow down this one.

Thanks
Hangbin