[PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg

Hangbin Liu posted 1 patch 1 week ago
There is a newer version of this series
net/ipv6/ip6_fib.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
[PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Hangbin Liu 1 week ago
fib6_metric_set() may be called concurrently from softirq context without
holding the FIB table lock. A typical path is:

  ndisc_router_discovery()
    spin_unlock_bh(&table->tb6_lock)        <- lock released
    fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call

When two CPUs process Router Advertisement packets for the same router
simultaneously, they can both arrive at fib6_metric_set() with the same
fib6_info pointer whose fib6_metrics still points to dst_default_metrics.

  if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
      struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
      refcount_set(&p->refcnt, 1);
      f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
  }

The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
to it anywhere in memory, producing a kmemleak report:

  unreferenced object 0xff1100025aca1400 (size 96):
    comm "softirq", pid 0, jiffies 4299271239
    backtrace:
      kmalloc_trace+0x28a/0x380
      fib6_metric_set+0xcd/0x180
      ndisc_router_discovery+0x12dc/0x24b0
      icmpv6_rcv+0xc16/0x1360

Fix this by replacing the plain pointer store with cmpxchg() and free
the allocation safely when competition failed.

Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
Reported-by: Fei Liu <feliu@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/ip6_fib.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index dd26657b6a4a..64de761f40d5 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
 	if (!f6i)
 		return;
 
-	if (f6i->fib6_metrics == &dst_default_metrics) {
+	if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
+		struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
 		struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
 
 		if (!p)
 			return;
 
 		refcount_set(&p->refcnt, 1);
-		f6i->fib6_metrics = p;
+		if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
+			kfree(p);
 	}
 
 	f6i->fib6_metrics->metrics[metric - 1] = val;

---
base-commit: c4ea7d8907cf72b259bf70bd8c2e791e1c4ff70f
change-id: 20260326-b4-fib6_metric_set-kmemleak-7aa51978284a

Best regards,
-- 
Hangbin Liu <liuhangbin@gmail.com>
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Eric Dumazet 1 week ago
On Wed, Mar 25, 2026 at 9:22 PM Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> fib6_metric_set() may be called concurrently from softirq context without
> holding the FIB table lock. A typical path is:
>
>   ndisc_router_discovery()
>     spin_unlock_bh(&table->tb6_lock)        <- lock released
>     fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call
>
> When two CPUs process Router Advertisement packets for the same router
> simultaneously, they can both arrive at fib6_metric_set() with the same
> fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
>
>   if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
>       struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>       refcount_set(&p->refcnt, 1);
>       f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
>   }
>
> The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
> to it anywhere in memory, producing a kmemleak report:
>
>   unreferenced object 0xff1100025aca1400 (size 96):
>     comm "softirq", pid 0, jiffies 4299271239
>     backtrace:
>       kmalloc_trace+0x28a/0x380
>       fib6_metric_set+0xcd/0x180
>       ndisc_router_discovery+0x12dc/0x24b0
>       icmpv6_rcv+0xc16/0x1360
>
> Fix this by replacing the plain pointer store with cmpxchg() and free
> the allocation safely when competition failed.
>
> Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> Reported-by: Fei Liu <feliu@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  net/ipv6/ip6_fib.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index dd26657b6a4a..64de761f40d5 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
>         if (!f6i)
>                 return;
>
> -       if (f6i->fib6_metrics == &dst_default_metrics) {
> +       if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> +               struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
>                 struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>
>                 if (!p)
>                         return;
>
>                 refcount_set(&p->refcnt, 1);
> -               f6i->fib6_metrics = p;
> +               if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> +                       kfree(p);
>         }
>

The following line should happen before the cmpxchg(),
 ->metrics[X] accesses also need READ_ONCE()/WRITE_ONCE() annotations.


>         f6i->fib6_metrics->metrics[metric - 1] = val;
>
> ---
> base-commit: c4ea7d8907cf72b259bf70bd8c2e791e1c4ff70f
> change-id: 20260326-b4-fib6_metric_set-kmemleak-7aa51978284a
>
> Best regards,
> --
> Hangbin Liu <liuhangbin@gmail.com>
>
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Hangbin Liu 1 week ago
On Thu, Mar 26, 2026 at 05:05:57AM -0700, Eric Dumazet wrote:
> > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > index dd26657b6a4a..64de761f40d5 100644
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> >         if (!f6i)
> >                 return;
> >
> > -       if (f6i->fib6_metrics == &dst_default_metrics) {
> > +       if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> > +               struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
> >                 struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> >
> >                 if (!p)
> >                         return;
> >
> >                 refcount_set(&p->refcnt, 1);
> > -               f6i->fib6_metrics = p;
> > +               if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> > +                       kfree(p);
> >         }
> >
> 
> The following line should happen before the cmpxchg(),
>  ->metrics[X] accesses also need READ_ONCE()/WRITE_ONCE() annotations.

Hi Eric,

Jiayuan also suggested to using READ_ONCE()/WRITE_ONCE() for metrics[X]
accesses. But I don't get why this line should happen before the cmpxchg(),
Would you please help explain?

> 
> 
> >         f6i->fib6_metrics->metrics[metric - 1] = val;
> >

Thanks
Hangbin
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Jiayuan Chen 1 week ago
On 3/26/26 9:13 PM, Hangbin Liu wrote:
> On Thu, Mar 26, 2026 at 05:05:57AM -0700, Eric Dumazet wrote:
>>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
>>> index dd26657b6a4a..64de761f40d5 100644
>>> --- a/net/ipv6/ip6_fib.c
>>> +++ b/net/ipv6/ip6_fib.c
>>> @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
>>>          if (!f6i)
>>>                  return;
>>>
>>> -       if (f6i->fib6_metrics == &dst_default_metrics) {
>>> +       if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
>>> +               struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
>>>                  struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>>>
>>>                  if (!p)
>>>                          return;
>>>
>>>                  refcount_set(&p->refcnt, 1);
>>> -               f6i->fib6_metrics = p;
>>> +               if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
>>> +                       kfree(p);
>>>          }
>>>
>> The following line should happen before the cmpxchg(),
>>   ->metrics[X] accesses also need READ_ONCE()/WRITE_ONCE() annotations.
> Hi Eric,
>
> Jiayuan also suggested to using READ_ONCE()/WRITE_ONCE() for metrics[X]
> accesses. But I don't get why this line should happen before the cmpxchg(),
> Would you please help explain?


I think what Eric means is something like this:


...
         struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);

         if (!p)
             return;

         p->metrics[metric - 1] = val;
         refcount_set(&p->refcnt, 1);
         if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
             kfree(p);
         else
             return;
     }
}

m = READ_ONCE(f6i->fib6_metrics);
WRITE_ONCE(m->metrics[metric - 1], val);


Since p is private data before being published via cmpxchg(), we can
safely initialize its metrics beforehand. This way we don't need to
worry about concurrent access to f6i->fib6_metrics->metrics[] during
initialization. Right?


Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Eric Dumazet 1 week ago
On Thu, Mar 26, 2026 at 6:44 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
>
> On 3/26/26 9:13 PM, Hangbin Liu wrote:
> > On Thu, Mar 26, 2026 at 05:05:57AM -0700, Eric Dumazet wrote:
> >>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> >>> index dd26657b6a4a..64de761f40d5 100644
> >>> --- a/net/ipv6/ip6_fib.c
> >>> +++ b/net/ipv6/ip6_fib.c
> >>> @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> >>>          if (!f6i)
> >>>                  return;
> >>>
> >>> -       if (f6i->fib6_metrics == &dst_default_metrics) {
> >>> +       if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> >>> +               struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
> >>>                  struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> >>>
> >>>                  if (!p)
> >>>                          return;
> >>>
> >>>                  refcount_set(&p->refcnt, 1);
> >>> -               f6i->fib6_metrics = p;
> >>> +               if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> >>> +                       kfree(p);
> >>>          }
> >>>
> >> The following line should happen before the cmpxchg(),
> >>   ->metrics[X] accesses also need READ_ONCE()/WRITE_ONCE() annotations.
> > Hi Eric,
> >
> > Jiayuan also suggested to using READ_ONCE()/WRITE_ONCE() for metrics[X]
> > accesses. But I don't get why this line should happen before the cmpxchg(),
> > Would you please help explain?
>
>
> I think what Eric means is something like this:
>
>
> ...
>          struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>
>          if (!p)
>              return;
>
>          p->metrics[metric - 1] = val;
>          refcount_set(&p->refcnt, 1);
>          if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
>              kfree(p);
>          else
>              return;
>      }
> }
>
> m = READ_ONCE(f6i->fib6_metrics);
> WRITE_ONCE(m->metrics[metric - 1], val);
>
>
> Since p is private data before being published via cmpxchg(), we can
> safely initialize its metrics beforehand. This way we don't need to
> worry about concurrent access to f6i->fib6_metrics->metrics[] during
> initialization. Right?

Yes.  Think about RCU (Read Copy Update) rules.

We allocate an object, and populate it, then make sure changes are
committed, before publishing the new pointer.

Othewise an other cpu could read a 0 metric, while we wanted something else.
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Hangbin Liu 6 days, 20 hours ago
On Thu, Mar 26, 2026 at 07:01:36AM -0700, Eric Dumazet wrote:
> On Thu, Mar 26, 2026 at 6:44 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >
> >
> > On 3/26/26 9:13 PM, Hangbin Liu wrote:
> > > On Thu, Mar 26, 2026 at 05:05:57AM -0700, Eric Dumazet wrote:
> > >>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > >>> index dd26657b6a4a..64de761f40d5 100644
> > >>> --- a/net/ipv6/ip6_fib.c
> > >>> +++ b/net/ipv6/ip6_fib.c
> > >>> @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> > >>>          if (!f6i)
> > >>>                  return;
> > >>>
> > >>> -       if (f6i->fib6_metrics == &dst_default_metrics) {
> > >>> +       if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> > >>> +               struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
> > >>>                  struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> > >>>
> > >>>                  if (!p)
> > >>>                          return;
> > >>>
> > >>>                  refcount_set(&p->refcnt, 1);
> > >>> -               f6i->fib6_metrics = p;
> > >>> +               if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> > >>> +                       kfree(p);
> > >>>          }
> > >>>
> > >> The following line should happen before the cmpxchg(),
> > >>   ->metrics[X] accesses also need READ_ONCE()/WRITE_ONCE() annotations.
> > > Hi Eric,
> > >
> > > Jiayuan also suggested to using READ_ONCE()/WRITE_ONCE() for metrics[X]
> > > accesses. But I don't get why this line should happen before the cmpxchg(),
> > > Would you please help explain?
> >
> >
> > I think what Eric means is something like this:
> >
> >
> > ...
> >          struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> >
> >          if (!p)
> >              return;
> >
> >          p->metrics[metric - 1] = val;
> >          refcount_set(&p->refcnt, 1);
> >          if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> >              kfree(p);
> >          else
> >              return;
> >      }
> > }
> >
> > m = READ_ONCE(f6i->fib6_metrics);
> > WRITE_ONCE(m->metrics[metric - 1], val);
> >
> >
> > Since p is private data before being published via cmpxchg(), we can
> > safely initialize its metrics beforehand. This way we don't need to
> > worry about concurrent access to f6i->fib6_metrics->metrics[] during
> > initialization. Right?
> 
> Yes.  Think about RCU (Read Copy Update) rules.
> 
> We allocate an object, and populate it, then make sure changes are
> committed, before publishing the new pointer.
> 
> Othewise an other cpu could read a 0 metric, while we wanted something else.

Ah, got it. Thanks for the explanation.

Regards
Hangbin
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Jiayuan Chen 1 week ago
On 3/26/26 12:22 PM, Hangbin Liu wrote:
> fib6_metric_set() may be called concurrently from softirq context without
> holding the FIB table lock. A typical path is:
>
>    ndisc_router_discovery()
>      spin_unlock_bh(&table->tb6_lock)        <- lock released
>      fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call
>
> When two CPUs process Router Advertisement packets for the same router
> simultaneously, they can both arrive at fib6_metric_set() with the same
> fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
>
>    if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
>        struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>        refcount_set(&p->refcnt, 1);
>        f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
>    }
>
> The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
> to it anywhere in memory, producing a kmemleak report:
>
>    unreferenced object 0xff1100025aca1400 (size 96):
>      comm "softirq", pid 0, jiffies 4299271239
>      backtrace:
>        kmalloc_trace+0x28a/0x380
>        fib6_metric_set+0xcd/0x180
>        ndisc_router_discovery+0x12dc/0x24b0
>        icmpv6_rcv+0xc16/0x1360
>
> Fix this by replacing the plain pointer store with cmpxchg() and free
> the allocation safely when competition failed.
>
> Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> Reported-by: Fei Liu <feliu@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>   net/ipv6/ip6_fib.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index dd26657b6a4a..64de761f40d5 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
>   	if (!f6i)
>   		return;
>   
> -	if (f6i->fib6_metrics == &dst_default_metrics) {
> +	if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> +		struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
>   		struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>   
>   		if (!p)
>   			return;
>   
>   		refcount_set(&p->refcnt, 1);
> -		f6i->fib6_metrics = p;
> +		if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> +			kfree(p);
>   	}
>   


[...]

>   	f6i->fib6_metrics->metrics[metric - 1] = val;

Suggest using marked accessors to suppress KCSAN warnings:

struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
WRITE_ONCE(m->metrics[metric - 1], val);
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Hangbin Liu 1 week ago
On Thu, Mar 26, 2026 at 02:23:15PM +0800, Jiayuan Chen wrote:
> 
> On 3/26/26 12:22 PM, Hangbin Liu wrote:
> > fib6_metric_set() may be called concurrently from softirq context without
> > holding the FIB table lock. A typical path is:
> > 
> >    ndisc_router_discovery()
> >      spin_unlock_bh(&table->tb6_lock)        <- lock released
> >      fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call
> > 
> > When two CPUs process Router Advertisement packets for the same router
> > simultaneously, they can both arrive at fib6_metric_set() with the same
> > fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
> > 
> >    if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
> >        struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> >        refcount_set(&p->refcnt, 1);
> >        f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
> >    }
> > 
> > The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
> > to it anywhere in memory, producing a kmemleak report:
> > 
> >    unreferenced object 0xff1100025aca1400 (size 96):
> >      comm "softirq", pid 0, jiffies 4299271239
> >      backtrace:
> >        kmalloc_trace+0x28a/0x380
> >        fib6_metric_set+0xcd/0x180
> >        ndisc_router_discovery+0x12dc/0x24b0
> >        icmpv6_rcv+0xc16/0x1360
> > 
> > Fix this by replacing the plain pointer store with cmpxchg() and free
> > the allocation safely when competition failed.
> > 
> > Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> > Reported-by: Fei Liu <feliu@redhat.com>
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > ---
> >   net/ipv6/ip6_fib.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > index dd26657b6a4a..64de761f40d5 100644
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> >   	if (!f6i)
> >   		return;
> > -	if (f6i->fib6_metrics == &dst_default_metrics) {
> > +	if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> > +		struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
> >   		struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> >   		if (!p)
> >   			return;
> >   		refcount_set(&p->refcnt, 1);
> > -		f6i->fib6_metrics = p;
> > +		if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> > +			kfree(p);
> >   	}
> 
> 
> [...]
> 
> >   	f6i->fib6_metrics->metrics[metric - 1] = val;
> 
> Suggest using marked accessors to suppress KCSAN warnings:
> 
> struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
> WRITE_ONCE(m->metrics[metric - 1], val);

Thanks, I will update this in next version.

Hangbin
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Hangbin Liu 1 week ago
On Thu, Mar 26, 2026 at 06:44:43AM +0000, Hangbin Liu wrote:
> On Thu, Mar 26, 2026 at 02:23:15PM +0800, Jiayuan Chen wrote:
> > 
> > On 3/26/26 12:22 PM, Hangbin Liu wrote:
> > > fib6_metric_set() may be called concurrently from softirq context without
> > > holding the FIB table lock. A typical path is:
> > > 
> > >    ndisc_router_discovery()
> > >      spin_unlock_bh(&table->tb6_lock)        <- lock released
> > >      fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call
> > > 
> > > When two CPUs process Router Advertisement packets for the same router
> > > simultaneously, they can both arrive at fib6_metric_set() with the same
> > > fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
> > > 
> > >    if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
> > >        struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> > >        refcount_set(&p->refcnt, 1);
> > >        f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
> > >    }
> > > 
> > > The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
> > > to it anywhere in memory, producing a kmemleak report:
> > > 
> > >    unreferenced object 0xff1100025aca1400 (size 96):
> > >      comm "softirq", pid 0, jiffies 4299271239
> > >      backtrace:
> > >        kmalloc_trace+0x28a/0x380
> > >        fib6_metric_set+0xcd/0x180
> > >        ndisc_router_discovery+0x12dc/0x24b0
> > >        icmpv6_rcv+0xc16/0x1360
> > > 
> > > Fix this by replacing the plain pointer store with cmpxchg() and free
> > > the allocation safely when competition failed.
> > > 
> > > Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> > > Reported-by: Fei Liu <feliu@redhat.com>
> > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > > ---
> > >   net/ipv6/ip6_fib.c | 6 ++++--
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > > index dd26657b6a4a..64de761f40d5 100644
> > > --- a/net/ipv6/ip6_fib.c
> > > +++ b/net/ipv6/ip6_fib.c
> > > @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
> > >   	if (!f6i)
> > >   		return;
> > > -	if (f6i->fib6_metrics == &dst_default_metrics) {
> > > +	if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
> > > +		struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
> > >   		struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
> > >   		if (!p)
> > >   			return;
> > >   		refcount_set(&p->refcnt, 1);
> > > -		f6i->fib6_metrics = p;
> > > +		if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
> > > +			kfree(p);
> > >   	}
> > 
> > 
> > [...]
> > 
> > >   	f6i->fib6_metrics->metrics[metric - 1] = val;
> > 
> > Suggest using marked accessors to suppress KCSAN warnings:
> > 
> > struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
> > WRITE_ONCE(m->metrics[metric - 1], val);
> 
> Thanks, I will update this in next version.

BTW, do we really need to WRITE_ONCE here? What if the `val` are different
on 2 CPUs? This would hide the problem, right?

Thanks
Hangbin
Re: [PATCH net] ipv6: fix data race in fib6_metric_set() using cmpxchg
Posted by Jiayuan Chen 1 week ago
On 3/26/26 3:13 PM, Hangbin Liu wrote:
> On Thu, Mar 26, 2026 at 06:44:43AM +0000, Hangbin Liu wrote:
>> On Thu, Mar 26, 2026 at 02:23:15PM +0800, Jiayuan Chen wrote:
>>> On 3/26/26 12:22 PM, Hangbin Liu wrote:
>>>> fib6_metric_set() may be called concurrently from softirq context without
>>>> holding the FIB table lock. A typical path is:
>>>>
>>>>     ndisc_router_discovery()
>>>>       spin_unlock_bh(&table->tb6_lock)        <- lock released
>>>>       fib6_metric_set(rt, RTAX_HOPLIMIT, ...) <- lockless call
>>>>
>>>> When two CPUs process Router Advertisement packets for the same router
>>>> simultaneously, they can both arrive at fib6_metric_set() with the same
>>>> fib6_info pointer whose fib6_metrics still points to dst_default_metrics.
>>>>
>>>>     if (f6i->fib6_metrics == &dst_default_metrics) {   /* both CPUs: true */
>>>>         struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>>>>         refcount_set(&p->refcnt, 1);
>>>>         f6i->fib6_metrics = p;   /* CPU1 overwrites CPU0's p -> p0 leaked */
>>>>     }
>>>>
>>>> The dst_metrics allocated by the losing CPU has refcnt=1 but no pointer
>>>> to it anywhere in memory, producing a kmemleak report:
>>>>
>>>>     unreferenced object 0xff1100025aca1400 (size 96):
>>>>       comm "softirq", pid 0, jiffies 4299271239
>>>>       backtrace:
>>>>         kmalloc_trace+0x28a/0x380
>>>>         fib6_metric_set+0xcd/0x180
>>>>         ndisc_router_discovery+0x12dc/0x24b0
>>>>         icmpv6_rcv+0xc16/0x1360
>>>>
>>>> Fix this by replacing the plain pointer store with cmpxchg() and free
>>>> the allocation safely when competition failed.
>>>>
>>>> Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
>>>> Reported-by: Fei Liu <feliu@redhat.com>
>>>> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>>>> ---
>>>>    net/ipv6/ip6_fib.c | 6 ++++--
>>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
>>>> index dd26657b6a4a..64de761f40d5 100644
>>>> --- a/net/ipv6/ip6_fib.c
>>>> +++ b/net/ipv6/ip6_fib.c
>>>> @@ -730,14 +730,16 @@ void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
>>>>    	if (!f6i)
>>>>    		return;
>>>> -	if (f6i->fib6_metrics == &dst_default_metrics) {
>>>> +	if (READ_ONCE(f6i->fib6_metrics) == &dst_default_metrics) {
>>>> +		struct dst_metrics *dflt = (struct dst_metrics *)&dst_default_metrics;
>>>>    		struct dst_metrics *p = kzalloc_obj(*p, GFP_ATOMIC);
>>>>    		if (!p)
>>>>    			return;
>>>>    		refcount_set(&p->refcnt, 1);
>>>> -		f6i->fib6_metrics = p;
>>>> +		if (cmpxchg(&f6i->fib6_metrics, dflt, p) != dflt)
>>>> +			kfree(p);
>>>>    	}
>>>
>>> [...]
>>>
>>>>    	f6i->fib6_metrics->metrics[metric - 1] = val;
>>> Suggest using marked accessors to suppress KCSAN warnings:
>>>
>>> struct dst_metrics *m = READ_ONCE(f6i->fib6_metrics);
>>> WRITE_ONCE(m->metrics[metric - 1], val);
>> Thanks, I will update this in next version.
> BTW, do we really need to WRITE_ONCE here? What if the `val` are different
> on 2 CPUs? This would hide the problem, right?
>
> Thanks
> Hangbin
If concurrent writes to the same memory by two CPUs are considered a bug,
the solution is prevention, not detection by KCSAN. Furthermore, WRITE_ONCE
also helps suppress warnings from read-write races. For example, commit
6e84fc395e90 demonstrates this.