[PATCH V2] netrom: Prevent race conditions between multiple add route

Lizhi Xu posted 1 patch 3 months, 2 weeks ago
net/netrom/nr_route.c | 2 ++
1 file changed, 2 insertions(+)
[PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Lizhi Xu 3 months, 2 weeks ago
The root cause of the problem is that multiple different tasks initiate
NETROM_NODE commands to add new routes, there is no lock between them to
protect the same nr_neigh.
Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].

When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF:

Task0					Task1						Task2
=====					=====						=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node_lock()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
					nr_node_unlock()
nr_node_lock()
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);								nr_add_node()
											nr_neigh_put()

The solution to the problem is to use a lock to synchronize each add a route
to node.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
V1 -> V2: update comments for cause uaf

 net/netrom/nr_route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..ae1e5ee1f52f 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -102,7 +102,9 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 	struct nr_neigh *nr_neigh;
 	int i, found;
 	struct net_device *odev;
+	static DEFINE_MUTEX(add_node_lock);
 
+	guard(mutex)(&add_node_lock);
 	if ((odev=nr_dev_get(nr)) != NULL) {	/* Can't add routes to ourself */
 		dev_put(odev);
 		return -EINVAL;
-- 
2.43.0
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Dan Carpenter 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 07:02:44PM +0800, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> NETROM_NODE commands to add new routes, there is no lock between them to
> protect the same nr_neigh.
> Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> 
> When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> release the neighbour because its refcount value is 1.
> 
> In this case, the following situation causes a UAF:
> 
> Task0					Task1						Task2
> =====					=====						=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node_lock()
> 					nr_node->routes[2].neighbour->count--
> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> 					nr_node_unlock()
> nr_node_lock()
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);								nr_add_node()
> 											nr_neigh_put()
> 
> The solution to the problem is to use a lock to synchronize each add a route
> to node.

This chart is still not right.  Let me add line numbers to your chart:

netrom/nr_route.c
   214          nr_node_lock(nr_node);
   215  
   216          if (quality != 0)
   217                  strscpy(nr_node->mnemonic, mnemonic);
   218  
   219          for (found = 0, i = 0; i < nr_node->count; i++) {
   220                  if (nr_node->routes[i].neighbour == nr_neigh) {
   221                          nr_node->routes[i].quality   = quality;
   222                          nr_node->routes[i].obs_count = obs_count;
   223                          found = 1;
   224                          break;
   225                  }
   226          }
   227  
   228          if (!found) {
   229                  /* We have space at the bottom, slot it in */
   230                  if (nr_node->count < 3) {
   231                          nr_node->routes[2] = nr_node->routes[1];
   232                          nr_node->routes[1] = nr_node->routes[0];
   233  
   234                          nr_node->routes[0].quality   = quality;
   235                          nr_node->routes[0].obs_count = obs_count;
   236                          nr_node->routes[0].neighbour = nr_neigh;
   237  
   238                          nr_node->which++;
   239                          nr_node->count++;
   240                          nr_neigh_hold(nr_neigh);
   241                          nr_neigh->count++;
   242                  } else {
   243                          /* It must be better than the worst */
   244                          if (quality > nr_node->routes[2].quality) {
   245                                  nr_node->routes[2].neighbour->count--;
   246                                  nr_neigh_put(nr_node->routes[2].neighbour);
   247  
   248                                  if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
   249                                          nr_remove_neigh(nr_node->routes[2].neighbour);
   250  
   251                                  nr_node->routes[2].quality   = quality;
   252                                  nr_node->routes[2].obs_count = obs_count;
   253                                  nr_node->routes[2].neighbour = nr_neigh;
   254  
   255                                  nr_neigh_hold(nr_neigh);
   256                                  nr_neigh->count++;
   257                          }
   258                  }
   259          }


Task0					Task1						Task2
=====					=====						=====
[97] nr_add_node()
[113] nr_neigh_get_dev()		[97] nr_add_node()
					[214] nr_node_lock()
					[245] nr_node->routes[2].neighbour->count--
					[246] nr_neigh_put(nr_node->routes[2].neighbour);
					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
					[283] nr_node_unlock()
[214] nr_node_lock()
[253] nr_node->routes[2].neighbour = nr_neigh
[254] nr_neigh_hold(nr_neigh);								[97] nr_add_node()
											[XXX] nr_neigh_put()
                                                                                        ^^^^^^^^^^^^^^^^^^^^

These charts are supposed to be chronological so [XXX] is wrong because the
use after free happens on line [248].  Do we really need three threads to
make this race work?

> 
> syzbot reported:
> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
                                                                                     ^^^

> Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

I'm sure you tested your patch and that it fixes the bug, but I just
wonder if it's the best possible fix?

regards,
dan carpenter
[PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Lizhi Xu 3 months, 2 weeks ago
On Mon, 20 Oct 2025 15:25:40 +0300, Dan Carpenter wrote:
> Task0					Task1						Task2
> =====					=====						=====
> [97] nr_add_node()
> [113] nr_neigh_get_dev()		[97] nr_add_node()
> 					[214] nr_node_lock()
> 					[245] nr_node->routes[2].neighbour->count--
> 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> 					[283] nr_node_unlock()
> [214] nr_node_lock()
> [253] nr_node->routes[2].neighbour = nr_neigh
> [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> 											[XXX] nr_neigh_put()
>                                                                                         ^^^^^^^^^^^^^^^^^^^^
> 
> These charts are supposed to be chronological so [XXX] is wrong because the
> use after free happens on line [248].  Do we really need three threads to
> make this race work?
The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
executing [XXX]nr_neigh_put().

BR,
Lizhi
[PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Lizhi Xu 3 months, 2 weeks ago
On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > Task0					Task1						Task2
> > =====					=====						=====
> > [97] nr_add_node()
> > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > 					[214] nr_node_lock()
> > 					[245] nr_node->routes[2].neighbour->count--
> > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					[283] nr_node_unlock()
> > [214] nr_node_lock()
> > [253] nr_node->routes[2].neighbour = nr_neigh
> > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > 											[XXX] nr_neigh_put()
> >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > 
> > These charts are supposed to be chronological so [XXX] is wrong because the
> > use after free happens on line [248].  Do we really need three threads to
> > make this race work?
> The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> executing [XXX]nr_neigh_put().
Execution Order:
1 -> Task0
[113] nr_neigh_get_dev() // After execution, the refcount value is 3

2 -> Task1
[246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
[248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1

3 -> Task0
[253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]

4 -> Task2
[XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count

BR,
Lizhi
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Dan Carpenter 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > Task0					Task1						Task2
> > > =====					=====						=====
> > > [97] nr_add_node()
> > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > 					[214] nr_node_lock()
> > > 					[245] nr_node->routes[2].neighbour->count--
> > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > 					[283] nr_node_unlock()
> > > [214] nr_node_lock()
> > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > 											[XXX] nr_neigh_put()
> > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > 
> > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > use after free happens on line [248].  Do we really need three threads to
> > > make this race work?
> > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > executing [XXX]nr_neigh_put().
> Execution Order:
> 1 -> Task0
> [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> 
> 2 -> Task1
> [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> 
> 3 -> Task0
> [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> 
> 4 -> Task2
> [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count

Let's step back a bit and look at the bigger picture design.  (Which is
completely undocumented so we're just guessing).

When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
reference count and nr_neigh->count++, then when we remove it from
->routes[] we drop the reference and do nr_neigh->count--.

If it's the last reference (and we are not holding ->locked) then we
remove it from the &nr_neigh_list and drop the reference count again and
free it.  So we drop the reference count twice.  This is a complicated
design with three variables: nr_neigh_hold(), nr_neigh->count and
->locked.  Why can it not just be one counter nr_neigh_hold().  So
instead of setting locked = true we would just take an extra reference?
The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.

Because that's fundamentally the problem, right?  We call
nr_neigh_get_dev() so we think we're holding a reference and we're
safe, but we don't realize that calling neighbour->count-- can
result in dropping two references.

regards,
dan carpenter
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Lizhi Xu 3 months, 2 weeks ago
On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > Task0					Task1						Task2
> > > > =====					=====						=====
> > > > [97] nr_add_node()
> > > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > > 					[214] nr_node_lock()
> > > > 					[245] nr_node->routes[2].neighbour->count--
> > > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > 					[283] nr_node_unlock()
> > > > [214] nr_node_lock()
> > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > > 											[XXX] nr_neigh_put()
> > > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > use after free happens on line [248].  Do we really need three threads to
> > > > make this race work?
> > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > executing [XXX]nr_neigh_put().
> > Execution Order:
> > 1 -> Task0
> > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> >
> > 2 -> Task1
> > [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> >
> > 3 -> Task0
> > [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> >
> > 4 -> Task2
> > [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count
> 
> Let's step back a bit and look at the bigger picture design.  (Which is
> completely undocumented so we're just guessing).
> 
> When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> reference count and nr_neigh->count++, then when we remove it from
> ->routes[] we drop the reference and do nr_neigh->count--.
> 
> If it's the last reference (and we are not holding ->locked) then we
> remove it from the &nr_neigh_list and drop the reference count again and
> free it.  So we drop the reference count twice.  This is a complicated
> design with three variables: nr_neigh_hold(), nr_neigh->count and
> ->locked.  Why can it not just be one counter nr_neigh_hold().  So
> instead of setting locked = true we would just take an extra reference?
> The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
locked controls whether the neighbor quality can be automatically updated;
count controls the number of different routes a neighbor is linked to;
refcount is simply used to manage the neighbor lifecycle.
> 
> Because that's fundamentally the problem, right?  We call
> nr_neigh_get_dev() so we think we're holding a reference and we're
> safe, but we don't realize that calling neighbour->count-- can
> result in dropping two references.
After nr_neigh_get_dev() retrieves a neighbor, there shouldn't be an
unfinished nr_add_node() call operating on the neighbor in the route.
Therefore, we need to use a lock before the nr_neigh_get_dev() operation
begins to ensure that the neighbor is added atomically to the routing table.

BR,
Lizhi
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Dan Carpenter 3 months, 2 weeks ago
On Tue, Oct 21, 2025 at 10:05:33AM +0800, Lizhi Xu wrote:
> On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> > On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > > Task0					Task1						Task2
> > > > > =====					=====						=====
> > > > > [97] nr_add_node()
> > > > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > > > 					[214] nr_node_lock()
> > > > > 					[245] nr_node->routes[2].neighbour->count--
> > > > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > > 					[283] nr_node_unlock()
> > > > > [214] nr_node_lock()
> > > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > > > 											[XXX] nr_neigh_put()
> > > > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > > use after free happens on line [248].  Do we really need three threads to
> > > > > make this race work?
> > > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > > executing [XXX]nr_neigh_put().
> > > Execution Order:
> > > 1 -> Task0
> > > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> > >
> > > 2 -> Task1
> > > [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> > > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> > >
> > > 3 -> Task0
> > > [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> > >
> > > 4 -> Task2
> > > [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> > > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count
> > 
> > Let's step back a bit and look at the bigger picture design.  (Which is
> > completely undocumented so we're just guessing).
> > 
> > When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> > reference count and nr_neigh->count++, then when we remove it from
> > ->routes[] we drop the reference and do nr_neigh->count--.
> > 
> > If it's the last reference (and we are not holding ->locked) then we
> > remove it from the &nr_neigh_list and drop the reference count again and
> > free it.  So we drop the reference count twice.  This is a complicated
> > design with three variables: nr_neigh_hold(), nr_neigh->count and
> > ->locked.  Why can it not just be one counter nr_neigh_hold().  So
> > instead of setting locked = true we would just take an extra reference?
> > The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
> locked controls whether the neighbor quality can be automatically updated;

I'm not sure your patch fixes the bug because we could still race against
nr_del_node().

I'm not saying get rid of locked completely, I'm saying get rid of code like
this:
		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
			nr_remove_neigh(nr_node->routes[2].neighbour);

Right now, locked serves as a special kind of reference count, because we
don't drop the reference if it's true.

> count controls the number of different routes a neighbor is linked to;

Sure, that is interesting information for the user, so keep it around to
print in the proc file, but don't use it as a reference count.

> refcount is simply used to manage the neighbor lifecycle.

The bug is caused because our reference counting is bad.

So right now what happens is we allocate nr_neigh and we put it on the
&nr_neigh_list.  Then we lock it or we add it to ->routes[] and each of
those has a different reference count.  Then when we drop those references
we do:

		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
			nr_remove_neigh(nr_node->routes[2].neighbour);

This removes it from the list, and hopefully this is the last reference
and it frees it.

It would be much simpler to say, we only use nr_neigh_hold()/put() for
reference counting.  When we set locked we do:

	nr_neigh_hold(nr_neigh);
	nr_neigh->locked  = true;

Incrementing the refcount means it can't be freed.

Then when we remove nr_neigh from ->routes[] we wouldn't "remove it from
the list", instead we would just drop a reference.  When we dropped the
last reference, nr_neigh_put() would remove it from the list.

My proposal would be a behavior change because right now what happens is:

1: allocate nr_neigh
2: add it to ->routes[]
3: remove it from ->routes[]
   (freed automatically because we drop two references)

Now it would be:
1: allocate nr_neigh
2: add it to ->routes[]
3: remove it from ->routes[]
4: needs to be freed manually with nr_del_neigh().

regards,
dan carpenter
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Lizhi Xu 3 months, 2 weeks ago
On Tue, 21 Oct 2025 09:36:47 +0300, Dan Carpenter wrote:
> On Tue, Oct 21, 2025 at 10:05:33AM +0800, Lizhi Xu wrote:
> > On Mon, 20 Oct 2025 20:59:24 +0300, Dan Carpenter wrote:
> > > On Mon, Oct 20, 2025 at 09:49:12PM +0800, Lizhi Xu wrote:
> > > > On Mon, 20 Oct 2025 21:34:56 +0800, Lizhi Xu wrote:
> > > > > > Task0					Task1						Task2
> > > > > > =====					=====						=====
> > > > > > [97] nr_add_node()
> > > > > > [113] nr_neigh_get_dev()		[97] nr_add_node()
> > > > > > 					[214] nr_node_lock()
> > > > > > 					[245] nr_node->routes[2].neighbour->count--
> > > > > > 					[246] nr_neigh_put(nr_node->routes[2].neighbour);
> > > > > > 					[248] nr_remove_neigh(nr_node->routes[2].neighbour)
> > > > > > 					[283] nr_node_unlock()
> > > > > > [214] nr_node_lock()
> > > > > > [253] nr_node->routes[2].neighbour = nr_neigh
> > > > > > [254] nr_neigh_hold(nr_neigh);							[97] nr_add_node()
> > > > > > 											[XXX] nr_neigh_put()
> > > > > >                                                                                         ^^^^^^^^^^^^^^^^^^^^
> > > > > >
> > > > > > These charts are supposed to be chronological so [XXX] is wrong because the
> > > > > > use after free happens on line [248].  Do we really need three threads to
> > > > > > make this race work?
> > > > > The UAF problem occurs in Task2. Task1 sets the refcount of nr_neigh to 1,
> > > > > then Task0 adds it to routes[2]. Task2 releases routes[2].neighbour after
> > > > > executing [XXX]nr_neigh_put().
> > > > Execution Order:
> > > > 1 -> Task0
> > > > [113] nr_neigh_get_dev() // After execution, the refcount value is 3
> > > >
> > > > 2 -> Task1
> > > > [246] nr_neigh_put(nr_node->routes[2].neighbour);   // After execution, the refcount value is 2
> > > > [248] nr_remove_neigh(nr_node->routes[2].neighbour) // After execution, the refcount value is 1
> > > >
> > > > 3 -> Task0
> > > > [253] nr_node->routes[2].neighbour = nr_neigh       // nr_neigh's refcount value is 1 and add it to routes[2]
> > > >
> > > > 4 -> Task2
> > > > [XXX] nr_neigh_put(nr_node->routes[2].neighbour)    // After execution, neighhour is freed
> > > > if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)  // Uaf occurs this line when accessing neighbour->count
> > >
> > > Let's step back a bit and look at the bigger picture design.  (Which is
> > > completely undocumented so we're just guessing).
> > >
> > > When we put nr_neigh into nr_node->routes[] we bump the nr_neigh_hold()
> > > reference count and nr_neigh->count++, then when we remove it from
> > > ->routes[] we drop the reference and do nr_neigh->count--.
> > >
> > > If it's the last reference (and we are not holding ->locked) then we
> > > remove it from the &nr_neigh_list and drop the reference count again and
> > > free it.  So we drop the reference count twice.  This is a complicated
> > > design with three variables: nr_neigh_hold(), nr_neigh->count and
> > > ->locked.  Why can it not just be one counter nr_neigh_hold().  So
> > > instead of setting locked = true we would just take an extra reference?
> > > The nr_neigh->count++ would be replaced with nr_neigh_hold() as well.
> > locked controls whether the neighbor quality can be automatically updated;
>
> I'm not sure your patch fixes the bug because we could still race against
> nr_del_node().
This is fine in this issue, but for rigor, I'll add locks to all related
ioctl and route frame operations to maintain synchronization.
I will send V3 patch to improve it.
> 
> I'm not saying get rid of locked completely, I'm saying get rid of code like
> this:
> 		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
> 			nr_remove_neigh(nr_node->routes[2].neighbour);
> 
> Right now, locked serves as a special kind of reference count, because we
> don't drop the reference if it's true.
I don't think this is correct.
> 
> > count controls the number of different routes a neighbor is linked to;
> 
> Sure, that is interesting information for the user, so keep it around to
> print in the proc file, but don't use it as a reference count.
> 
> > refcount is simply used to manage the neighbor lifecycle.
> 
> The bug is caused because our reference counting is bad.
> 
> So right now what happens is we allocate nr_neigh and we put it on the
> &nr_neigh_list.  Then we lock it or we add it to ->routes[] and each of
> those has a different reference count.  Then when we drop those references
> we do:
> 
> 		if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
> 			nr_remove_neigh(nr_node->routes[2].neighbour);
> 
> This removes it from the list, and hopefully this is the last reference
> and it frees it.
> 
> It would be much simpler to say, we only use nr_neigh_hold()/put() for
> reference counting.  When we set locked we do:
> 
> 	nr_neigh_hold(nr_neigh);
> 	nr_neigh->locked  = true;
> 
> Incrementing the refcount means it can't be freed.
No, setting locked = 1 is only done in nr_add_neigh(), and nr_neigh_hold()
is not executed, and the refcount value is 1.
> 
> Then when we remove nr_neigh from ->routes[] we wouldn't "remove it from
> the list", instead we would just drop a reference.  When we dropped the
> last reference, nr_neigh_put() would remove it from the list.
> 
> My proposal would be a behavior change because right now what happens is:
> 
> 1: allocate nr_neigh
> 2: add it to ->routes[]
> 3: remove it from ->routes[]
>    (freed automatically because we drop two references)
No, No, I know where your analysis went wrong, it is here.

The problem is not when allocating neigh and adding it to routes[2],
but when nr_add_node is executed twice later, one is Task0 as I mentioned
above, and the other is Task1.
After Task1 moves the neighbor out of routes, Task0 uses nr_neigh_get_dev()
to get the neighbor that Task1 moved out, and then adds it to its routes.
This is wrong. Task0 should not use nr_neigh_get_dev() to obtain the neighbor
before other tasks move it out. This will interfere with the reference count
of the neighbor, which is the root cause of the problem.

BR,
Lizhi
[PATCH V3] netrom: Prevent race conditions between neighbor operations
Posted by Lizhi Xu 3 months, 2 weeks ago
The root cause of the problem is that multiple different tasks initiate
SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
between them to protect the same nr_neigh.

Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF on Task2:

Task0					Task1						Task2
=====					=====						=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node_lock()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
					nr_node_unlock()
nr_node_lock()
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);								nr_add_node()
											nr_neigh_put()
											if (nr_node->routes[2].neighbour->count
Description of the UAF triggering process:
First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
Then, Task 1 puts the same neighbor from its routes[2] and executes
nr_remove_neigh() because the count is 0. After these two operations,
the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
lock and writes it to its routes[2].neighbour.
Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
release the neighbor. The subsequent execution of the neighbor->count
check triggers a UAF.

The solution to the problem is to use a lock to synchronize each add a
route to node, but for rigor, I'll add locks to related ioctl and route
frame operations to maintain synchronization.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
V1 -> V2: update comments for cause uaf
V2 -> V3: sync neighbor operations in ioctl and route frame, update comments

 net/netrom/nr_route.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..debe3e925338 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
 static DEFINE_SPINLOCK(nr_node_list_lock);
 static HLIST_HEAD(nr_neigh_list);
 static DEFINE_SPINLOCK(nr_neigh_list_lock);
+static DEFINE_MUTEX(neighbor_lock);
 
 static struct nr_node *nr_node_get(ax25_address *callsign)
 {
@@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
 	ax25_digi digi;
 	int ret;
 
+	guard(mutex)(&neighbor_lock);
+
 	switch (cmd) {
 	case SIOCADDRT:
 		if (copy_from_user(&nr_route, arg, sizeof(struct nr_route_struct)))
@@ -765,6 +768,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
 	nr_dest = (ax25_address *)(skb->data + 7);
 
 	if (ax25 != NULL) {
+		guard(mutex)(&neighbor_lock);
 		ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
 				  ax25->ax25_dev->dev, 0,
 				  READ_ONCE(sysctl_netrom_obsolescence_count_initialiser));
-- 
2.43.0
Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
Posted by Paolo Abeni 3 months, 2 weeks ago
On 10/21/25 10:35 AM, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> between them to protect the same nr_neigh.
> 
> Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> release the neighbour because its refcount value is 1.
> 
> In this case, the following situation causes a UAF on Task2:
> 
> Task0					Task1						Task2
> =====					=====						=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node_lock()
> 					nr_node->routes[2].neighbour->count--
> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> 					nr_node_unlock()
> nr_node_lock()
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);								nr_add_node()
> 											nr_neigh_put()
> 											if (nr_node->routes[2].neighbour->count
> Description of the UAF triggering process:
> First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> Then, Task 1 puts the same neighbor from its routes[2] and executes
> nr_remove_neigh() because the count is 0. After these two operations,
> the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> lock and writes it to its routes[2].neighbour.
> Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> release the neighbor. The subsequent execution of the neighbor->count
> check triggers a UAF.
> 
> The solution to the problem is to use a lock to synchronize each add a
> route to node, but for rigor, I'll add locks to related ioctl and route
> frame operations to maintain synchronization.

I think that adding another locking mechanism on top of an already
complex and not well understood locking and reference infra is not the
right direction.

Why reordering the statements as:

	if (nr_node->routes[2].neighbour->count == 0 &&
!nr_node->routes[2].neighbour->locked)
		nr_remove_neigh(nr_node->routes[2].neighbour);
	nr_neigh_put(nr_node->routes[2].neighbour);

is not enough?

> syzbot reported:
> BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741
> 
> Call Trace:
>  <TASK>
>  nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> 
> Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>



> ---
> V1 -> V2: update comments for cause uaf
> V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
> 
>  net/netrom/nr_route.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> index b94cb2ffbaf8..debe3e925338 100644
> --- a/net/netrom/nr_route.c
> +++ b/net/netrom/nr_route.c
> @@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
>  static DEFINE_SPINLOCK(nr_node_list_lock);
>  static HLIST_HEAD(nr_neigh_list);
>  static DEFINE_SPINLOCK(nr_neigh_list_lock);
> +static DEFINE_MUTEX(neighbor_lock);
>  
>  static struct nr_node *nr_node_get(ax25_address *callsign)
>  {
> @@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
>  	ax25_digi digi;
>  	int ret;
>  
> +	guard(mutex)(&neighbor_lock);

See:

https://elixir.bootlin.com/linux/v6.18-rc1/source/Documentation/process/maintainer-netdev.rst#L395

/P
Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
Posted by Dan Carpenter 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 01:44:18PM +0200, Paolo Abeni wrote:
> Why reordering the statements as:
> 
> 	if (nr_node->routes[2].neighbour->count == 0 &&
> !nr_node->routes[2].neighbour->locked)
> 		nr_remove_neigh(nr_node->routes[2].neighbour);
> 	nr_neigh_put(nr_node->routes[2].neighbour);
> 
> is not enough?

There are so many unfortunate things like this:

net/netrom/nr_route.c
   243                          /* It must be better than the worst */
   244                          if (quality > nr_node->routes[2].quality) {
   245                                  nr_node->routes[2].neighbour->count--;

++/-- are not atomic.

   246                                  nr_neigh_put(nr_node->routes[2].neighbour);
   247  
   248                                  if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
   249                                          nr_remove_neigh(nr_node->routes[2].neighbour);
   250  
   251                                  nr_node->routes[2].quality   = quality;
   252                                  nr_node->routes[2].obs_count = obs_count;
   253                                  nr_node->routes[2].neighbour = nr_neigh;

This line should come after the next two lines.

   254  
   255                                  nr_neigh_hold(nr_neigh);
   256                                  nr_neigh->count++;
   257                          }

regards,
dan carpenter
Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
Posted by Lizhi Xu 3 months, 2 weeks ago
On Thu, 23 Oct 2025 13:44:18 +0200, Paolo Abeni wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > between them to protect the same nr_neigh.
> >
> > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> >
> > In this case, the following situation causes a UAF on Task2:
> >
> > Task0					Task1						Task2
> > =====					=====						=====
> > nr_add_node()
> > nr_neigh_get_dev()			nr_add_node()
> > 					nr_node_lock()
> > 					nr_node->routes[2].neighbour->count--
> > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);								nr_add_node()
> > 											nr_neigh_put()
> > 											if (nr_node->routes[2].neighbour->count
> > Description of the UAF triggering process:
> > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > nr_remove_neigh() because the count is 0. After these two operations,
> > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > lock and writes it to its routes[2].neighbour.
> > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > release the neighbor. The subsequent execution of the neighbor->count
> > check triggers a UAF.
> >
> > The solution to the problem is to use a lock to synchronize each add a
> > route to node, but for rigor, I'll add locks to related ioctl and route
> > frame operations to maintain synchronization.
> 
> I think that adding another locking mechanism on top of an already
> complex and not well understood locking and reference infra is not the
> right direction.
> 
> Why reordering the statements as:
> 
> 	if (nr_node->routes[2].neighbour->count == 0 &&
> !nr_node->routes[2].neighbour->locked)
> 		nr_remove_neigh(nr_node->routes[2].neighbour);
> 	nr_neigh_put(nr_node->routes[2].neighbour);
> 
> is not enough?
This is not enough, the same uaf will appear, nr_remove_neigh() will also
execute nr_neigh_put(), and then executing nr_neigh_put() again will
trigger the uaf.
> 
> > syzbot reported:
> > BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> > Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741
> >
> > Call Trace:
> >  <TASK>
> >  nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> >
> > Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> > Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
> 
> 
> 
> > ---
> > V1 -> V2: update comments for cause uaf
> > V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
> >
> >  net/netrom/nr_route.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> > index b94cb2ffbaf8..debe3e925338 100644
> > --- a/net/netrom/nr_route.c
> > +++ b/net/netrom/nr_route.c
> > @@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
> >  static DEFINE_SPINLOCK(nr_node_list_lock);
> >  static HLIST_HEAD(nr_neigh_list);
> >  static DEFINE_SPINLOCK(nr_neigh_list_lock);
> > +static DEFINE_MUTEX(neighbor_lock);
> >
> >  static struct nr_node *nr_node_get(ax25_address *callsign)
> >  {
> > @@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
> >  	ax25_digi digi;
> >  	int ret;
> >
> > +	guard(mutex)(&neighbor_lock);
> 
> See:
> 
> https://elixir.bootlin.com/linux/v6.18-rc1/source/Documentation/process/maintainer-netdev.rst#L395
Using guard is not recommended. I'll reconsider.

BR,
Lizhi
[PATCH V4] netrom: Preventing the use of abnormal neighbor
Posted by Lizhi Xu 3 months, 2 weeks ago
The root cause of the problem is that multiple different tasks initiate
SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
between them to protect the same nr_neigh.

Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
release the neighbour because its refcount value is 1.

In this case, the following situation causes a UAF on Task2:

Task0					Task1						Task2
=====					=====						=====
nr_add_node()
nr_neigh_get_dev()			nr_add_node()
					nr_node_lock()
					nr_node->routes[2].neighbour->count--
					nr_neigh_put(nr_node->routes[2].neighbour);
					nr_remove_neigh(nr_node->routes[2].neighbour)
					nr_node_unlock()
nr_node_lock()
nr_node->routes[2].neighbour = nr_neigh
nr_neigh_hold(nr_neigh);								nr_add_node()
											nr_neigh_put()
											if (nr_node->routes[2].neighbour->count
Description of the UAF triggering process:
First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
Then, Task 1 puts the same neighbor from its routes[2] and executes
nr_remove_neigh() because the count is 0. After these two operations,
the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
lock and writes it to its routes[2].neighbour.
Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
release the neighbor. The subsequent execution of the neighbor->count
check triggers a UAF.

Filter out neighbors with a refcount of 1 to avoid unsafe conditions.

syzbot reported:
BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741

Call Trace:
 <TASK>
 nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248

Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
---
V1 -> V2: update comments for cause uaf
V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
V3 -> V4: Preventing the use of neighbors with a reference count of 1

 net/netrom/nr_route.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..1ef2743a5ec0 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -100,7 +100,7 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 {
 	struct nr_node  *nr_node;
 	struct nr_neigh *nr_neigh;
-	int i, found;
+	int i, found, ret = 0;
 	struct net_device *odev;
 
 	if ((odev=nr_dev_get(nr)) != NULL) {	/* Can't add routes to ourself */
@@ -212,6 +212,10 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 		return 0;
 	}
 	nr_node_lock(nr_node);
+	if (refcount_read(&nr_neigh->refcount) == 1) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (quality != 0)
 		strscpy(nr_node->mnemonic, mnemonic);
@@ -279,10 +283,11 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
 		}
 	}
 
+out:
 	nr_neigh_put(nr_neigh);
 	nr_node_unlock(nr_node);
 	nr_node_put(nr_node);
-	return 0;
+	return ret;
 }
 
 static void nr_remove_node_locked(struct nr_node *nr_node)
-- 
2.43.0
Re: [PATCH V4] netrom: Preventing the use of abnormal neighbor
Posted by Paolo Abeni 3 months, 1 week ago
On 10/23/25 3:50 PM, Lizhi Xu wrote:
> The root cause of the problem is that multiple different tasks initiate
> SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> between them to protect the same nr_neigh.
> 
> Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> release the neighbour because its refcount value is 1.
> 
> In this case, the following situation causes a UAF on Task2:
> 
> Task0					Task1						Task2
> =====					=====						=====
> nr_add_node()
> nr_neigh_get_dev()			nr_add_node()
> 					nr_node_lock()
> 					nr_node->routes[2].neighbour->count--
> 					nr_neigh_put(nr_node->routes[2].neighbour);
> 					nr_remove_neigh(nr_node->routes[2].neighbour)
> 					nr_node_unlock()
> nr_node_lock()
> nr_node->routes[2].neighbour = nr_neigh
> nr_neigh_hold(nr_neigh);								nr_add_node()
> 											nr_neigh_put()
> 											if (nr_node->routes[2].neighbour->count
> Description of the UAF triggering process:
> First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> Then, Task 1 puts the same neighbor from its routes[2] and executes
> nr_remove_neigh() because the count is 0. After these two operations,
> the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> lock and writes it to its routes[2].neighbour.
> Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> release the neighbor. The subsequent execution of the neighbor->count
> check triggers a UAF.

I looked at the code quite a bit and I think this could possibly avoid
the above mentioned race, but this whole area looks quite confusing to me.

I think it would be helpful if you could better describe the relevant
scenario starting from the initial setup (no nodes, no neighs).

Thanks,

Paolo
Re: [PATCH V4] netrom: Preventing the use of abnormal neighbor
Posted by Lizhi Xu 3 months, 1 week ago
On Tue, 28 Oct 2025 15:13:37 +0100, Paolo Abeni wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > between them to protect the same nr_neigh.
> >
> > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> >
> > In this case, the following situation causes a UAF on Task2:
> >
> > Task0					Task1						Task2
> > =====					=====						=====
> > nr_add_node()
> > nr_neigh_get_dev()			nr_add_node()
> > 					nr_node_lock()
> > 					nr_node->routes[2].neighbour->count--
> > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);								nr_add_node()
> > 											nr_neigh_put()
> > 											if (nr_node->routes[2].neighbour->count
> > Description of the UAF triggering process:
> > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > nr_remove_neigh() because the count is 0. After these two operations,
> > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > lock and writes it to its routes[2].neighbour.
> > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > release the neighbor. The subsequent execution of the neighbor->count
> > check triggers a UAF.
> 
> I looked at the code quite a bit and I think this could possibly avoid
> the above mentioned race, but this whole area looks quite confusing to me.
> 
> I think it would be helpful if you could better describe the relevant
> scenario starting from the initial setup (no nodes, no neighs).
OK. Let me fill in the origin of neigh.

Task3
=====
nr_add_node()
[146]if ((nr_neigh = kmalloc(sizeof(*nr_neigh), GFP_ATOMIC)) == NULL)
[253]nr_node->routes[2].neighbour = nr_neigh;
[255]nr_neigh_hold(nr_neigh);
[256]nr_neigh->count++;

neigh is created on line 146 in nr_add_node(), and added to node on
lines 253-256. It occurs before all Task0, Task1, and Task2.

Note:
1. [x], x is line number.
2. During my debugging process, I didn't pay attention to where the node
was created, and I apologize that I cannot provide the relevant creation
process.

BR,
Lizhi
Re: [PATCH V4] netrom: Preventing the use of abnormal neighbor
Posted by Lizhi Xu 2 months, 3 weeks ago
On Wed, 29 Oct 2025 10:59:04 +0800, Lizhi Xu wrote:
> > > The root cause of the problem is that multiple different tasks initiate
> > > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > > between them to protect the same nr_neigh.
> > >
> > > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > > release the neighbour because its refcount value is 1.
> > >
> > > In this case, the following situation causes a UAF on Task2:
> > >
> > > Task0					Task1						Task2
> > > =====					=====						=====
> > > nr_add_node()
> > > nr_neigh_get_dev()			nr_add_node()
> > > 					nr_node_lock()
> > > 					nr_node->routes[2].neighbour->count--
> > > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > > 					nr_node_unlock()
> > > nr_node_lock()
> > > nr_node->routes[2].neighbour = nr_neigh
> > > nr_neigh_hold(nr_neigh);								nr_add_node()
> > > 											nr_neigh_put()
> > > 											if (nr_node->routes[2].neighbour->count
> > > Description of the UAF triggering process:
> > > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > > nr_remove_neigh() because the count is 0. After these two operations,
> > > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > > lock and writes it to its routes[2].neighbour.
> > > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > > release the neighbor. The subsequent execution of the neighbor->count
> > > check triggers a UAF.
> > 
> > I looked at the code quite a bit and I think this could possibly avoid
> > the above mentioned race, but this whole area looks quite confusing to me.
> > 
> > I think it would be helpful if you could better describe the relevant
> > scenario starting from the initial setup (no nodes, no neighs).
> OK. Let me fill in the origin of neigh.
> 
> Task3
> =====
> nr_add_node()
> [146]if ((nr_neigh = kmalloc(sizeof(*nr_neigh), GFP_ATOMIC)) == NULL)
> [253]nr_node->routes[2].neighbour = nr_neigh;
> [255]nr_neigh_hold(nr_neigh);
> [256]nr_neigh->count++;
> 
> neigh is created on line 146 in nr_add_node(), and added to node on
> lines 253-256. It occurs before all Task0, Task1, and Task2.
> 
> Note:
> 1. [x], x is line number.
> 2. During my debugging process, I didn't pay attention to where the node
> was created, and I apologize that I cannot provide the relevant creation
> process.
Hi everyone, 
Today is my last day at WindRiver. Starting tomorrow, my email address
lizhi.xu@windriver.com will no longer be used;
I will use eadavis@qq.com thereafter.

BR,
Lizhi
Re: [PATCH V3] netrom: Prevent race conditions between neighbor operations
Posted by Eric Dumazet 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 4:44 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 10/21/25 10:35 AM, Lizhi Xu wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > SIOCADDRT & NETROM_NODE commands to add new routes, there is no lock
> > between them to protect the same nr_neigh.
> >
> > Task0 can add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> >
> > In this case, the following situation causes a UAF on Task2:
> >
> > Task0                                 Task1                                           Task2
> > =====                                 =====                                           =====
> > nr_add_node()
> > nr_neigh_get_dev()                    nr_add_node()
> >                                       nr_node_lock()
> >                                       nr_node->routes[2].neighbour->count--
> >                                       nr_neigh_put(nr_node->routes[2].neighbour);
> >                                       nr_remove_neigh(nr_node->routes[2].neighbour)
> >                                       nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);                                                              nr_add_node()
> >                                                                                       nr_neigh_put()
> >                                                                                       if (nr_node->routes[2].neighbour->count
> > Description of the UAF triggering process:
> > First, Task 0 executes nr_neigh_get_dev() to set neighbor refcount to 3.
> > Then, Task 1 puts the same neighbor from its routes[2] and executes
> > nr_remove_neigh() because the count is 0. After these two operations,
> > the neighbor's refcount becomes 1. Then, Task 0 acquires the nr node
> > lock and writes it to its routes[2].neighbour.
> > Finally, Task 2 executes nr_neigh_put(nr_node->routes[2].neighbour) to
> > release the neighbor. The subsequent execution of the neighbor->count
> > check triggers a UAF.
> >
> > The solution to the problem is to use a lock to synchronize each add a
> > route to node, but for rigor, I'll add locks to related ioctl and route
> > frame operations to maintain synchronization.
>
> I think that adding another locking mechanism on top of an already
> complex and not well understood locking and reference infra is not the
> right direction.
>
> Why reordering the statements as:
>
>         if (nr_node->routes[2].neighbour->count == 0 &&
> !nr_node->routes[2].neighbour->locked)
>                 nr_remove_neigh(nr_node->routes[2].neighbour);
>         nr_neigh_put(nr_node->routes[2].neighbour);
>
> is not enough?
>
> > syzbot reported:
> > BUG: KASAN: slab-use-after-free in nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> > Read of size 4 at addr ffff888051e6e9b0 by task syz.1.2539/8741
> >
> > Call Trace:
> >  <TASK>
> >  nr_add_node+0x25db/0x2c00 net/netrom/nr_route.c:248
> >
> > Reported-by: syzbot+2860e75836a08b172755@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2860e75836a08b172755
> > Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
>
>
>
> > ---
> > V1 -> V2: update comments for cause uaf
> > V2 -> V3: sync neighbor operations in ioctl and route frame, update comments
> >
> >  net/netrom/nr_route.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
> > index b94cb2ffbaf8..debe3e925338 100644
> > --- a/net/netrom/nr_route.c
> > +++ b/net/netrom/nr_route.c
> > @@ -40,6 +40,7 @@ static HLIST_HEAD(nr_node_list);
> >  static DEFINE_SPINLOCK(nr_node_list_lock);
> >  static HLIST_HEAD(nr_neigh_list);
> >  static DEFINE_SPINLOCK(nr_neigh_list_lock);
> > +static DEFINE_MUTEX(neighbor_lock);
> >
> >  static struct nr_node *nr_node_get(ax25_address *callsign)
> >  {
> > @@ -633,6 +634,8 @@ int nr_rt_ioctl(unsigned int cmd, void __user *arg)
> >       ax25_digi digi;
> >       int ret;
> >
> > +     guard(mutex)(&neighbor_lock);
>
> See:
>
> https://elixir.bootlin.com/linux/v6.18-rc1/source/Documentation/process/maintainer-netdev.rst#L395
>

I would also try to use a single spinlock : ie fuse together
nr_node_list_lock and nr_neigh_list_lock

Having two locks for something that is primarily used by fuzzers
nowadays is wasting our time.
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Dan Carpenter 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 03:25:40PM +0300, Dan Carpenter wrote:
> netrom/nr_route.c
>    214          nr_node_lock(nr_node);

I guess nr_node is different for each thread?

>    215  
>    216          if (quality != 0)
>    217                  strscpy(nr_node->mnemonic, mnemonic);
>    218  
>    219          for (found = 0, i = 0; i < nr_node->count; i++) {
>    220                  if (nr_node->routes[i].neighbour == nr_neigh) {
>    221                          nr_node->routes[i].quality   = quality;
>    222                          nr_node->routes[i].obs_count = obs_count;
>    223                          found = 1;
>    224                          break;
>    225                  }
>    226          }
>    227  
>    228          if (!found) {
>    229                  /* We have space at the bottom, slot it in */
>    230                  if (nr_node->count < 3) {
>    231                          nr_node->routes[2] = nr_node->routes[1];
>    232                          nr_node->routes[1] = nr_node->routes[0];
>    233  
>    234                          nr_node->routes[0].quality   = quality;
>    235                          nr_node->routes[0].obs_count = obs_count;
>    236                          nr_node->routes[0].neighbour = nr_neigh;
>    237  
>    238                          nr_node->which++;
>    239                          nr_node->count++;
>    240                          nr_neigh_hold(nr_neigh);
>    241                          nr_neigh->count++;
>    242                  } else {
>    243                          /* It must be better than the worst */
>    244                          if (quality > nr_node->routes[2].quality) {
>    245                                  nr_node->routes[2].neighbour->count--;
>    246                                  nr_neigh_put(nr_node->routes[2].neighbour);
>    247  
>    248                                  if (nr_node->routes[2].neighbour->count == 0 && !nr_node->routes[2].neighbour->locked)
>    249                                          nr_remove_neigh(nr_node->routes[2].neighbour);

Does neighbour->count means how many nr_node pointers have it in ->routes[]?
I wish this code had comments.
KTDOO: add comments explaining all the counters in netrom/nr_route.c

>    250  
>    251                                  nr_node->routes[2].quality   = quality;
>    252                                  nr_node->routes[2].obs_count = obs_count;
>    253                                  nr_node->routes[2].neighbour = nr_neigh;
>    254  
>    255                                  nr_neigh_hold(nr_neigh);
>    256                                  nr_neigh->count++;

Could we just add some locking to this if statement only?  I had thought
that nr_node_lock() serialized it but now I think maybe not?  Or maybe we
could increment the counters before assigning it?

			nr_neigh->count++;
			nr_neigh_hold(nr_neigh);
			nr_node->routes[2].neighbour = nr_neigh;

I'm not an expert in concerency.  Does calling nr_neigh_hold() mean th
that the nr_neigh->count++ will happen before the assignment?


>    257                          }
>    258                  }
>    259          }

regards,
dan carpenter
Re: [PATCH V2] netrom: Prevent race conditions between multiple add route
Posted by Dan Carpenter 3 months, 2 weeks ago
On Mon, Oct 20, 2025 at 03:25:40PM +0300, Dan Carpenter wrote:
> On Mon, Oct 20, 2025 at 07:02:44PM +0800, Lizhi Xu wrote:
> > The root cause of the problem is that multiple different tasks initiate
> > NETROM_NODE commands to add new routes, there is no lock between them to
> > protect the same nr_neigh.
> > Task0 may add the nr_neigh.refcount value of 1 on Task1 to routes[2].
> > 
> > When Task2 executes nr_neigh_put(nr_node->routes[2].neighbour), it will
> > release the neighbour because its refcount value is 1.
> > 
> > In this case, the following situation causes a UAF:
> > 
> > Task0					Task1						Task2
> > =====					=====						=====
> > nr_add_node()
> > nr_neigh_get_dev()			nr_add_node()
> > 					nr_node_lock()
> > 					nr_node->routes[2].neighbour->count--
> > 					nr_neigh_put(nr_node->routes[2].neighbour);
> > 					nr_remove_neigh(nr_node->routes[2].neighbour)
> > 					nr_node_unlock()
> > nr_node_lock()
> > nr_node->routes[2].neighbour = nr_neigh
> > nr_neigh_hold(nr_neigh);								nr_add_node()
> > 											nr_neigh_put()
> > 
> > The solution to the problem is to use a lock to synchronize each add a route
> > to node.
> 
> This chart is still not right.  Let me add line numbers to your chart:
> 
> netrom/nr_route.c
>    214          nr_node_lock(nr_node);
>    215  
>    216          if (quality != 0)
>    217                  strscpy(nr_node->mnemonic, mnemonic);
>    218  
>    219          for (found = 0, i = 0; i < nr_node->count; i++) {
>    220                  if (nr_node->routes[i].neighbour == nr_neigh) {
>    221                          nr_node->routes[i].quality   = quality;
>    222                          nr_node->routes[i].obs_count = obs_count;

Should we call nr_neigh->count++ if we found it?  I guess I don't
really understand what nr_neigh->count is counting...  It would be
really nice if someone added some comments explaining how the ref
counting worked.

>    223                          found = 1;
>    224                          break;
>    225                  }
>    226          }

regards,
dan carpenter