[PATCH net v2] ipv6: addrconf: skip autoconf on unregistering devices

Xu Rao posted 1 patch 1 month ago
Posted by Xu Rao 1 month ago
syzbot reports that unregister_netdevice() can wait forever for a
netdevsim device whose reference count never drops to zero.

The leaked reference is held by an IPv6 local route created from
addrconf.  A late NETDEV_CHANGE notification can still reach
addrconf_notify() after the device has entered NETREG_UNREGISTERING.
The handler can then run automatic address configuration, add a
link-local address and install its host route after unregister teardown
has already started.  The route nexthop takes a netdev reference in
fib6_nh_init(), and there might not be a later ifdown pass to remove
the newly created address and route.

Do not run MTU, UP or CHANGE based IPv6 autoconfiguration once the
device is unregistering.  Keep NETDEV_DOWN and NETDEV_UNREGISTER
handling unchanged so the teardown path can still remove existing IPv6
state.

Reported-by: syzbot+e2af46126e0644cbebdd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e2af46126e0644cbebdd
Signed-off-by: Xu Rao <raoxu@uniontech.com>
---
v2:
- Drop READ_ONCE() around dev->reg_state.  addrconf_notify() is called
  from the netdevice notifier path, so a plain load is sufficient.
- Do not add a Fixes tag.  The issue does not appear to be caused by a
  single commit, but by a long-standing unregister-time lifecycle gap.

 net/ipv6/addrconf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5476b6536eb7..a517e57cf86a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3666,6 +3666,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 		break;

 	case NETDEV_CHANGEMTU:
+		if (dev->reg_state == NETREG_UNREGISTERING)
+			break;
+
 		/* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
 		if (dev->mtu < IPV6_MIN_MTU) {
 			addrconf_ifdown(dev, dev != net->loopback_dev);
@@ -3691,6 +3694,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 		fallthrough;
 	case NETDEV_UP:
 	case NETDEV_CHANGE:
+		if (dev->reg_state == NETREG_UNREGISTERING)
+			break;
+
 		if (idev && idev->cnf.disable_ipv6)
 			break;

--
2.50.1
Re: [PATCH net v2] ipv6: addrconf: skip autoconf on unregistering devices
Posted by Ido Schimmel 1 month ago
On Tue, May 12, 2026 at 08:44:10PM +0800, Xu Rao wrote:
> syzbot reports that unregister_netdevice() can wait forever for a
> netdevsim device whose reference count never drops to zero.
> 
> The leaked reference is held by an IPv6 local route created from
> addrconf.  A late NETDEV_CHANGE notification can still reach
> addrconf_notify() after the device has entered NETREG_UNREGISTERING.
> The handler can then run automatic address configuration, add a
> link-local address and install its host route after unregister teardown
> has already started.  The route nexthop takes a netdev reference in
> fib6_nh_init(), and there might not be a later ifdown pass to remove
> the newly created address and route.

Do you have a reproducer?

The kernel repeatedly sends NETDEV_UNREGISTER notifications when it's
waiting for the reference count to drop.

> 
> Do not run MTU, UP or CHANGE based IPv6 autoconfiguration once the
> device is unregistering.  Keep NETDEV_DOWN and NETDEV_UNREGISTER
> handling unchanged so the teardown path can still remove existing IPv6
> state.
> 
> Reported-by: syzbot+e2af46126e0644cbebdd@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=e2af46126e0644cbebdd
> Signed-off-by: Xu Rao <raoxu@uniontech.com>
> ---
> v2:
> - Drop READ_ONCE() around dev->reg_state.  addrconf_notify() is called
>   from the netdevice notifier path, so a plain load is sufficient.
> - Do not add a Fixes tag.  The issue does not appear to be caused by a
>   single commit, but by a long-standing unregister-time lifecycle gap.
> 
>  net/ipv6/addrconf.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 5476b6536eb7..a517e57cf86a 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3666,6 +3666,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>  		break;
> 
>  	case NETDEV_CHANGEMTU:
> +		if (dev->reg_state == NETREG_UNREGISTERING)
> +			break;
> +
>  		/* if MTU under IPV6_MIN_MTU stop IPv6 on this interface. */
>  		if (dev->mtu < IPV6_MIN_MTU) {
>  			addrconf_ifdown(dev, dev != net->loopback_dev);
> @@ -3691,6 +3694,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>  		fallthrough;
>  	case NETDEV_UP:
>  	case NETDEV_CHANGE:
> +		if (dev->reg_state == NETREG_UNREGISTERING)
> +			break;
> +
>  		if (idev && idev->cnf.disable_ipv6)
>  			break;
> 
> --
> 2.50.1
Re: [PATCH net v2] ipv6: addrconf: skip autoconf on unregistering devices
Posted by Xu Rao 4 weeks, 1 day ago
Hi Ido,

> > The leaked reference is held by an IPv6 local route created from
> > addrconf.  A late NETDEV_CHANGE notification can still reach
> > addrconf_notify() after the device has entered NETREG_UNREGISTERING.
> > The handler can then run automatic address configuration, add a
> > link-local address and install its host route after unregister teardown
> > has already started.  The route nexthop takes a netdev reference in
> > fib6_nh_init(), and there might not be a later ifdown pass to remove
> > the newly created address and route.
>
> Do you have a reproducer?

The reproducer is the syz repro from the syzbot report:

https://syzkaller.appspot.com/x/repro.syz?x=103f3dba580000

I don't have a standalone C reproducer. I asked syzbot to test the patch
against the original report tree:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci

The reproducer did not trigger the issue with this patch applied:

Reported-by: syzbot+e2af46126e0644cbebdd@syzkaller.appspotmail.com
Tested-by: syzbot+e2af46126e0644cbebdd@syzkaller.appspotmail.com

Tested on:

commit:         5cbb61bf arm64/fpsimd: ptrace: zero target's fpsimd_st..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=147a20ec580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a834c6344141a58b
dashboard link: https://syzkaller.appspot.com/bug?extid=e2af46126e0644cbebdd
patch:          https://syzkaller.appspot.com/x/patch.diff?x=12255636580000

> The kernel repeatedly sends NETDEV_UNREGISTER notifications when it's
> waiting for the reference count to drop.

Yes, and this patch intentionally keeps NETDEV_DOWN and NETDEV_UNREGISTER
handling unchanged so addrconf_ifdown() can still remove existing IPv6
state during teardown.

The guard only affects the MTU / UP / CHANGE paths. The problem I was
trying to avoid is creating new IPv6 state once the device is already in
NETREG_UNREGISTERING. In the syzbot trace, addrconf_notify() is reached
from a NETDEV_CHANGE path and then creates a link-local address and its
route while unregister is already in progress. The route then holds a
netdev reference via fib6_nh_init().

So the patch does not rely on suppressing unregister processing; it only
prevents late autoconf from adding new state during unregister.
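
To make the hypothesis concrete, here is a minimal user-space sketch of
the ordering described above.  It is not kernel code: every model_* name
is invented for illustration, autoconf is reduced to "take one device
reference", and the model assumes that no further ifdown pass runs after
the late event, which is exactly the part this thread has not confirmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel definitions (hypothetical names). */
enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING };
enum model_event { MODEL_UP, MODEL_CHANGE, MODEL_CHANGEMTU,
		   MODEL_DOWN, MODEL_UNREGISTER };

struct model_dev {
	enum model_reg_state reg_state;
	int refcnt;		/* references held on the device */
	bool has_lladdr;	/* link-local address and host route present */
};

/* Models autoconf: the new route's nexthop pins the device. */
static void model_autoconf(struct model_dev *dev)
{
	if (!dev->has_lladdr) {
		dev->has_lladdr = true;
		dev->refcnt++;
	}
}

/* Models the ifdown path: removes address and route, drops the reference. */
static void model_ifdown(struct model_dev *dev)
{
	if (dev->has_lladdr) {
		dev->has_lladdr = false;
		dev->refcnt--;
	}
}

/* Models the notifier; 'guarded' selects the reg_state check from the patch. */
static void model_notify(struct model_dev *dev, enum model_event ev,
			 bool guarded)
{
	switch (ev) {
	case MODEL_CHANGEMTU:	/* the real MTU handling is more involved */
	case MODEL_UP:
	case MODEL_CHANGE:
		if (guarded && dev->reg_state == MODEL_UNREGISTERING)
			break;
		model_autoconf(dev);
		break;
	case MODEL_DOWN:
	case MODEL_UNREGISTER:
		model_ifdown(dev);
		break;
	}
}

/* Returns the reference count left over after a late CHANGE event. */
static int model_late_change(bool guarded)
{
	struct model_dev dev = { MODEL_REGISTERED, 1, false };

	model_notify(&dev, MODEL_UP, guarded);		/* autoconf runs */
	dev.reg_state = MODEL_UNREGISTERING;		/* teardown starts */
	model_notify(&dev, MODEL_UNREGISTER, guarded);	/* state removed */
	model_notify(&dev, MODEL_CHANGE, guarded);	/* late event */
	dev.refcnt--;					/* owner drops its ref */
	assert(dev.refcnt >= 0);
	return dev.refcnt;
}
```

In this model, model_late_change(false) returns 1 (one reference left
behind) and model_late_change(true) returns 0.  It only shows why the
guard would help if the late event really arrives in that order.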

Thanks,
Xu Rao
Re: [PATCH net v2] ipv6: addrconf: skip autoconf on unregistering devices
Posted by Tetsuo Handa 4 weeks, 1 day ago
On 2026/05/14 18:44, Xu Rao wrote:
> The reproducer is the syz repro from the syzbot report:
> 
> https://syzkaller.appspot.com/x/repro.syz?x=103f3dba580000
> 
> I don't have a standalone C reproducer. I asked syzbot to test the patch
> against the original report tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> 
> The reproducer did not trigger the issue with this patch applied:

Will you test your patch with

-		if (dev->reg_state == NETREG_UNREGISTERING)
-			break;
-
+		WARN_ON(dev->reg_state == NETREG_UNREGISTERING);

and confirm that WARN_ON() fires, for nobody knows whether this refcount
was obtained after dev->reg_state became NETREG_UNREGISTERING ?

Please be aware that the stack traces that follow

  unregister_netdevice: waiting for netdevsim1 to become free. Usage count = 2

do not indicate that these references are obtained after the

  unregister_netdevice: waiting for netdevsim1 to become free. Usage count = 2

message was printed. The stack traces indicate only where the leaking refcount
was obtained.

>> The kernel repeatedly sends NETDEV_UNREGISTER notifications when it's
>> waiting for the reference count to drop.

It is true that the kernel repeatedly sends NETDEV_UNREGISTER notifications,
but it is not always true that the NETDEV_UNREGISTER handlers do work for
every notification. Some NETDEV_UNREGISTER handlers act only on the first
NETDEV_UNREGISTER notification and do nothing for the subsequent ones,
because a NETDEV_UNREGISTER handler might unregister its cleanup functions
(and then fail to clean up resources obtained afterwards).
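
A minimal user-space sketch of such a one-shot handler, assuming a
hypothetical subsystem that drops its cleanup hook on the first
notification (all model_* names are invented for illustration and do not
correspond to any real kernel handler):

```c
#include <assert.h>
#include <stdbool.h>

struct model_subsys {
	bool cleanup_registered;	/* cleanup hook still installed */
	int held_refs;			/* device references held */
};

static void model_take_ref(struct model_subsys *s)
{
	s->held_refs++;
}

/* One-shot handler: only the first notification releases anything. */
static void model_unregister_event(struct model_subsys *s)
{
	if (!s->cleanup_registered)
		return;			/* repeated notification: no-op */
	s->held_refs = 0;
	s->cleanup_registered = false;
}

/* Returns the references left after the notification is repeated. */
static int model_refs_after_repeats(bool late_ref)
{
	struct model_subsys s = { true, 0 };
	int i;

	model_take_ref(&s);
	model_unregister_event(&s);	/* first notification cleans up */
	if (late_ref)
		model_take_ref(&s);	/* reference obtained afterwards */
	for (i = 0; i < 3; i++)
		model_unregister_event(&s);	/* repeats do nothing */
	assert(s.held_refs >= 0);
	return s.held_refs;
}
```

Here model_refs_after_repeats(false) returns 0 and
model_refs_after_repeats(true) returns 1: the repeated notifications never
release the late reference.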

My experience says that operations which are not serialized by the RTNL lock are
more prone to this refcount race than the netdev handlers which are serialized
by the RTNL lock. To close such races, we need to check operations which are not
serialized by the RTNL lock. An example was commit 5d5602236f5d ("can: j1939:
make j1939_session_activate() fail if device is no longer registered").
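
The general shape of that kind of check can be sketched in user space as
follows. This is only an illustration of the pattern, not the code from
that commit: the sketch_* names are hypothetical, and the caller is assumed
to hold whatever lock serializes the check against the state change.

```c
#include <assert.h>
#include <errno.h>

enum sketch_reg_state { SKETCH_REGISTERED, SKETCH_UNREGISTERING };

struct sketch_dev {
	enum sketch_reg_state reg_state;
	int refcnt;
};

struct sketch_session {
	struct sketch_dev *dev;
	int active;
};

/*
 * Refuse to create new state on a device that is going away.
 * Assumption: the caller holds a lock that also covers the
 * reg_state transition, so the check cannot go stale before
 * the reference is taken.
 */
static int sketch_session_activate(struct sketch_session *s)
{
	if (s->dev->reg_state != SKETCH_REGISTERED)
		return -ENODEV;

	s->dev->refcnt++;	/* the session now pins the device */
	s->active = 1;
	assert(s->dev->refcnt > 0);
	return 0;
}
```

Activation on a registered device takes a reference and returns 0; once
reg_state has changed, the call returns -ENODEV and the reference count is
left untouched.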

> The guard only affects the MTU / UP / CHANGE paths. The problem I was
> trying to avoid is creating new IPv6 state once the device is already in
> NETREG_UNREGISTERING. In the syzbot trace, addrconf_notify() is reached
> from a NETDEV_CHANGE path and then creates a link-local address and its
> route while unregister is already in progress. The route then holds a
> netdev reference via fib6_nh_init().
> 
> So the patch does not rely on suppressing unregister processing; it only
> prevents late autoconf from adding new state during unregister.

Unless you confirm that WARN_ON() fires, a single successful "#syz test" response
is not sufficient grounds for believing that your patch actually fixes this problem.

I think that we want some debug printk() patches like
https://lore.kernel.org/all/e0c7030b-261c-4ed1-b6b0-bf3b83a41d60@I-love.SAKURA.ne.jp/T/
in order to get more debug information.
Re: [PATCH net v2] ipv6: addrconf: skip autoconf on unregistering devices
Posted by Xu Rao 4 weeks, 1 day ago
Hi Tetsuo,

> Will you test your patch with
>
> -		if (dev->reg_state == NETREG_UNREGISTERING)
> -			break;
> -
> +		WARN_ON(dev->reg_state == NETREG_UNREGISTERING);
>
> and confirm that WARN_ON() fires, for nobody knows whether this refcount
> was obtained after dev->reg_state became NETREG_UNREGISTERING ?

> Please be aware that the stack traces that follow

I ran the additional syzbot tests on the same tree.

First, the proposed v2 patch did not reproduce the issue:

  commit:         5cbb61bf4168
  git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
  result:         OK
  console:        https://syzkaller.appspot.com/x/log.txt?x=147a20ec580000
  patch:          https://syzkaller.appspot.com/x/patch.diff?x=12255636580000

Second, I tested the diagnostic WARN_ON() patch suggested in this
thread.  That patch adds WARN_ON(dev->reg_state == NETREG_UNREGISTERING)
at the NETDEV_CHANGEMTU and NETDEV_UP/NETDEV_CHANGE paths.  syzbot also
reported OK for that diagnostic patch:

  commit:         5cbb61bf4168
  git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
  result:         OK
  console:        https://syzkaller.appspot.com/x/log.txt?x=10eae996580000
  patch:          https://syzkaller.appspot.com/x/patch.diff?x=1182da73980000

I checked the console log and did not find a WARN_ON() splat.

Finally, I ran the same tree without any patch, as a baseline.  That run
did reproduce the problem:

  commit:         5cbb61bf4168
  git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
  result:         report
  report:         https://syzkaller.appspot.com/x/report.txt?x=13f620ec580000
  console:        https://syzkaller.appspot.com/x/log.txt?x=15f620ec580000

So the no-patch baseline still triggers the leak in the same environment
where both the proposed patch and the WARN_ON() diagnostic patch returned
OK.

I agree that the WARN_ON() result does not prove the exact
NETREG_UNREGISTERING hypothesis, because it did not trigger.  I also do
not think this negative result is strong enough to disprove the race:
this bug is timing-sensitive, and even adding the reg_state load, branch
and WARN_ON() site can perturb the execution enough for a best-effort
syzbot run not to hit the same interleaving.

Thanks,
Xu Rao
Re: [PATCH net v2] ipv6: addrconf: skip autoconf on unregistering devices
Posted by Tetsuo Handa 4 weeks, 1 day ago
On 2026/05/14 22:18, Xu Rao wrote:
> So the no-patch baseline still triggers the leak in the same environment
> where both the proposed patch and the WARN_ON() diagnostic patch returned
> OK.
> 
> I agree that the WARN_ON() result does not prove the exact
> NETREG_UNREGISTERING hypothesis, because it did not trigger.  I also do
> not think this negative result is strong enough to disprove the race:
> this bug is timing-sensitive, and even adding the reg_state load, branch
> and WARN_ON() site can perturb the execution enough for a best-effort
> syzbot run not to hit the same interleaving.

Since easily reproducible cases have been fixed in
"unregister_netdevice: waiting for DEV to become free (8)",
cases reported in
"unregister_netdevice: waiting for DEV to become free (9)" and
afterwards are likely timing-dependent, hard-to-reproduce races.

Therefore, we need careful reasoning for why a patch can fix the problem.
Carrying my debug printk() patches in networking trees might help.