net: phonet: fix BUG_ON() in pn_socket_autobind()

[PATCH] net: phonet: fix BUG_ON() in pn_socket_autobind()

Posted by Deepanshu Kartikey 1 month, 3 weeks ago

pn_socket_autobind() calls pn_socket_bind() and treats
-EINVAL as a signal that the socket was already bound,
then uses BUG_ON() to verify it:

    if (err != -EINVAL)
        return err;
    BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));

However, pn_socket_bind() returns -EINVAL in multiple
cases:

  1. address length too short
  2. socket not in TCP_CLOSE state
  3. socket already bound  <- only intended case

When -EINVAL comes from cases 1 or 2, sobject is still
zero (never assigned), causing BUG_ON to fire and crash
the kernel.

Fix this by checking the bound state directly via
pn_port(sobject) BEFORE calling pn_socket_bind(),
eliminating the ambiguous -EINVAL interpretation
entirely.

Reported-by: syzbot+706f5eb79044e686c794@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 net/phonet/socket.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index c4af26357144..5a55e7d14e85 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -204,14 +204,14 @@ static int pn_socket_autobind(struct socket *sock)
 	struct sockaddr_pn sa;
 	int err;
 
+	if (pn_port(pn_sk(sock->sk)->sobject))
+		return 0; /* socket was already bound */
+
 	memset(&sa, 0, sizeof(sa));
 	sa.spn_family = AF_PHONET;
 	err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
 			     sizeof(struct sockaddr_pn));
-	if (err != -EINVAL)
-		return err;
-	BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
-	return 0; /* socket was already bound */
+	return err;
 }
 
 static int pn_socket_connect(struct socket *sock, struct sockaddr_unsized *addr,
-- 
2.43.0

Re: [PATCH] net: phonet: fix BUG_ON() in pn_socket_autobind()

Posted by Jakub Kicinski 1 month, 2 weeks ago

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: phonet: fix BUG_ON() in pn_socket_autobind()

This commit fixes a kernel crash in pn_socket_autobind() by checking
the socket's bound state directly before calling pn_socket_bind(),
eliminating ambiguous -EINVAL interpretation.

Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794

> pn_socket_autobind() calls pn_socket_bind() and treats
> -EINVAL as a signal that the socket was already bound,
> then uses BUG_ON() to verify it:
>
>     if (err != -EINVAL)
>         return err;
>     BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
>
> However, pn_socket_bind() returns -EINVAL in multiple
> cases:
>
>   1. address length too short

The commit message lists "address length too short" as a cause of the
BUG_ON crash, but is this accurate?

Looking at pn_socket_autobind(), it always passes
len=sizeof(struct sockaddr_pn) to pn_socket_bind():

    err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
                         sizeof(struct sockaddr_pn));

While pn_socket_bind() does have this check:

    if (len < sizeof(struct sockaddr_pn))
        return -EINVAL;

This case cannot be triggered through pn_socket_autobind() since it
always passes the correct length. The BUG_ON is only reachable through
pn_socket_autobind(), so listing this as a cause seems misleading.

Should the commit message be updated to remove case 1, or clarify that
it's describing pn_socket_bind() in general rather than paths that can
trigger the BUG_ON?

>   2. socket not in TCP_CLOSE state

Can case 2 (socket not in TCP_CLOSE state) actually trigger the BUG_ON
crash with sobject still zero?

The combined check in pn_socket_bind() is:

    if (sk->sk_state != TCP_CLOSE || pn_port(pn->sobject)) {
        err = -EINVAL;
        goto out;
    }

For the BUG_ON to fire, we need sk_state != TCP_CLOSE while sobject
remains zero (unbound). Looking at phonet state transitions:

- pn_socket_connect() binds the socket before changing state
- pn_socket_listen() binds the socket before changing state
- pep_sock_accept() creates sockets with state=TCP_SYN_RECV but also
  sets sobject
- pep_ioctl(SIOCPNENABLEPIPE) checks sobject before changing state

Is there a code path that changes sk_state to non-TCP_CLOSE without
first binding the socket? The syzbot crash is real, but the mechanism
described couldn't be verified. Could this involve a race condition or
a different code path?

>   3. socket already bound  <- only intended case
>
> When -EINVAL comes from cases 1 or 2, sobject is still
> zero (never assigned), causing BUG_ON to fire and crash
> the kernel.

Given the issues with cases 1 and 2 above, is the explanation of the
crash mechanism accurate? Case 1 cannot happen from pn_socket_autobind(),
and case 2's ability to leave sobject at zero while triggering -EINVAL
could not be verified.

The syzbot crash is confirmed (RIP at pn_socket_autobind
net/phonet/socket.c:213 via pn_socket_sendmsg), but could the commit
message more accurately describe how the crash actually occurs?

> diff --git a/net/phonet/socket.c b/net/phonet/socket.c
> index c4af26357144..5a55e7d14e85 100644
> --- a/net/phonet/socket.c
> +++ b/net/phonet/socket.c
> @@ -204,14 +204,14 @@ static int pn_socket_autobind(struct socket *sock)
>  	struct sockaddr_pn sa;
>  	int err;
>
> +	if (pn_port(pn_sk(sock->sk)->sobject))
> +		return 0; /* socket was already bound */
> +
>  	memset(&sa, 0, sizeof(sa));
>  	sa.spn_family = AF_PHONET;
>  	err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
>  			     sizeof(struct sockaddr_pn));
> -	if (err != -EINVAL)
> -		return err;
> -	BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> -	return 0; /* socket was already bound */
> +	return err;
>  }

The fix itself is correct and removes the dangerous BUG_ON(). However,
the pn_port(sobject) check is performed without holding any lock. Could
a concurrent pn_socket_bind() race with this check? This is likely
benign since pn_socket_bind() re-checks under lock_sock(), but is this
the intended behavior?
-- 
pw-bot: cr

Re: [PATCH] net: phonet: fix BUG_ON() in pn_socket_autobind()

Posted by Deepanshu Kartikey 1 month, 2 weeks ago

On Tue, Apr 28, 2026 at 6:24 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
>
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: phonet: fix BUG_ON() in pn_socket_autobind()
>
> This commit fixes a kernel crash in pn_socket_autobind() by checking
> the socket's bound state directly before calling pn_socket_bind(),
> eliminating ambiguous -EINVAL interpretation.
>
> Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794
>
> > pn_socket_autobind() calls pn_socket_bind() and treats
> > -EINVAL as a signal that the socket was already bound,
> > then uses BUG_ON() to verify it:
> >
> >     if (err != -EINVAL)
> >         return err;
> >     BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> >
> > However, pn_socket_bind() returns -EINVAL in multiple
> > cases:
> >
> >   1. address length too short
>
> The commit message lists "address length too short" as a cause of the
> BUG_ON crash, but is this accurate?
>
> Looking at pn_socket_autobind(), it always passes
> len=sizeof(struct sockaddr_pn) to pn_socket_bind():
>
>     err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
>                          sizeof(struct sockaddr_pn));
>
> While pn_socket_bind() does have this check:
>
>     if (len < sizeof(struct sockaddr_pn))
>         return -EINVAL;
>
> This case cannot be triggered through pn_socket_autobind() since it
> always passes the correct length. The BUG_ON is only reachable through
> pn_socket_autobind(), so listing this as a cause seems misleading.
>
> Should the commit message be updated to remove case 1, or clarify that
> it's describing pn_socket_bind() in general rather than paths that can
> trigger the BUG_ON?
>
> >   2. socket not in TCP_CLOSE state
>
> Can case 2 (socket not in TCP_CLOSE state) actually trigger the BUG_ON
> crash with sobject still zero?
>
> The combined check in pn_socket_bind() is:
>
>     if (sk->sk_state != TCP_CLOSE || pn_port(pn->sobject)) {
>         err = -EINVAL;
>         goto out;
>     }
>
> For the BUG_ON to fire, we need sk_state != TCP_CLOSE while sobject
> remains zero (unbound). Looking at phonet state transitions:
>
> - pn_socket_connect() binds the socket before changing state
> - pn_socket_listen() binds the socket before changing state
> - pep_sock_accept() creates sockets with state=TCP_SYN_RECV but also
>   sets sobject
> - pep_ioctl(SIOCPNENABLEPIPE) checks sobject before changing state
>
> Is there a code path that changes sk_state to non-TCP_CLOSE without
> first binding the socket? The syzbot crash is real, but the mechanism
> described couldn't be verified. Could this involve a race condition or
> a different code path?
>
> >   3. socket already bound  <- only intended case
> >
> > When -EINVAL comes from cases 1 or 2, sobject is still
> > zero (never assigned), causing BUG_ON to fire and crash
> > the kernel.
>
> Given the issues with cases 1 and 2 above, is the explanation of the
> crash mechanism accurate? Case 1 cannot happen from pn_socket_autobind(),
> and case 2's ability to leave sobject at zero while triggering -EINVAL
> could not be verified.
>
> The syzbot crash is confirmed (RIP at pn_socket_autobind
> net/phonet/socket.c:213 via pn_socket_sendmsg), but could the commit
> message more accurately describe how the crash actually occurs?
>
> > diff --git a/net/phonet/socket.c b/net/phonet/socket.c
> > index c4af26357144..5a55e7d14e85 100644
> > --- a/net/phonet/socket.c
> > +++ b/net/phonet/socket.c
> > @@ -204,14 +204,14 @@ static int pn_socket_autobind(struct socket *sock)
> >       struct sockaddr_pn sa;
> >       int err;
> >
> > +     if (pn_port(pn_sk(sock->sk)->sobject))
> > +             return 0; /* socket was already bound */
> > +
> >       memset(&sa, 0, sizeof(sa));
> >       sa.spn_family = AF_PHONET;
> >       err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
> >                            sizeof(struct sockaddr_pn));
> > -     if (err != -EINVAL)
> > -             return err;
> > -     BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> > -     return 0; /* socket was already bound */
> > +     return err;
> >  }
>
> The fix itself is correct and removes the dangerous BUG_ON(). However,
> the pn_port(sobject) check is performed without holding any lock. Could
> a concurrent pn_socket_bind() race with this check? This is likely
> benign since pn_socket_bind() re-checks under lock_sock(), but is this
> the intended behavior?
> --
> pw-bot: cr



Hi Jakub,

Thank you for the detailed review!

You are correct on all three points. Let me address
each one:

1. Case 1 ("address length too short") is indeed
   impossible from pn_socket_autobind() since it
   always passes sizeof(struct sockaddr_pn). I will
   remove this from the commit message in v2.

2. Regarding case 2 (sk_state != TCP_CLOSE with
   sobject == 0), I verified this using the syzbot
   C reproducer. The exact path is:

     r0 = socket$kcm(AF_KCM, SOCK_SEQPACKET, 0)
     listen(r0, ...)
       → sk_state = TCP_LISTEN, sobject = 0

     r3 = accept4(r0, ...)
       → creates NEW socket r3
       → sk_state = TCP_SYN_RECV  (not TCP_CLOSE!)
       → sobject  = 0             (never bound!)

     sendmsg(r3)
       → pn_socket_autobind(r3)
       → pn_socket_bind() sees sk_state != TCP_CLOSE
         while sobject is still 0
       → returns -EINVAL
       → BUG_ON fires!

   sobject is only set in two places:
     a. successful pn_socket_bind() → get_port() assigns port
     b. pep_sock_accept()           → copied from request

   Since r3 comes from accept4() on a KCM socket
   (not a PEP phonet socket), pep_sock_accept() is
   never called and sobject stays zero. So sk_state
   != TCP_CLOSE and sobject == 0 can indeed be true
   simultaneously.

3. Regarding the missing lock: the check:

     if (pn_port(pn_sk(sock->sk)->sobject))
             return 0;

   is performed without lock_sock(). A concurrent
   pn_socket_bind() could race with this read.
   However since pn_socket_bind() re-checks
   pn_port(sobject) under lock_sock(), any racing
   bind would be caught there. So the race is
   benign. I can add a comment clarifying this
   if desired.

I will send a v2 with an updated commit message
reflecting the above.

Thanks,
Deepanshu Kartikey

Re: [PATCH] net: phonet: fix BUG_ON() in pn_socket_autobind()

Posted by Rémi Denis-Courmont 1 month, 3 weeks ago

Hi,

Le keskiviikkona 22. huhtikuuta 2026, 5.15.33 Itä-Euroopan kesäaika Deepanshu 
Kartikey a écrit :
> pn_socket_autobind() calls pn_socket_bind() and treats
> -EINVAL as a signal that the socket was already bound,
> then uses BUG_ON() to verify it:
> 
>     if (err != -EINVAL)
>         return err;
>     BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> 
> However, pn_socket_bind() returns -EINVAL in multiple
> cases:
> 
>   1. address length too short
>   2. socket not in TCP_CLOSE state
>   3. socket already bound  <- only intended case
> 
> When -EINVAL comes from cases 1 or 2, sobject is still
> zero (never assigned), causing BUG_ON to fire and crash
> the kernel.
> 
> Fix this by checking the bound state directly via
> pn_port(sobject) BEFORE calling pn_socket_bind(),
> eliminating the ambiguous -EINVAL interpretation
> entirely.
> 
> Reported-by: syzbot+706f5eb79044e686c794@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
>  net/phonet/socket.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/phonet/socket.c b/net/phonet/socket.c
> index c4af26357144..5a55e7d14e85 100644
> --- a/net/phonet/socket.c
> +++ b/net/phonet/socket.c
> @@ -204,14 +204,14 @@ static int pn_socket_autobind(struct socket *sock)
>  	struct sockaddr_pn sa;
>  	int err;
> 
> +	if (pn_port(pn_sk(sock->sk)->sobject))
> +		return 0; /* socket was already bound */
> +

This was almost 20 years ago, but IIRC, we did not do it that way back then 
because it results in a data race on sobject if another task binds the socket 
in parallel.
 
>  	memset(&sa, 0, sizeof(sa));
>  	sa.spn_family = AF_PHONET;
>  	err = pn_socket_bind(sock, (struct sockaddr_unsized *)&sa,
>  			     sizeof(struct sockaddr_pn));
> -	if (err != -EINVAL)
> -		return err;
> -	BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
> -	return 0; /* socket was already bound */
> +	return err;
>  }
> 
>  static int pn_socket_connect(struct socket *sock, struct sockaddr_unsized
> *addr,


-- 
德尼-库尔蒙‧雷米
https://www.remlab.net/