net/smc/smc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
The syzbot report a crash:
Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f]
CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline]
RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89
Call Trace:
<TASK>
smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217
smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234
netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327
__netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442
netlink_dump_start include/linux/netlink.h:341 [inline]
smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251
__sock_diag_cmd net/core/sock_diag.c:249 [inline]
sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285
netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
__sock_sendmsg net/socket.c:729 [inline]
____sys_sendmsg+0xa95/0xc70 net/socket.c:2614
___sys_sendmsg+0x134/0x1d0 net/socket.c:2668
__sys_sendmsg+0x16d/0x220 net/socket.c:2700
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
The process like this:
(CPU1) | (CPU2)
---------------------------------|-------------------------------
inet_create() |
// init clcsock to NULL |
sk = sk_alloc() |
|
// unexpectedly change clcsock |
inet_init_csk_locks() |
|
// add sk to hash table |
smc_inet_init_sock() |
smc_sk_init() |
smc_hash_sk() |
| // traverse the hash table
| smc_diag_dump_proto
| __smc_diag_dump()
| // visit wrong clcsock
| smc_diag_msg_common_fill()
// alloc clcsock |
smc_create_clcsk |
sock_create_kern |
With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed
in inet_init_csk_locks(), because the struct smc_sock does not have struct
inet_connection_sock as the first member.
Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type
confusion.") add inet_sock as the first member of smc_sock. For protocol
with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is
more appropriate.
Reported-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f775be4458668f7d220e
Tested-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com
Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
---
net/smc/smc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 2c9084963739..1b20f0c927d3 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -285,7 +285,7 @@ struct smc_connection {
struct smc_sock { /* smc sock container */
union {
struct sock sk;
- struct inet_sock icsk_inet;
+ struct inet_connection_sock inet_conn;
};
struct socket *clcsock; /* internal tcp socket */
void (*clcsk_state_change)(struct sock *sk);
--
2.34.1
On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: > > The syzbot report a crash: > > Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 > RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] > RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 > Call Trace: > <TASK> > smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 > smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 > netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 > __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 > netlink_dump_start include/linux/netlink.h:341 [inline] > smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 > __sock_diag_cmd net/core/sock_diag.c:249 [inline] > sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 > netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] > netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 > sock_sendmsg_nosec net/socket.c:714 [inline] > __sock_sendmsg net/socket.c:729 [inline] > ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 > __sys_sendmsg+0x16d/0x220 net/socket.c:2700 > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > </TASK> > > The process like this: > > (CPU1) | (CPU2) > ---------------------------------|------------------------------- > inet_create() | > // init clcsock to NULL | > sk = sk_alloc() | > | > // unexpectedly change clcsock | > inet_init_csk_locks() | > | > // add sk to hash table | > smc_inet_init_sock() | > smc_sk_init() | > smc_hash_sk() | > | // traverse the hash table > | smc_diag_dump_proto > | __smc_diag_dump() > | // visit wrong clcsock > | smc_diag_msg_common_fill() > // alloc clcsock | > smc_create_clcsk | > sock_create_kern | > > With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed > in inet_init_csk_locks(), because the struct smc_sock does not have struct > inet_connection_sock as the first member. > > Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type > confusion.") add inet_sock as the first member of smc_sock. For protocol > with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is > more appropriate. > > Reported-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f775be4458668f7d220e > Tested-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com > Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") > Signed-off-by: Wang Liang <wangliang74@huawei.com> > --- > net/smc/smc.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/smc/smc.h b/net/smc/smc.h > index 2c9084963739..1b20f0c927d3 100644 > --- a/net/smc/smc.h > +++ b/net/smc/smc.h > @@ -285,7 +285,7 @@ struct smc_connection { > struct smc_sock { /* smc sock container */ > union { > struct sock sk; > - struct inet_sock icsk_inet; > + struct inet_connection_sock inet_conn; > }; > struct socket *clcsock; /* internal tcp socket */ > void (*clcsk_state_change)(struct sock *sk); > -- > 2.34.1 > Kuniyuki, can you please review, I think you had a related fix recently. Thanks. commit 60ada4fe644edaa6c2da97364184b0425e8aeaf5 Author: Kuniyuki Iwashima <kuniyu@google.com> Date: Fri Jul 11 06:07:52 2025 +0000 smc: Fix various oops due to inet_sock type confusion.
Thanks Eric for CCing me. On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@google.com> wrote: > > On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: > > > > The syzbot report a crash: > > > > Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI > > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 > > RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] > > RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 > > Call Trace: > > <TASK> > > smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 > > smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 > > netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 > > __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 > > netlink_dump_start include/linux/netlink.h:341 [inline] > > smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 > > __sock_diag_cmd net/core/sock_diag.c:249 [inline] > > sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 > > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 > > netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] > > netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 > > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 > > sock_sendmsg_nosec net/socket.c:714 [inline] > > __sock_sendmsg net/socket.c:729 [inline] > > ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 > > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 > > __sys_sendmsg+0x16d/0x220 net/socket.c:2700 > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > </TASK> > > > > The process like this: > > > > (CPU1) | (CPU2) > > ---------------------------------|------------------------------- > > inet_create() | > > // init clcsock to NULL | > > sk = sk_alloc() | > > | > > // unexpectedly change clcsock | > > inet_init_csk_locks() | > > | > > // add sk to hash table | > > smc_inet_init_sock() | > > smc_sk_init() | > > smc_hash_sk() | > > | // traverse the hash table > > | smc_diag_dump_proto > > | __smc_diag_dump() > > | // visit wrong clcsock > > | smc_diag_msg_common_fill() > > // alloc clcsock | > > smc_create_clcsk | > > sock_create_kern | > > > > With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed > > in inet_init_csk_locks(), because the struct smc_sock does not have struct > > inet_connection_sock as the first member. > > > > Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type > > confusion.") add inet_sock as the first member of smc_sock. For protocol > > with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is > > more appropriate. Why is INET_PROTOSW_ICSK necessary in the first place ? I don't see a clear reason because smc_clcsock_accept() allocates a new sock by smc_sock_alloc() and does not use inet_accept(). Or is there any other path where smc_sock is cast to inet_connection_sock ? > > > > Reported-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com > > Closes: https://syzkaller.appspot.com/bug?extid=f775be4458668f7d220e > > Tested-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com > > Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") > > Signed-off-by: Wang Liang <wangliang74@huawei.com> > > --- > > net/smc/smc.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/smc/smc.h b/net/smc/smc.h > > index 2c9084963739..1b20f0c927d3 100644 > > --- a/net/smc/smc.h > > +++ b/net/smc/smc.h > > @@ -285,7 +285,7 @@ struct smc_connection { > > struct smc_sock { /* smc sock container */ > > union { > > struct sock sk; > > - struct inet_sock icsk_inet; > > + struct inet_connection_sock inet_conn; > > }; > > struct socket *clcsock; /* internal tcp socket */ > > void (*clcsk_state_change)(struct sock *sk); > > -- > > 2.34.1 > > > > Kuniyuki, can you please review, I think you had a related fix recently. > > Thanks. > > commit 60ada4fe644edaa6c2da97364184b0425e8aeaf5 > Author: Kuniyuki Iwashima <kuniyu@google.com> > Date: Fri Jul 11 06:07:52 2025 +0000 > > smc: Fix various oops due to inet_sock type confusion.
On Thu, Sep 25, 2025 at 11:46 AM Kuniyuki Iwashima <kuniyu@google.com> wrote: > > Thanks Eric for CCing me. > > On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: > > > > > > The syzbot report a crash: > > > > > > Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI > > > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > > CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 > > > RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] > > > RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 > > > Call Trace: > > > <TASK> > > > smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 > > > smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 > > > netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 > > > __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 > > > netlink_dump_start include/linux/netlink.h:341 [inline] > > > smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 > > > __sock_diag_cmd net/core/sock_diag.c:249 [inline] > > > sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 > > > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 > > > netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] > > > netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 > > > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 > > > sock_sendmsg_nosec net/socket.c:714 [inline] > > > __sock_sendmsg net/socket.c:729 [inline] > > > ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 > > > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 > > > __sys_sendmsg+0x16d/0x220 net/socket.c:2700 > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > </TASK> > > > > > > The process like this: > > > > > > (CPU1) | (CPU2) > > > ---------------------------------|------------------------------- > > > inet_create() | > > > // init clcsock to NULL | > > > sk = sk_alloc() | > > > | > > > // unexpectedly change clcsock | > > > inet_init_csk_locks() | > > > | > > > // add sk to hash table | > > > smc_inet_init_sock() | > > > smc_sk_init() | > > > smc_hash_sk() | > > > | // traverse the hash table > > > | smc_diag_dump_proto > > > | __smc_diag_dump() > > > | // visit wrong clcsock > > > | smc_diag_msg_common_fill() > > > // alloc clcsock | > > > smc_create_clcsk | > > > sock_create_kern | > > > > > > With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed > > > in inet_init_csk_locks(), because the struct smc_sock does not have struct > > > inet_connection_sock as the first member. > > > > > > Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type > > > confusion.") add inet_sock as the first member of smc_sock. For protocol > > > with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is > > > more appropriate. > > Why is INET_PROTOSW_ICSK necessary in the first place ? > > I don't see a clear reason because smc_clcsock_accept() allocates > a new sock by smc_sock_alloc() and does not use inet_accept(). > > Or is there any other path where smc_sock is cast to > inet_connection_sock ? What I saw in this code was a missing protection. smc_diag_msg_common_fill() runs without socket lock being held. I was thinking of this fix, but apparently syzbot still got crashes. diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 10219f55aad14d795dabe4331458bd1b73c22789..b6abd0efea22c0c9726090b5de60e648b86e09a0 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -30,7 +30,8 @@ void smc_clcsock_release(struct smc_sock *smc) mutex_lock(&smc->clcsock_release_lock); if (smc->clcsock) { tcp = smc->clcsock; - smc->clcsock = NULL; + WRITE_ONCE(smc->clcsock, NULL); + synchronize_rcu(); sock_release(tcp); } mutex_unlock(&smc->clcsock_release_lock); diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index bf0beaa23bdb63edfe0c37515aa17a04bb648c08..069607c1db9aff76d1d4f23b47dfeb5177c433d8 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -35,26 +35,32 @@ static struct smc_diag_dump_ctx *smc_dump_context(struct netlink_callback *cb) static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk) { struct smc_sock *smc = smc_sk(sk); + struct socket *clcsock; memset(r, 0, sizeof(*r)); r->diag_family = sk->sk_family; sock_diag_save_cookie(sk, r->id.idiag_cookie); - if (!smc->clcsock) - return; - r->id.idiag_sport = htons(smc->clcsock->sk->sk_num); - r->id.idiag_dport = smc->clcsock->sk->sk_dport; - r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if; + + rcu_read_lock(); + clcsock = READ_ONCE(smc->clcsock); + if (!clcsock) + goto unlock; + r->id.idiag_sport = htons(clcsock->sk->sk_num); + r->id.idiag_dport = clcsock->sk->sk_dport; + r->id.idiag_if = clcsock->sk->sk_bound_dev_if; if (sk->sk_protocol == SMCPROTO_SMC) { - r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr; - r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr; + r->id.idiag_src[0] = clcsock->sk->sk_rcv_saddr; + r->id.idiag_dst[0] = clcsock->sk->sk_daddr; #if IS_ENABLED(CONFIG_IPV6) } else if (sk->sk_protocol == SMCPROTO_SMC6) { - memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr, - sizeof(smc->clcsock->sk->sk_v6_rcv_saddr)); - memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr, - sizeof(smc->clcsock->sk->sk_v6_daddr)); + memcpy(&r->id.idiag_src, &clcsock->sk->sk_v6_rcv_saddr, + sizeof(clcsock->sk->sk_v6_rcv_saddr)); + memcpy(&r->id.idiag_dst, &clcsock->sk->sk_v6_daddr, + sizeof(clcsock->sk->sk_v6_daddr)); #endif } +unlock: + rcu_read_unlock(); }
On Thu, Sep 25, 2025 at 11:54 AM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Sep 25, 2025 at 11:46 AM Kuniyuki Iwashima <kuniyu@google.com> wrote: > > > > Thanks Eric for CCing me. > > > > On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: > > > > > > > > The syzbot report a crash: > > > > > > > > Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI > > > > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > > > CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 > > > > RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] > > > > RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 > > > > Call Trace: > > > > <TASK> > > > > smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 > > > > smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 > > > > netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 > > > > __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 > > > > netlink_dump_start include/linux/netlink.h:341 [inline] > > > > smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 > > > > __sock_diag_cmd net/core/sock_diag.c:249 [inline] > > > > sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 > > > > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 > > > > netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] > > > > netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 > > > > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 > > > > sock_sendmsg_nosec net/socket.c:714 [inline] > > > > __sock_sendmsg net/socket.c:729 [inline] > > > > ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 > > > > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 > > > > __sys_sendmsg+0x16d/0x220 net/socket.c:2700 > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > </TASK> > > > > > > > > The process like this: > > > > > > > > (CPU1) | (CPU2) > > > > ---------------------------------|------------------------------- > > > > inet_create() | > > > > // init clcsock to NULL | > > > > sk = sk_alloc() | > > > > | > > > > // unexpectedly change clcsock | > > > > inet_init_csk_locks() | > > > > | > > > > // add sk to hash table | > > > > smc_inet_init_sock() | > > > > smc_sk_init() | > > > > smc_hash_sk() | > > > > | // traverse the hash table > > > > | smc_diag_dump_proto > > > > | __smc_diag_dump() > > > > | // visit wrong clcsock > > > > | smc_diag_msg_common_fill() > > > > // alloc clcsock | > > > > smc_create_clcsk | > > > > sock_create_kern | > > > > > > > > With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed > > > > in inet_init_csk_locks(), because the struct smc_sock does not have struct > > > > inet_connection_sock as the first member. > > > > > > > > Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type > > > > confusion.") add inet_sock as the first member of smc_sock. For protocol > > > > with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is > > > > more appropriate. > > > > Why is INET_PROTOSW_ICSK necessary in the first place ? > > > > I don't see a clear reason because smc_clcsock_accept() allocates > > a new sock by smc_sock_alloc() and does not use inet_accept(). > > > > Or is there any other path where smc_sock is cast to > > inet_connection_sock ? > > What I saw in this code was a missing protection. > > smc_diag_msg_common_fill() runs without socket lock being held. > > I was thinking of this fix, but apparently syzbot still got crashes. Looking at the test result, https://syzkaller.appspot.com/x/report.txt?x=15944c7c580000 KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] the top half of the address is SPINLOCK_MAGIC (0xdead4ead), so the type confusion mentioned in the commit message makes sense to me. $ pahole -C inet_connection_sock vmlinux struct inet_connection_sock { ... struct request_sock_queue icsk_accept_queue; /* 992 80 */ $ pahole -C smc_sock vmlinux struct smc_sock { ... struct socket * clcsock; /* 992 8 */ The option is 1) let inet_init_csk_locks() init inet_connection_sock or 2) avoid inet_init_csk_locks(), and I guess 2) could be better to avoid potential issues in IS_ICSK branches. > > diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c > index 10219f55aad14d795dabe4331458bd1b73c22789..b6abd0efea22c0c9726090b5de60e648b86e09a0 > 100644 > --- a/net/smc/smc_close.c > +++ b/net/smc/smc_close.c > @@ -30,7 +30,8 @@ void smc_clcsock_release(struct smc_sock *smc) > mutex_lock(&smc->clcsock_release_lock); > if (smc->clcsock) { > tcp = smc->clcsock; > - smc->clcsock = NULL; > + WRITE_ONCE(smc->clcsock, NULL); > + synchronize_rcu(); > sock_release(tcp); > } > mutex_unlock(&smc->clcsock_release_lock); > diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c > index bf0beaa23bdb63edfe0c37515aa17a04bb648c08..069607c1db9aff76d1d4f23b47dfeb5177c433d8 > 100644 > --- a/net/smc/smc_diag.c > +++ b/net/smc/smc_diag.c > @@ -35,26 +35,32 @@ static struct smc_diag_dump_ctx > *smc_dump_context(struct netlink_callback *cb) > static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk) > { > struct smc_sock *smc = smc_sk(sk); > + struct socket *clcsock; > > memset(r, 0, sizeof(*r)); > r->diag_family = sk->sk_family; > sock_diag_save_cookie(sk, r->id.idiag_cookie); > - if (!smc->clcsock) > - return; > - r->id.idiag_sport = htons(smc->clcsock->sk->sk_num); > - r->id.idiag_dport = smc->clcsock->sk->sk_dport; > - r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if; > + > + rcu_read_lock(); > + clcsock = READ_ONCE(smc->clcsock); > + if (!clcsock) > + goto unlock; > + r->id.idiag_sport = htons(clcsock->sk->sk_num); > + r->id.idiag_dport = clcsock->sk->sk_dport; > + r->id.idiag_if = clcsock->sk->sk_bound_dev_if; > if (sk->sk_protocol == SMCPROTO_SMC) { > - r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr; > - r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr; > + r->id.idiag_src[0] = clcsock->sk->sk_rcv_saddr; > + r->id.idiag_dst[0] = clcsock->sk->sk_daddr; > #if IS_ENABLED(CONFIG_IPV6) > } else if (sk->sk_protocol == SMCPROTO_SMC6) { > - memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr, > - sizeof(smc->clcsock->sk->sk_v6_rcv_saddr)); > - memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr, > - sizeof(smc->clcsock->sk->sk_v6_daddr)); > + memcpy(&r->id.idiag_src, &clcsock->sk->sk_v6_rcv_saddr, > + sizeof(clcsock->sk->sk_v6_rcv_saddr)); > + memcpy(&r->id.idiag_dst, &clcsock->sk->sk_v6_daddr, > + sizeof(clcsock->sk->sk_v6_daddr)); > #endif > } > +unlock: > + rcu_read_unlock(); > }
On Thu, Sep 25, 2025 at 12:25 PM Kuniyuki Iwashima <kuniyu@google.com> wrote: > > On Thu, Sep 25, 2025 at 11:54 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Sep 25, 2025 at 11:46 AM Kuniyuki Iwashima <kuniyu@google.com> wrote: > > > > > > Thanks Eric for CCing me. > > > > > > On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: > > > > > > > > > > The syzbot report a crash: > > > > > > > > > > Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI > > > > > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > > > > CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 > > > > > RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] > > > > > RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 > > > > > Call Trace: > > > > > <TASK> > > > > > smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 > > > > > smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 > > > > > netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 > > > > > __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 > > > > > netlink_dump_start include/linux/netlink.h:341 [inline] > > > > > smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 > > > > > __sock_diag_cmd net/core/sock_diag.c:249 [inline] > > > > > sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 > > > > > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 > > > > > netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] > > > > > netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 > > > > > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 > > > > > sock_sendmsg_nosec net/socket.c:714 [inline] > > > > > __sock_sendmsg net/socket.c:729 [inline] > > > > > ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 > > > > > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 > > > > > __sys_sendmsg+0x16d/0x220 net/socket.c:2700 > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > </TASK> > > > > > > > > > > The process like this: > > > > > > > > > > (CPU1) | (CPU2) > > > > > ---------------------------------|------------------------------- > > > > > inet_create() | > > > > > // init clcsock to NULL | > > > > > sk = sk_alloc() | > > > > > | > > > > > // unexpectedly change clcsock | > > > > > inet_init_csk_locks() | > > > > > | > > > > > // add sk to hash table | > > > > > smc_inet_init_sock() | > > > > > smc_sk_init() | > > > > > smc_hash_sk() | > > > > > | // traverse the hash table > > > > > | smc_diag_dump_proto > > > > > | __smc_diag_dump() > > > > > | // visit wrong clcsock > > > > > | smc_diag_msg_common_fill() > > > > > // alloc clcsock | > > > > > smc_create_clcsk | > > > > > sock_create_kern | > > > > > > > > > > With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed > > > > > in inet_init_csk_locks(), because the struct smc_sock does not have struct > > > > > inet_connection_sock as the first member. > > > > > > > > > > Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type > > > > > confusion.") add inet_sock as the first member of smc_sock. For protocol > > > > > with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is > > > > > more appropriate. > > > > > > Why is INET_PROTOSW_ICSK necessary in the first place ? > > > > > > I don't see a clear reason because smc_clcsock_accept() allocates > > > a new sock by smc_sock_alloc() and does not use inet_accept(). > > > > > > Or is there any other path where smc_sock is cast to > > > inet_connection_sock ? > > > > What I saw in this code was a missing protection. > > > > smc_diag_msg_common_fill() runs without socket lock being held. > > > > I was thinking of this fix, but apparently syzbot still got crashes. > > Looking at the test result, > > https://syzkaller.appspot.com/x/report.txt?x=15944c7c580000 > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > the top half of the address is SPINLOCK_MAGIC (0xdead4ead), > so the type confusion mentioned in the commit message makes > sense to me. > > $ pahole -C inet_connection_sock vmlinux > struct inet_connection_sock { > ... > struct request_sock_queue icsk_accept_queue; /* 992 80 */ > > $ pahole -C smc_sock vmlinux > struct smc_sock { > ... > struct socket * clcsock; /* 992 8 */ > > The option is 1) let inet_init_csk_locks() init inet_connection_sock > or 2) avoid inet_init_csk_locks(), and I guess 2) could be better to > avoid potential issues in IS_ICSK branches. > I definitely vote to remove INET_PROTOSW_ICSK from smc. We want to reserve inet_connection_sock to TCP only, so that we can move fields to better cache friendly locations in tcp_sock hopefully for linux-6.19
On Thu, Sep 25, 2025 at 12:37 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Sep 25, 2025 at 12:25 PM Kuniyuki Iwashima <kuniyu@google.com> wrote: > > > > On Thu, Sep 25, 2025 at 11:54 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Thu, Sep 25, 2025 at 11:46 AM Kuniyuki Iwashima <kuniyu@google.com> wrote: > > > > > > > > Thanks Eric for CCing me. > > > > > > > > On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > > > On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: > > > > > > > > > > > > The syzbot report a crash: > > > > > > > > > > > > Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI > > > > > > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > > > > > CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 > > > > > > RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] > > > > > > RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 > > > > > > Call Trace: > > > > > > <TASK> > > > > > > smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 > > > > > > smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 > > > > > > netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 > > > > > > __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 > > > > > > netlink_dump_start include/linux/netlink.h:341 [inline] > > > > > > smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 > > > > > > __sock_diag_cmd net/core/sock_diag.c:249 [inline] > > > > > > sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 > > > > > > netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 > > > > > > netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] > > > > > > netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 > > > > > > netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 > > > > > > sock_sendmsg_nosec net/socket.c:714 [inline] > > > > > > __sock_sendmsg net/socket.c:729 [inline] > > > > > > ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 > > > > > > ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 > > > > > > __sys_sendmsg+0x16d/0x220 net/socket.c:2700 > > > > > > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > > > > > > do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 > > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > > </TASK> > > > > > > > > > > > > The process like this: > > > > > > > > > > > > (CPU1) | (CPU2) > > > > > > ---------------------------------|------------------------------- > > > > > > inet_create() | > > > > > > // init clcsock to NULL | > > > > > > sk = sk_alloc() | > > > > > > | > > > > > > // unexpectedly change clcsock | > > > > > > inet_init_csk_locks() | > > > > > > | > > > > > > // add sk to hash table | > > > > > > smc_inet_init_sock() | > > > > > > smc_sk_init() | > > > > > > smc_hash_sk() | > > > > > > | // traverse the hash table > > > > > > | smc_diag_dump_proto > > > > > > | __smc_diag_dump() > > > > > > | // visit wrong clcsock > > > > > > | smc_diag_msg_common_fill() > > > > > > // alloc clcsock | > > > > > > smc_create_clcsk | > > > > > > sock_create_kern | > > > > > > > > > > > > With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed > > > > > > in inet_init_csk_locks(), because the struct smc_sock does not have struct > > > > > > inet_connection_sock as the first member. > > > > > > > > > > > > Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type > > > > > > confusion.") add inet_sock as the first member of smc_sock. For protocol > > > > > > with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is > > > > > > more appropriate. > > > > > > > > Why is INET_PROTOSW_ICSK necessary in the first place ? > > > > > > > > I don't see a clear reason because smc_clcsock_accept() allocates > > > > a new sock by smc_sock_alloc() and does not use inet_accept(). > > > > > > > > Or is there any other path where smc_sock is cast to > > > > inet_connection_sock ? > > > > > > What I saw in this code was a missing protection. > > > > > > smc_diag_msg_common_fill() runs without socket lock being held. > > > > > > I was thinking of this fix, but apparently syzbot still got crashes. > > > > Looking at the test result, > > > > https://syzkaller.appspot.com/x/report.txt?x=15944c7c580000 > > KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] > > > > the top half of the address is SPINLOCK_MAGIC (0xdead4ead), > > so the type confusion mentioned in the commit message makes > > sense to me. > > > > $ pahole -C inet_connection_sock vmlinux > > struct inet_connection_sock { > > ... > > struct request_sock_queue icsk_accept_queue; /* 992 80 */ > > > > $ pahole -C smc_sock vmlinux > > struct smc_sock { > > ... > > struct socket * clcsock; /* 992 8 */ > > > > The option is 1) let inet_init_csk_locks() init inet_connection_sock > > or 2) avoid inet_init_csk_locks(), and I guess 2) could be better to > > avoid potential issues in IS_ICSK branches. > > > > I definitely vote to remove INET_PROTOSW_ICSK from smc. > > We want to reserve inet_connection_sock to TCP only, so that we can > move fields to better > cache friendly locations in tcp_sock hopefully for linux-6.19 Fully agreed. Wang: please squash the revert of 6fd27ea183c2 for INET_PROTOSW_ICSK removal. This is for one of IS_ICSK branches.
在 2025/9/26 3:51, Kuniyuki Iwashima 写道: > On Thu, Sep 25, 2025 at 12:37 PM Eric Dumazet <edumazet@google.com> wrote: >> On Thu, Sep 25, 2025 at 12:25 PM Kuniyuki Iwashima <kuniyu@google.com> wrote: >>> On Thu, Sep 25, 2025 at 11:54 AM Eric Dumazet <edumazet@google.com> wrote: >>>> On Thu, Sep 25, 2025 at 11:46 AM Kuniyuki Iwashima <kuniyu@google.com> wrote: >>>>> Thanks Eric for CCing me. >>>>> >>>>> On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@google.com> wrote: >>>>>> On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@huawei.com> wrote: >>>>>>> The syzbot report a crash: >>>>>>> >>>>>>> Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI >>>>>>> KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] >>>>>>> CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) >>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 >>>>>>> RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] >>>>>>> RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 >>>>>>> Call Trace: >>>>>>> <TASK> >>>>>>> smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 >>>>>>> smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 >>>>>>> netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 >>>>>>> __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 >>>>>>> netlink_dump_start include/linux/netlink.h:341 [inline] >>>>>>> smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 >>>>>>> __sock_diag_cmd net/core/sock_diag.c:249 [inline] >>>>>>> sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 >>>>>>> netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 >>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] >>>>>>> netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 >>>>>>> netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 >>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline] >>>>>>> __sock_sendmsg net/socket.c:729 [inline] >>>>>>> ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 >>>>>>> ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 >>>>>>> __sys_sendmsg+0x16d/0x220 net/socket.c:2700 >>>>>>> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] >>>>>>> do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 >>>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f >>>>>>> </TASK> >>>>>>> >>>>>>> The process like this: >>>>>>> >>>>>>> (CPU1) | (CPU2) >>>>>>> ---------------------------------|------------------------------- >>>>>>> inet_create() | >>>>>>> // init clcsock to NULL | >>>>>>> sk = sk_alloc() | >>>>>>> | >>>>>>> // unexpectedly change clcsock | >>>>>>> inet_init_csk_locks() | >>>>>>> | >>>>>>> // add sk to hash table | >>>>>>> smc_inet_init_sock() | >>>>>>> smc_sk_init() | >>>>>>> smc_hash_sk() | >>>>>>> | // traverse the hash table >>>>>>> | smc_diag_dump_proto >>>>>>> | __smc_diag_dump() >>>>>>> | // visit wrong clcsock >>>>>>> | smc_diag_msg_common_fill() >>>>>>> // alloc clcsock | >>>>>>> smc_create_clcsk | >>>>>>> sock_create_kern | >>>>>>> >>>>>>> With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed >>>>>>> in inet_init_csk_locks(), because the struct smc_sock does not have struct >>>>>>> inet_connection_sock as the first member. >>>>>>> >>>>>>> Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type >>>>>>> confusion.") add inet_sock as the first member of smc_sock. For protocol >>>>>>> with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is >>>>>>> more appropriate. >>>>> Why is INET_PROTOSW_ICSK necessary in the first place ? >>>>> >>>>> I don't see a clear reason because smc_clcsock_accept() allocates >>>>> a new sock by smc_sock_alloc() and does not use inet_accept(). >>>>> >>>>> Or is there any other path where smc_sock is cast to >>>>> inet_connection_sock ? >>>> What I saw in this code was a missing protection. >>>> >>>> smc_diag_msg_common_fill() runs without socket lock being held. >>>> >>>> I was thinking of this fix, but apparently syzbot still got crashes. >>> Looking at the test result, >>> >>> https://syzkaller.appspot.com/x/report.txt?x=15944c7c580000 >>> KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] >>> >>> the top half of the address is SPINLOCK_MAGIC (0xdead4ead), >>> so the type confusion mentioned in the commit message makes >>> sense to me. >>> >>> $ pahole -C inet_connection_sock vmlinux >>> struct inet_connection_sock { >>> ... >>> struct request_sock_queue icsk_accept_queue; /* 992 80 */ >>> >>> $ pahole -C smc_sock vmlinux >>> struct smc_sock { >>> ... >>> struct socket * clcsock; /* 992 8 */ >>> >>> The option is 1) let inet_init_csk_locks() init inet_connection_sock >>> or 2) avoid inet_init_csk_locks(), and I guess 2) could be better to >>> avoid potential issues in IS_ICSK branches. >>> >> I definitely vote to remove INET_PROTOSW_ICSK from smc. >> >> We want to reserve inet_connection_sock to TCP only, so that we can >> move fields to better >> cache friendly locations in tcp_sock hopefully for linux-6.19 > Fully agreed. > > Wang: please squash the revert of 6fd27ea183c2 for > INET_PROTOSW_ICSK removal. This is for one of > IS_ICSK branches. Thanks for your suggestions, they are helpful! I will remove INET_PROTOSW_ICSK from smc_inet_protosw and smc_inet6_protosw, and revert 6fd27ea183c2 ("net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC") in one patchset later. ------ Best regards Wang Liang
© 2016 - 2025 Red Hat, Inc.