When the stream_verdict program returns SK_PASS, it places the received skb
into its own receive queue, but a recursive lock eventually occurs, leading
to an operating system deadlock. This issue has been present since v6.9.
'''
sk_psock_strp_data_ready
    write_lock_bh(&sk->sk_callback_lock)
    strp_data_ready
      strp_read_sock
        read_sock -> tcp_read_sock
          strp_recv
            cb.rcv_msg -> sk_psock_strp_read
              # now stream_verdict return SK_PASS without peer sock assign
              __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
              sk_psock_verdict_apply
                sk_psock_skb_ingress_self
                  sk_psock_skb_ingress_enqueue
                    sk_psock_data_ready
                      read_lock_bh(&sk->sk_callback_lock) <= dead lock
'''
This topic has been discussed before, but it has not been fixed.
Previous discussion:
https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/skmsg.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b1dcbd3be89e..e90fbab703b2 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1117,9 +1117,9 @@ static void sk_psock_strp_data_ready(struct sock *sk)
 		if (tls_sw_has_ctx_rx(sk)) {
 			psock->saved_data_ready(sk);
 		} else {
-			write_lock_bh(&sk->sk_callback_lock);
+			read_lock_bh(&sk->sk_callback_lock);
 			strp_data_ready(&psock->strp);
-			write_unlock_bh(&sk->sk_callback_lock);
+			read_unlock_bh(&sk->sk_callback_lock);
 		}
 	}
 	rcu_read_unlock();
--
2.43.5

On 11/6/24 4:44 AM, mrpre wrote:
> When the stream_verdict program returns SK_PASS, it places the received skb
> into its own receive queue, but a recursive lock eventually occurs, leading
> to an operating system deadlock. This issue has been present since v6.9.
>
> '''
> sk_psock_strp_data_ready
>     write_lock_bh(&sk->sk_callback_lock)
>     strp_data_ready
>       strp_read_sock
>         read_sock -> tcp_read_sock
>           strp_recv
>             cb.rcv_msg -> sk_psock_strp_read
>               # now stream_verdict return SK_PASS without peer sock assign
>               __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
>               sk_psock_verdict_apply
>                 sk_psock_skb_ingress_self
>                   sk_psock_skb_ingress_enqueue
>                     sk_psock_data_ready
>                       read_lock_bh(&sk->sk_callback_lock) <= dead lock
>
> '''
>
> This topic has been discussed before, but it has not been fixed.
> Previous discussion:
> https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch

Is the selftest included in this link still useful to reproduce this bug?
If yes, please include that also.

>
> Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
> Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>

Please also use the real name in the author (i.e. the email sender). The patch
needs a real author name also. I had manually fixed one of your earlier
lock_sock fix before applying.

pw-bot: cr

> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

The patch and the earlier discussion make sense to me.
John and JakubS, please help to take another look in the next respin.

On 11/8/24 1:03 PM, Martin KaFai Lau wrote:
> On 11/6/24 4:44 AM, mrpre wrote:
>> When the stream_verdict program returns SK_PASS, it places the received skb
>> into its own receive queue, but a recursive lock eventually occurs, leading
>> to an operating system deadlock. This issue has been present since v6.9.
>>
>> '''
>> sk_psock_strp_data_ready
>>     write_lock_bh(&sk->sk_callback_lock)
>>     strp_data_ready
>>       strp_read_sock
>>         read_sock -> tcp_read_sock
>>           strp_recv
>>             cb.rcv_msg -> sk_psock_strp_read
>>               # now stream_verdict return SK_PASS without peer sock assign
>>               __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
>>               sk_psock_verdict_apply
>>                 sk_psock_skb_ingress_self
>>                   sk_psock_skb_ingress_enqueue
>>                     sk_psock_data_ready
>>                       read_lock_bh(&sk->sk_callback_lock) <= dead lock
>>
>> '''
>>
>> This topic has been discussed before, but it has not been fixed.
>> Previous discussion:
>> https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
>
> Is the selftest included in this link still useful to reproduce this bug?
> If yes, please include that also.
>
>>
>> Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in
>> sk_psock_skb_ingress_enqueue")
>> Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
>> Signed-off-by: Jiayuan Chen <mrpre@163.com>
>
> Please also use the real name in the author (i.e. the email sender). The patch
> needs a real author name also. I had manually fixed one of your earlier
> lock_sock fix before applying.

and the bpf mailing list address has a typo in the original patch email...
I fixed that in this reply.

>
> pw-bot: cr
>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>
> The patch and the earlier discussion make sense to me.
> John and JakubS, please help to take another look in the next respin.
>
>