Signal delivery during connect() may lead to a disconnect of an already
established socket. That involves removing socket from any sockmap and
resetting state to SS_UNCONNECTED. While it correctly restores socket's
proto, a call to vsock_bpf_recvmsg() might have been already under way in
another thread. If the connect()ing thread reassigns the vsock transport to
NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

connect
  / state = SS_CONNECTED /
                                sock_map_update_elem
                                vsock_bpf_recvmsg
                                  psock = sk_psock_get()
  lock sk
  if signal_pending
    unhash
      sock_map_remove_links
    state = SS_UNCONNECTED
  release sk

connect
  transport = NULL
                                lock sk
                                WARN_ON_ONCE(!vsk->transport)

Protect recvmsg() from racing against transport reassignment. Enforce the
sockmap invariant that psock implies transport: lock socket before getting
psock.
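
For readers less familiar with the locking rule, here is a minimal user-space
sketch of the same ordering. It is not the kernel code: a pthread mutex stands
in for the socket lock, and struct toy_sock, toy_unhash(),
toy_clear_transport() and toy_recvmsg() are hypothetical names invented for
illustration. It assumes, as in the flow described above, that psock is
dropped before transport is cleared, each step under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum toy_path { TOY_PATH_PLAIN, TOY_PATH_PSOCK };

struct toy_sock {
	pthread_mutex_t lock;	/* stands in for the socket lock */
	void *psock;		/* non-NULL while the socket sits in a "sockmap" */
	void *transport;	/* non-NULL while a transport is assigned */
};

/* First connect() interrupted by a signal: leave the map under the lock. */
void toy_unhash(struct toy_sock *s)
{
	pthread_mutex_lock(&s->lock);
	s->psock = NULL;
	pthread_mutex_unlock(&s->lock);
}

/* Second connect(): the transport is dropped, again under the lock. */
void toy_clear_transport(struct toy_sock *s)
{
	pthread_mutex_lock(&s->lock);
	s->transport = NULL;
	pthread_mutex_unlock(&s->lock);
}

/* Receive path: take the lock first, then look at psock. */
enum toy_path toy_recvmsg(struct toy_sock *s)
{
	enum toy_path path = TOY_PATH_PLAIN;

	pthread_mutex_lock(&s->lock);
	if (s->psock) {
		/* psock was read under the lock, so neither writer above can
		 * run until we unlock; the invariant is stable here.
		 */
		assert(s->transport != NULL);
		path = TOY_PATH_PSOCK;
	}
	pthread_mutex_unlock(&s->lock);

	return path;
}
```

Had toy_recvmsg() read psock before taking the lock, both writers could run in
between the read and the lock, and the assertion could fire. That is the
window this patch closes.
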
WARNING: CPU: 9 PID: 1222 at net/vmw_vsock/vsock_bpf.c:92 vsock_bpf_recvmsg+0xb55/0xe00
CPU: 9 UID: 0 PID: 1222 Comm: a.out Not tainted 6.14.0-rc5+
RIP: 0010:vsock_bpf_recvmsg+0xb55/0xe00
sock_recvmsg+0x1b2/0x220
__sys_recvfrom+0x190/0x270
__x64_sys_recvfrom+0xdc/0x1b0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
net/vmw_vsock/vsock_bpf.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index c68fdaf09046b68254dac3ea70ffbe73dfa45cef..5138195d91fb258d4bc09b48e80e13651d62863a 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -73,28 +73,35 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int
 	return err;
 }
 
-static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
-			     size_t len, int flags, int *addr_len)
+static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
+			     int flags, int *addr_len)
 {
 	struct sk_psock *psock;
 	struct vsock_sock *vsk;
 	int copied;
 
+	/* Since signal delivery during connect() may reset the state of socket
+	 * that's already in a sockmap, take the lock before checking on psock.
+	 * This serializes a possible transport reassignment, protecting this
+	 * function from running with NULL transport.
+	 */
+	lock_sock(sk);
+
 	psock = sk_psock_get(sk);
-	if (unlikely(!psock))
+	if (unlikely(!psock)) {
+		release_sock(sk);
 		return __vsock_recvmsg(sk, msg, len, flags);
+	}
 
-	lock_sock(sk);
 	vsk = vsock_sk(sk);
-
 	if (WARN_ON_ONCE(!vsk->transport)) {
 		copied = -ENODEV;
 		goto out;
 	}
 
 	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
-		release_sock(sk);
 		sk_psock_put(sk, psock);
+		release_sock(sk);
 		return __vsock_recvmsg(sk, msg, len, flags);
 	}
 
@@ -108,8 +115,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 		}
 
 		if (sk_psock_queue_empty(psock)) {
-			release_sock(sk);
 			sk_psock_put(sk, psock);
+			release_sock(sk);
 			return __vsock_recvmsg(sk, msg, len, flags);
 		}
 
@@ -117,8 +124,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
 	}
 
 out:
-	release_sock(sk);
 	sk_psock_put(sk, psock);
+	release_sock(sk);
 
 	return copied;
 }

--
2.48.1

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
> Signal delivery during connect() may lead to a disconnect of an already
> established socket. That involves removing socket from any sockmap and
> resetting state to SS_UNCONNECTED. While it correctly restores socket's
> proto, a call to vsock_bpf_recvmsg() might have been already under way in
> another thread. If the connect()ing thread reassigns the vsock transport to
> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>
> connect
>   / state = SS_CONNECTED /
>                                 sock_map_update_elem
>                                 vsock_bpf_recvmsg
>                                   psock = sk_psock_get()
>   lock sk
>   if signal_pending
>     unhash
>       sock_map_remove_links

So vsock's ->recvmsg() should be restored after this, right? Then how is
vsock_bpf_recvmsg() called afterward?

>     state = SS_UNCONNECTED
>   release sk
>
> connect
>   transport = NULL
>                                 lock sk
>                                 WARN_ON_ONCE(!vsk->transport)
>

And I am wondering why we need to WARN here since we can handle this error
case correctly?

Thanks.

On 3/19/25 23:18, Cong Wang wrote:
> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>> Signal delivery during connect() may lead to a disconnect of an already
>> established socket. That involves removing socket from any sockmap and
>> resetting state to SS_UNCONNECTED. While it correctly restores socket's
>> proto, a call to vsock_bpf_recvmsg() might have been already under way in
>> another thread. If the connect()ing thread reassigns the vsock transport to
>> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>>

   *THREAD 1*                      *THREAD 2*

>> connect
>>   / state = SS_CONNECTED /
>>                                 sock_map_update_elem
>>                                 vsock_bpf_recvmsg
>>                                   psock = sk_psock_get()
>>   lock sk
>>   if signal_pending
>>     unhash
>>       sock_map_remove_links
>
> So vsock's ->recvmsg() should be restored after this, right? Then how is
> vsock_bpf_recvmsg() called afterward?

I'm not sure I understand the question, so I've added a header above: those
are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
afterwards. It was called before sock_map_remove_links(). Note that at the
time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
executing (in T2).

>>     state = SS_UNCONNECTED
>>   release sk
>>
>> connect
>>   transport = NULL
>>                                 lock sk
>>                                 WARN_ON_ONCE(!vsk->transport)
>>
>
> And I am wondering why we need to WARN here since we can handle this error
> case correctly?

The WARN and transport check are here for defensive measures, and to state
a contract.

But I think I get your point. If we accept for a fact of life that BPF code
should be able to handle transport disappearing - then WARN can be removed
(while keeping the check) and this patch can be dropped.

My aim, instead, was to keep things consistent. By which I mean sticking to
the conditions expressed in vsock_bpf_update_proto() as invariants; so that
vsock with a psock is guaranteed to have transport assigned.

On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:
> On 3/19/25 23:18, Cong Wang wrote:
> > On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
> >> Signal delivery during connect() may lead to a disconnect of an already
> >> established socket. That involves removing socket from any sockmap and
> >> resetting state to SS_UNCONNECTED. While it correctly restores socket's
> >> proto, a call to vsock_bpf_recvmsg() might have been already under way in
> >> another thread. If the connect()ing thread reassigns the vsock transport to
> >> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
> >>
>
>    *THREAD 1*                      *THREAD 2*
>
> >> connect
> >>   / state = SS_CONNECTED /
> >>                                 sock_map_update_elem
> >>                                 vsock_bpf_recvmsg
> >>                                   psock = sk_psock_get()
> >>   lock sk
> >>   if signal_pending
> >>     unhash
> >>       sock_map_remove_links
> >
> > So vsock's ->recvmsg() should be restored after this, right? Then how is
> > vsock_bpf_recvmsg() called afterward?
>
> I'm not sure I understand the question, so I've added a header above: those
> are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
> afterwards. It was called before sock_map_remove_links(). Note that at the
> time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
> executing (in T2).

I thought the above vsock_bpf_recvmsg() on the right side completed
before sock_map_remove_links(), sorry for the confusion.

>
> >>     state = SS_UNCONNECTED
> >>   release sk
> >>
> >> connect
> >>   transport = NULL
> >>                                 lock sk
> >>                                 WARN_ON_ONCE(!vsk->transport)
> >>
> >
> > And I am wondering why we need to WARN here since we can handle this error
> > case correctly?
>
> The WARN and transport check are here for defensive measures, and to state
> a contract.
>
> But I think I get your point. If we accept for a fact of life that BPF code
> should be able to handle transport disappearing - then WARN can be removed
> (while keeping the check) and this patch can be dropped.

I am thinking whether we have more elegant way to handle this case,
WARN looks not pretty.

>
> My aim, instead, was to keep things consistent. By which I mean sticking to
> the conditions expressed in vsock_bpf_update_proto() as invariants; so that
> vsock with a psock is guaranteed to have transport assigned.

Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg()
because for example UDP is (almost) lockless, so enforcing the sock lock
for all vsock types looks not flexible and may hurt performance.

Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks
for different struct proto (as we did for TCP/UDP)?

Thanks.

On 3/20/25 21:54, Cong Wang wrote:
> On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:
>> On 3/19/25 23:18, Cong Wang wrote:
>>> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>>>> Signal delivery during connect() may lead to a disconnect of an already
>>>> established socket. That involves removing socket from any sockmap and
>>>> resetting state to SS_UNCONNECTED. While it correctly restores socket's
>>>> proto, a call to vsock_bpf_recvmsg() might have been already under way in
>>>> another thread. If the connect()ing thread reassigns the vsock transport to
>>>> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>>>>
>>
>>    *THREAD 1*                      *THREAD 2*
>>
>>>> connect
>>>>   / state = SS_CONNECTED /
>>>>                                 sock_map_update_elem
>>>>                                 vsock_bpf_recvmsg
>>>>                                   psock = sk_psock_get()
>>>>   lock sk
>>>>   if signal_pending
>>>>     unhash
>>>>       sock_map_remove_links
>>>
>>> So vsock's ->recvmsg() should be restored after this, right? Then how is
>>> vsock_bpf_recvmsg() called afterward?
>>
>> I'm not sure I understand the question, so I've added a header above: those
>> are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
>> afterwards. It was called before sock_map_remove_links(). Note that at the
>> time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
>> executing (in T2).
>
> I thought the above vsock_bpf_recvmsg() on the right side completed
> before sock_map_remove_links(), sorry for the confusion.

No problem, I see why you might have. Perhaps deeper indentation would make
things clearer.

>>>>     state = SS_UNCONNECTED
>>>>   release sk
>>>>
>>>> connect
>>>>   transport = NULL
>>>>                                 lock sk
>>>>                                 WARN_ON_ONCE(!vsk->transport)
>>>>
>>>
>>> And I am wondering why we need to WARN here since we can handle this error
>>> case correctly?
>>
>> The WARN and transport check are here for defensive measures, and to state
>> a contract.
>>
>> But I think I get your point. If we accept for a fact of life that BPF code
>> should be able to handle transport disappearing - then WARN can be removed
>> (while keeping the check) and this patch can be dropped.
>
> I am thinking whether we have more elegant way to handle this case,
> WARN looks not pretty.

Since the case should never happen, I like to think of WARN as a deliberate
eyesore :)

>> My aim, instead, was to keep things consistent. By which I mean sticking to
>> the conditions expressed in vsock_bpf_update_proto() as invariants; so that
>> vsock with a psock is guaranteed to have transport assigned.
>
> Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg()
> because for example UDP is (almost) lockless, so enforcing the sock lock
> for all vsock types looks not flexible and may hurt performance.
>
> Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks
> for different struct proto (as we did for TCP/UDP)?

By UDP you mean vsock SOCK_DGRAM? No need to worry. VMCI is the only
transport that features VSOCK_TRANSPORT_F_DGRAM, but it does not implement
the read_skb() callback, making it unsupported by BPF/sockmap.

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>Signal delivery during connect() may lead to a disconnect of an already
>established socket. That involves removing socket from any sockmap and
>resetting state to SS_UNCONNECTED. While it correctly restores socket's
>proto, a call to vsock_bpf_recvmsg() might have been already under way in
>another thread. If the connect()ing thread reassigns the vsock transport to
>NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>
>connect
>  / state = SS_CONNECTED /
>                                sock_map_update_elem
>                                vsock_bpf_recvmsg
>                                  psock = sk_psock_get()
>  lock sk
>  if signal_pending
>    unhash
>      sock_map_remove_links
>    state = SS_UNCONNECTED
>  release sk
>
>connect
>  transport = NULL
>                                lock sk
>                                WARN_ON_ONCE(!vsk->transport)
>
>Protect recvmsg() from racing against transport reassignment. Enforce the
>sockmap invariant that psock implies transport: lock socket before getting
>psock.
>
>WARNING: CPU: 9 PID: 1222 at net/vmw_vsock/vsock_bpf.c:92 vsock_bpf_recvmsg+0xb55/0xe00
>CPU: 9 UID: 0 PID: 1222 Comm: a.out Not tainted 6.14.0-rc5+
>RIP: 0010:vsock_bpf_recvmsg+0xb55/0xe00
> sock_recvmsg+0x1b2/0x220
> __sys_recvfrom+0x190/0x270
> __x64_sys_recvfrom+0xdc/0x1b0
> do_syscall_64+0x93/0x1b0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>Fixes: 634f1a7110b4 ("vsock: support sockmap")
>Signed-off-by: Michal Luczaj <mhal@rbox.co>
>---
> net/vmw_vsock/vsock_bpf.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
>diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
>index c68fdaf09046b68254dac3ea70ffbe73dfa45cef..5138195d91fb258d4bc09b48e80e13651d62863a 100644
>--- a/net/vmw_vsock/vsock_bpf.c
>+++ b/net/vmw_vsock/vsock_bpf.c
>@@ -73,28 +73,35 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int
> 	return err;
> }
>
>-static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
>-			     size_t len, int flags, int *addr_len)
>+static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>+			     int flags, int *addr_len)

I would avoid this change, especially in a patch with the Fixes tag then
to be backported.

> {
> 	struct sk_psock *psock;
> 	struct vsock_sock *vsk;
> 	int copied;
>
>+	/* Since signal delivery during connect() may reset the state of socket
>+	 * that's already in a sockmap, take the lock before checking on psock.
>+	 * This serializes a possible transport reassignment, protecting this
>+	 * function from running with NULL transport.
>+	 */
>+	lock_sock(sk);
>+
> 	psock = sk_psock_get(sk);
>-	if (unlikely(!psock))
>+	if (unlikely(!psock)) {
>+		release_sock(sk);
> 		return __vsock_recvmsg(sk, msg, len, flags);
>+	}
>
>-	lock_sock(sk);
> 	vsk = vsock_sk(sk);
>-
> 	if (WARN_ON_ONCE(!vsk->transport)) {
> 		copied = -ENODEV;
> 		goto out;
> 	}
>
> 	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
>-		release_sock(sk);
> 		sk_psock_put(sk, psock);
>+		release_sock(sk);

But here we release it, so can still a reset happen at this point,
before calling __vsock_connectible_recvmsg().
In there anyway we handle the case where transport is null, so there's
no problem, right?

The rest LGTM.

Thanks,
Stefano

> 		return __vsock_recvmsg(sk, msg, len, flags);
> 	}
>
>@@ -108,8 +115,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
> 		}
>
> 		if (sk_psock_queue_empty(psock)) {
>-			release_sock(sk);
> 			sk_psock_put(sk, psock);
>+			release_sock(sk);
> 			return __vsock_recvmsg(sk, msg, len, flags);
> 		}
>
>@@ -117,8 +124,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
> 	}
>
> out:
>-	release_sock(sk);
> 	sk_psock_put(sk, psock);
>+	release_sock(sk);
>
> 	return copied;
> }
>
>--
>2.48.1
>
On 3/19/25 10:34, Stefano Garzarella wrote:
> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>> ...
>> -static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
>> -			     size_t len, int flags, int *addr_len)
>> +static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>> +			     int flags, int *addr_len)
>
> I would avoid this change, especially in a patch with the Fixes tag then
> to be backported.

I thought that since I've modified this function in so many places, doing
this wouldn't hurt. But ok, I'll drop this change.

>>  {
>>  	struct sk_psock *psock;
>>  	struct vsock_sock *vsk;
>>  	int copied;
>>
>> +	/* Since signal delivery during connect() may reset the state of socket
>> +	 * that's already in a sockmap, take the lock before checking on psock.
>> +	 * This serializes a possible transport reassignment, protecting this
>> +	 * function from running with NULL transport.
>> +	 */
>> +	lock_sock(sk);
>> +
>>  	psock = sk_psock_get(sk);
>> -	if (unlikely(!psock))
>> +	if (unlikely(!psock)) {
>> +		release_sock(sk);
>>  		return __vsock_recvmsg(sk, msg, len, flags);
>> +	}
>>
>> -	lock_sock(sk);
>>  	vsk = vsock_sk(sk);
>> -
>>  	if (WARN_ON_ONCE(!vsk->transport)) {
>>  		copied = -ENODEV;
>>  		goto out;
>>  	}
>>
>>  	if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
>> -		release_sock(sk);
>>  		sk_psock_put(sk, psock);
>> +		release_sock(sk);
>
> But here we release it, so can still a reset happen at this point,
> before calling __vsock_connectible_recvmsg().
> In there anyway we handle the case where transport is null, so there's
> no problem, right?

Yes, I think we're good. That function needs to gracefully handle being
called without a transport, and it does.

Thanks,
Michal
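
As an illustration of what "gracefully handle being called without a
transport" can look like, here is a small user-space sketch. It is not the
kernel function: struct toy_transport, struct toy_vsock, toy_dequeue_zeros()
and toy_plain_recvmsg() are hypothetical names, and the -ENOTCONN return value
is an assumption made for the sketch rather than something stated in this
thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_transport {
	/* hypothetical receive hook: fills buf, returns bytes written */
	long (*dequeue)(void *buf, size_t len);
};

struct toy_vsock {
	const struct toy_transport *transport;	/* NULL after a reset */
};

/* Trivial hook for demonstration: "receives" len zero bytes. */
long toy_dequeue_zeros(void *buf, size_t len)
{
	memset(buf, 0, len);
	return (long)len;
}

/* Fallback receive path: check the transport once and bail out with an
 * error instead of dereferencing a NULL pointer.
 */
long toy_plain_recvmsg(const struct toy_vsock *vsk, void *buf, size_t len)
{
	const struct toy_transport *t = vsk->transport;

	assert(buf != NULL || len == 0);	/* argument sanity check */

	if (!t)
		return -ENOTCONN;	/* error code chosen for this sketch */

	return t->dequeue(buf, len);
}
```

A caller that loses the transport between dropping the lock and entering the
fallback therefore sees an error return, not a crash.
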