[v2] mptcp: fix fallback-related races

[PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Paolo Abeni 2 months, 2 weeks ago

disconnect() can still race with subflow fallback leaving the msk in
inconsistent status. To avoid such race, disconnect() must be blocking,
error out if the argument prevent such wait: the socket will linger
in SS_DISCONNECT state forever.

Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 71b258aebcd9..0dc55a17f274 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3119,8 +3119,19 @@ static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
 	inet_sk(msk)->inet_rcv_saddr = inet_sk(ssk)->inet_rcv_saddr;
 }
 
+/* can safely clear the fallback status only if no subflow is going to
+ * push or receive any data on the wire
+ */
+static bool mptcp_can_reset_fb_status(const struct mptcp_sock *msk)
+{
+	return list_empty(&msk->conn_list) ||
+	       inet_sk_state_load(msk->first) == TCP_CLOSE;
+}
+
 static int mptcp_disconnect(struct sock *sk, int flags)
 {
+	long timeout = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	/* We are on the fastopen error path. We can't call straight into the
@@ -3142,7 +3153,34 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	 * subflow
 	 */
 	mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
+
+	if (!mptcp_can_reset_fb_status(msk)) {
+		if (!timeout)
+			return -EAGAIN;
+
+		add_wait_queue(sk_sleep(sk), &wait);
+
+		/* __mptcp_close_ssk() will wake us */
+		do {
+			if (sk_wait_event(sk, &timeout,
+					  mptcp_can_reset_fb_status(msk),
+					  &wait))
+				break;
+		} while (!signal_pending(current) && timeout);
+
+		remove_wait_queue(sk_sleep(sk), &wait);
+
+		/* timeout/signal can prevent reaching the expected status */
+		if (!mptcp_can_reset_fb_status(msk))
+			return -EBUSY;
+	}
+
+	spin_lock_bh(&msk->fallback_lock);
+	msk->allow_subflows = true;
+	msk->allow_infinite_fallback = true;
 	WRITE_ONCE(msk->flags, 0);
+	spin_unlock_bh(&msk->fallback_lock);
+
 	msk->cb_flags = 0;
 	msk->recovery = false;
 	WRITE_ONCE(msk->can_ack, false);
-- 
2.49.0

Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Matthieu Baerts 2 months, 1 week ago

Hi Paolo,

On 05/07/2025 09:24, Paolo Abeni wrote:
> disconnect() can still race with subflow fallback leaving the msk in
> inconsistent status. To avoid such race, disconnect() must be blocking,
> error out if the argument prevent such wait: the socket will linger
> in SS_DISCONNECT state forever.

Good idea!

Just to be sure, this is not really a behaviour change compared to TCP,
right?

No risks of breaking some cases? (io_uring, etc.?)

In which cases is this MSG_DONTWAIT flag set with disconnect()?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Paolo Abeni 2 months, 1 week ago

On 7/8/25 5:43 PM, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 05/07/2025 09:24, Paolo Abeni wrote:
>> disconnect() can still race with subflow fallback leaving the msk in
>> inconsistent status. To avoid such race, disconnect() must be blocking,
>> error out if the argument prevent such wait: the socket will linger
>> in SS_DISCONNECT state forever.
> 
> Good idea!
> 
> Just to be sure, this is not really a behaviour change compared to TCP,
> right?

Actually it is, as the TCP disconnect never blocks. On the flip side the
race actually exists and I haven't find another way to plug it.

An alternative I thought about was to create a new first subflow on
disconnect and let the old one 'detach' from the mptcp socket, but that
was problematic as the 'detached' ssk would have a NULL subflow->conn
and currently we assume that to be ~always != NULL

> No risks of breaking some cases? (io_uring, etc.?)

Do io_uring work with mptcp? The disconnect failure when blocking is not
possible should really never be a problem from a functional PoV:
disconnect() can fail per documentation and the application should
handle that case.

Blocking could cause some additional latency or slowdown in that code
path. I think it should be not problematic, but I have no real idea and
tend to be too optimistic WRT this kind of issues.

I suggest not sending this to stable (with a note in the commit message).

> In which cases is this MSG_DONTWAIT flag set with disconnect()?

Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
 and iouring could set that.

/P

Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Matthieu Baerts 2 months, 1 week ago

Hi Paolo,

Thank you for your reply!

On 09/07/2025 11:27, Paolo Abeni wrote:
> 
> 
> On 7/8/25 5:43 PM, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 05/07/2025 09:24, Paolo Abeni wrote:
>>> disconnect() can still race with subflow fallback leaving the msk in
>>> inconsistent status. To avoid such race, disconnect() must be blocking,
>>> error out if the argument prevent such wait: the socket will linger
>>> in SS_DISCONNECT state forever.
>>
>> Good idea!
>>
>> Just to be sure, this is not really a behaviour change compared to TCP,
>> right?
> 
> Actually it is, as the TCP disconnect never blocks. On the flip side the
> race actually exists and I haven't find another way to plug it.
> 
> An alternative I thought about was to create a new first subflow on
> disconnect and let the old one 'detach' from the mptcp socket, but that
> was problematic as the 'detached' ssk would have a NULL subflow->conn
> and currently we assume that to be ~always != NULL

Indeed, it doesn't sound safer :)

>> No risks of breaking some cases? (io_uring, etc.?)
> 
> Do io_uring work with mptcp?

I don't know, I never tried. But I don't see why it wouldn't. (Of
course, not when used with zero copy, etc.)

> The disconnect failure when blocking is not
> possible should really never be a problem from a functional PoV:
> disconnect() can fail per documentation and the application should
> handle that case.

But it could set "O_NONBLOCK" and do the close later, no? Or is this
correctly handled?

> Blocking could cause some additional latency or slowdown in that code
> path. I think it should be not problematic, but I have no real idea and
> tend to be too optimistic WRT this kind of issues.
> 
> I suggest not sending this to stable (with a note in the commit message).

Maybe safer indeed.

>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
> 
> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>  and iouring could set that.

So typically what we do in the packetdrill tests. But maybe we don't
check the returned value?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Paolo Abeni 2 months, 1 week ago

On 7/9/25 12:39 PM, Matthieu Baerts wrote:
> On 09/07/2025 11:27, Paolo Abeni wrote:
>> The disconnect failure when blocking is not
>> possible should really never be a problem from a functional PoV:
>> disconnect() can fail per documentation and the application should
>> handle that case.
> 
> But it could set "O_NONBLOCK" and do the close later, no? Or is this
> correctly handled?

Yes and no. Meaning the kernel does not offer any mechanism to complete
gracefully a blocking disconnect returning EAGAIN. In such a case the
socket will stay in SS_DISCONNECTING (socket) state until close time.

>>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>>
>> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>>  and iouring could set that.
> 
> So typically what we do in the packetdrill tests. But maybe we don't
> check the returned value?

The only relevant pktdrill test I could find is
`disconnect_after_accept.pkt` and it checks the return value.

Let me check how it will run after the MSG_DONTWAIT -> O_NONBLOCK fix

/P

Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Matthieu Baerts 2 months, 1 week ago

On 09/07/2025 12:49, Paolo Abeni wrote:
> On 7/9/25 12:39 PM, Matthieu Baerts wrote:
>> On 09/07/2025 11:27, Paolo Abeni wrote:
>>> The disconnect failure when blocking is not
>>> possible should really never be a problem from a functional PoV:
>>> disconnect() can fail per documentation and the application should
>>> handle that case.
>>
>> But it could set "O_NONBLOCK" and do the close later, no? Or is this
>> correctly handled?
> 
> Yes and no. Meaning the kernel does not offer any mechanism to complete
> gracefully a blocking disconnect returning EAGAIN. In such a case the
> socket will stay in SS_DISCONNECTING (socket) state until close time.
> 
>>>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>>>
>>> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>>>  and iouring could set that.
>>
>> So typically what we do in the packetdrill tests. But maybe we don't
>> check the returned value?
> 
> The only relevant pktdrill test I could find is
> `disconnect_after_accept.pkt` and it checks the return value.
> 
> Let me check how it will run after the MSG_DONTWAIT -> O_NONBLOCK fix

Thank you!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()

Posted by Paolo Abeni 2 months, 1 week ago

On 7/9/25 12:55 PM, Matthieu Baerts wrote:
> On 09/07/2025 12:49, Paolo Abeni wrote:
>> On 7/9/25 12:39 PM, Matthieu Baerts wrote:
>>> On 09/07/2025 11:27, Paolo Abeni wrote:
>>>> The disconnect failure when blocking is not
>>>> possible should really never be a problem from a functional PoV:
>>>> disconnect() can fail per documentation and the application should
>>>> handle that case.
>>>
>>> But it could set "O_NONBLOCK" and do the close later, no? Or is this
>>> correctly handled?
>>
>> Yes and no. Meaning the kernel does not offer any mechanism to complete
>> gracefully a blocking disconnect returning EAGAIN. In such a case the
>> socket will stay in SS_DISCONNECTING (socket) state until close time.
>>
>>>>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>>>>
>>>> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>>>>  and iouring could set that.
>>>
>>> So typically what we do in the packetdrill tests. But maybe we don't
>>> check the returned value?
>>
>> The only relevant pktdrill test I could find is
>> `disconnect_after_accept.pkt` and it checks the return value.
>>
>> Let me check how it will run after the MSG_DONTWAIT -> O_NONBLOCK fix
> 
> Thank you!

The test is passing here. I think that is due to mptcp_disconnect()
invoking mptcp_close_ssk() with the MPTCP_CF_FASTCLOSE flag, which in
turn forces tcp_disconnect() invocation on msk->first, that forciby
closes the first subflow (sk_state == TCP_CLOSE after that).

So we can trim that blocking wait out of patch 3.

/P