[PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first

Geliang Tang posted 4 patches 2 years, 4 months ago
Maintainers: Matthieu Baerts <matttbe@kernel.org>, Mat Martineau <martineau@kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Shuah Khan <shuah@kernel.org>, Kishen Maloor <kishen.maloor@intel.com>
There is a newer version of this series
[PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first
Posted by Geliang Tang 2 years, 4 months ago
When closing the msk->first socket in __mptcp_close_ssk(), the original
behaviour is to reset this socket. But if there's another subflow available
in this case, it's better to set the first available subflow to msk->first,
instead of resetting the msk->first.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/protocol.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 30e0c29ae0a4..fde65d901f47 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2396,7 +2396,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		goto out_release;
 	}
 
-	dispose_it = msk->free_first || ssk != msk->first;
+	dispose_it = msk->free_first || ssk != msk->first || msk->pm.subflows;
 	if (dispose_it)
 		list_del(&subflow->node);
 
@@ -2446,8 +2446,20 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 
 	sock_put(ssk);
 
-	if (ssk == msk->first)
-		WRITE_ONCE(msk->first, NULL);
+	if (ssk == msk->first) {
+		struct mptcp_subflow_context *tmp;
+		struct sock *next = NULL;
+
+		mptcp_for_each_subflow(msk, tmp) {
+			struct sock *s = mptcp_subflow_tcp_sock(tmp);
+
+			if (s != msk->first) {
+				next = s;
+				break;
+			}
+		}
+		WRITE_ONCE(msk->first, next);
+	}
 
 out:
 	__mptcp_sync_sndbuf(sk);
-- 
2.35.3
Re: [PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first
Posted by Matthieu Baerts 2 years, 4 months ago
Hi Geliang,

+cc Paolo

On 08/10/2023 12:27, Geliang Tang wrote:
> When closing the msk->first socket in __mptcp_close_ssk(), the original
> behaviour is to reset this socket. But if there's another subflow available
> in this case, it's better to set the first available subflow to msk->first,
> instead of resetting the msk->first.

I don't think we want to do that. If I remember well, we want to keep
having 'msk->first' pointing to the initial subflow for various reasons
(I don't remember exactly).

I think what we want to do is not to reset the initial subflow if there
are other subflows still available and we don't want to set it to 'NULL'
in such case.

Could it be possible to do that instead you think?

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
Re: [PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first
Posted by Geliang Tang 2 years, 4 months ago
On Sun, Oct 08, 2023 at 01:11:32PM +0200, Matthieu Baerts wrote:
> Hi Geliang,
> 
> +cc Paolo
> 
> On 08/10/2023 12:27, Geliang Tang wrote:
> > When closing the msk->first socket in __mptcp_close_ssk(), the original
> > behaviour is to reset this socket. But if there's another subflow available
> > in this case, it's better to set the first available subflow to msk->first,
> > instead of resetting the msk->first.
> 
> I don't think we want to do that. If I remember well, we want to keep
> having 'msk->first' pointing to the initial subflow for various reasons
> (I don't remember exactly).
> 
> I think what we want to do is not to reset the initial subflow if there
> are other subflows still available and we don't want to set it to 'NULL'
> in such case.

Is it like this:

       if (ssk == msk->first && !msk->pm.subflows)
               WRITE_ONCE(msk->first, NULL);

This doesn't work, it will get a NULL pointer crash:

[  390.852957] MPTCP: msk=0000000089655116 snd_data_fin_enable=1 pending=0 snd_nxt=8683552930344602029 write_seq=8683552930344602029
[  390.852969] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[  390.852973] MPTCP: msk=0000000089655116 state=7 flags=0
[  390.852980] #PF: supervisor read access in kernel mode
[  390.852989] #PF: error_code(0x0000) - not-present page
[  390.852990] MPTCP: msk=0000000089655116 rx queue empty=1:1 copied=0
[  390.852998] PGD 0 P4D 0
[  390.853010] Oops: 0000 [#3] PREEMPT SMP NOPTI
[  390.853020] CPU: 6 PID: 796 Comm: kworker/6:3 Tainted: G      D            6.6.0-rc4-mptcp+ #19
[  390.853030] Hardware name: LENOVO 20UASA0901/20UASA0901, BIOS N2WET38W (1.28 ) 08/30/2022
[  390.853037] Workqueue: events mptcp_worker
[  390.853056] RIP: 0010:mptcp_worker+0x113/0x3e0
[  390.853070] Code: 83 28 fa ff ff a8 01 75 71 f0 48 0f ba 73 d8 02 0f 82 9c 01 00 00 4c 8b a3 80 00 00 00 4d 85 e4 74 18 49 8b 84 24 c0 04 00 00 <48> 8b 80 a0 00 00 00 48 85 c0 0f 85 9f 00 00 00 48 89 ef e8 b5 f2
[  390.853079] RSP: 0018:ffffc90000527e38 EFLAGS: 00010286
[  390.853087] RAX: 0000000000000000 RBX: ffff88815c26e0f8 RCX: 0000000000000000
[  390.853094] RDX: 0000000100016275 RSI: 0000000000000246 RDI: ffff888108c70000
[  390.853100] RBP: ffff88815c26dac0 R08: 0000000000000800 R09: 0000000000000400
[  390.853106] R10: ffff88846d700000 R11: ffffffffb026f448 R12: ffff888106f5a000
[  390.853111] R13: ffff888100099c05 R14: ffff88815c26e100 R15: ffff888103fd36c0
[  390.853117] FS:  0000000000000000(0000) GS:ffff88846d700000(0000) knlGS:0000000000000000
[  390.853125] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  390.853131] CR2: 00000000000000a0 CR3: 00000001a21fc002 CR4: 00000000003706e0
[  390.853138] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  390.853144] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  390.853149] Call Trace:
[  390.853155]  <TASK>
[  390.853164]  ? __die_body+0x1e/0x60
[  390.853177]  ? page_fault_oops+0x15b/0x470
[  390.853189]  ? pollwake+0x78/0xa0
[  390.853203]  ? __pfx_default_wake_function+0x10/0x10
[  390.853215]  ? exc_page_fault+0x71/0x160
[  390.853226]  ? asm_exc_page_fault+0x26/0x30
[  390.853240]  ? mptcp_worker+0x113/0x3e0
[  390.853253]  ? mptcp_worker+0xc8/0x3e0
[  390.853265]  ? __schedule+0x352/0xc20
[  390.853279]  process_scheduled_works+0x22b/0x360
[  390.853298]  worker_thread+0x147/0x2b0
[  390.853315]  ? __pfx_worker_thread+0x10/0x10
[  390.853331]  kthread+0xe5/0x120
[  390.853346]  ? __pfx_kthread+0x10/0x10
[  390.853362]  ret_from_fork+0x31/0x40
[  390.853372] MPTCP: msk=00000000c3583a71 state=11 flags=0
[  390.853375]  ? __pfx_kthread+0x10/0x10
[  390.853388]  ret_from_fork_asm+0x1b/0x30
[  390.853412]  </TASK>

Now ssk (msk->first) has been released, we must update msk->first.

Thanks,
-Geliang

> 
> Could it be possible to do that instead you think?
> 
> Cheers,
> Matt
> -- 
> Tessares | Belgium | Hybrid Access Solutions
> www.tessares.net
Re: [PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first
Posted by Matthieu Baerts 2 years, 4 months ago
Hi Geliang,

On 08/10/2023 14:33, Geliang Tang wrote:
> On Sun, Oct 08, 2023 at 01:11:32PM +0200, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> +cc Paolo
>>
>> On 08/10/2023 12:27, Geliang Tang wrote:
>>> When closing the msk->first socket in __mptcp_close_ssk(), the original
>>> behaviour is to reset this socket. But if there's another subflow available
>>> in this case, it's better to set the first available subflow to msk->first,
>>> instead of resetting the msk->first.
>>
>> I don't think we want to do that. If I remember well, we want to keep
>> having 'msk->first' pointing to the initial subflow for various reasons
>> (I don't remember exactly).
>>
>> I think what we want to do is not to reset the initial subflow if there
>> are other subflows still available and we don't want to set it to 'NULL'
>> in such case.
> 
> Is it like this:
> 
>        if (ssk == msk->first && !msk->pm.subflows)
>                WRITE_ONCE(msk->first, NULL);

No, we cannot set it to NULL nor release it but we should avoid the
"reset" being sent.

I don't have access to the code for the moment but by looking at your
patch, I would say you should not modify 'dispose_it' nor 'msk->first'
but maybe check if you can avoid calling 'mptcp_subflow_ctx_reset()' if
'!dispose_it', maybe by checking something like
'list_is_singular(&msk->conn_list)'? (I think you should avoid looking
at the PM structure here).

Please tell me if it is not clear.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
Re: [PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first
Posted by Geliang Tang 2 years, 4 months ago
On Sun, Oct 08, 2023 at 04:05:10PM +0200, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 08/10/2023 14:33, Geliang Tang wrote:
> > On Sun, Oct 08, 2023 at 01:11:32PM +0200, Matthieu Baerts wrote:
> >> Hi Geliang,
> >>
> >> +cc Paolo
> >>
> >> On 08/10/2023 12:27, Geliang Tang wrote:
> >>> When closing the msk->first socket in __mptcp_close_ssk(), the original
> >>> behaviour is to reset this socket. But if there's another subflow available
> >>> in this case, it's better to set the first available subflow to msk->first,
> >>> instead of resetting the msk->first.
> >>
> >> I don't think we want to do that. If I remember well, we want to keep
> >> having 'msk->first' pointing to the initial subflow for various reasons
> >> (I don't remember exactly).
> >>
> >> I think what we want to do is not to reset the initial subflow if there
> >> are other subflows still available and we don't want to set it to 'NULL'
> >> in such case.
> > 
> > Is it like this:
> > 
> >        if (ssk == msk->first && !msk->pm.subflows)
> >                WRITE_ONCE(msk->first, NULL);

The code:

        if (ssk == msk->first && list_is_singular(&msk->conn_list))
                WRITE_ONCE(msk->first, NULL);

works, no NULL pointer crash with this now. I update it into v12.

> 
> No, we cannot set it to NULL nor release it but we should avoid the
> "reset" being sent.
> 
> I don't have access to the code for the moment but by looking at your
> patch, I would say you should not modify 'dispose_it' nor 'msk->first'
> but maybe check if you can avoid calling 'mptcp_subflow_ctx_reset()' if
> '!dispose_it', maybe by checking something like
> 'list_is_singular(&msk->conn_list)'? (I think you should avoid looking
> at the PM structure here).

Something like this, right:

                 * disconnect should never fail
                 */
                WARN_ON_ONCE(tcp_disconnect(ssk, 0));
-               mptcp_subflow_ctx_reset(subflow);
+               list_is_singular(&msk->conn_list)
+                       mptcp_subflow_ctx_reset(subflow);
                release_sock(ssk);
 
                goto out;

This doesn't work, if we only avoid calling mptcp_subflow_ctx_reset,
tcp_disconnect will send a RST too.

Thanks,
-Geliang

> 
> Please tell me if it is not clear.
> 
> Cheers,
> Matt
> -- 
> Tessares | Belgium | Hybrid Access Solutions
> www.tessares.net
Re: [PATCH mptcp-next v11 1/4] mptcp: set next subflow to msk->first
Posted by Matthieu Baerts 2 years, 4 months ago
Hi Geliang,

On 10/10/2023 08:04, Geliang Tang wrote:
> On Sun, Oct 08, 2023 at 04:05:10PM +0200, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> On 08/10/2023 14:33, Geliang Tang wrote:
>>> On Sun, Oct 08, 2023 at 01:11:32PM +0200, Matthieu Baerts wrote:
>>>> Hi Geliang,
>>>>
>>>> +cc Paolo
>>>>
>>>> On 08/10/2023 12:27, Geliang Tang wrote:
>>>>> When closing the msk->first socket in __mptcp_close_ssk(), the original
>>>>> behaviour is to reset this socket. But if there's another subflow available
>>>>> in this case, it's better to set the first available subflow to msk->first,
>>>>> instead of resetting the msk->first.
>>>>
>>>> I don't think we want to do that. If I remember well, we want to keep
>>>> having 'msk->first' pointing to the initial subflow for various reasons
>>>> (I don't remember exactly).
>>>>
>>>> I think what we want to do is not to reset the initial subflow if there
>>>> are other subflows still available and we don't want to set it to 'NULL'
>>>> in such case.
>>>
>>> Is it like this:
>>>
>>>        if (ssk == msk->first && !msk->pm.subflows)
>>>                WRITE_ONCE(msk->first, NULL);
> 
> The code:
> 
>         if (ssk == msk->first && list_is_singular(&msk->conn_list))
>                 WRITE_ONCE(msk->first, NULL);
> 
> works, no NULL pointer crash with this now. I update it into v12.

If I'm not mistaken, we can only set 'msk->first' back to NULL when
closing the MPTCP connection. We cannot do that if there are other
subflows still alive.

>> No, we cannot set it to NULL nor release it but we should avoid the
>> "reset" being sent.
>>
>> I don't have access to the code for the moment but by looking at your
>> patch, I would say you should not modify 'dispose_it' nor 'msk->first'
>> but maybe check if you can avoid calling 'mptcp_subflow_ctx_reset()' if
>> '!dispose_it', maybe by checking something like
>> 'list_is_singular(&msk->conn_list)'? (I think you should avoid looking
>> at the PM structure here).
> 
> Something like this, right:
> 
>                  * disconnect should never fail
>                  */
>                 WARN_ON_ONCE(tcp_disconnect(ssk, 0));
> -               mptcp_subflow_ctx_reset(subflow);
> +               list_is_singular(&msk->conn_list)
> +                       mptcp_subflow_ctx_reset(subflow);
>                 release_sock(ssk);
>  
>                 goto out;
> 
> This doesn't work, if we only avoid calling mptcp_subflow_ctx_reset,
> tcp_disconnect will send a RST too.

Sorry, yes I meant to say if we should skip both, tcp_disconnect()
included. We should only close the connection but not release it.

(or we need to close, then call disconnect? I didn't check that)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net