[PATCH mptcp-next 1/4] mptcp: update __mptcp_push_pending

Geliang Tang posted 4 patches 1 year, 9 months ago
Maintainers: Mat Martineau <mathew.j.martineau@linux.intel.com>, Matthieu Baerts <matthieu.baerts@tessares.net>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
There is a newer version of this series
[PATCH mptcp-next 1/4] mptcp: update __mptcp_push_pending
Posted by Geliang Tang 1 year, 9 months ago
To support redundant package schedulers more easily, this patch moves the
packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoke
mptcp_sched_get_send() only once. And update the socket locks
correspondingly.

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/protocol.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7ca21915b3d1..fc972d56926e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1530,15 +1530,24 @@ void mptcp_check_and_set_pending(struct sock *sk)
 
 void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 {
-	struct sock *prev_ssk = NULL, *ssk = NULL;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_sendmsg_info info = {
 				.flags = flags,
 	};
 	bool do_check_data_fin = false;
 	struct mptcp_data_frag *dfrag;
+	struct sock *ssk;
 	int len;
 
+	ssk = mptcp_sched_get_send(msk);
+	if (!ssk)
+		goto out;
+
+	if (!mptcp_send_head(sk))
+		goto out;
+
+	lock_sock(ssk);
+
 	while ((dfrag = mptcp_send_head(sk))) {
 		info.sent = dfrag->already_sent;
 		info.limit = dfrag->data_len;
@@ -1546,24 +1555,6 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 		while (len > 0) {
 			int ret = 0;
 
-			prev_ssk = ssk;
-			ssk = mptcp_sched_get_send(msk);
-
-			/* First check. If the ssk has changed since
-			 * the last round, release prev_ssk
-			 */
-			if (ssk != prev_ssk && prev_ssk)
-				mptcp_push_release(prev_ssk, &info);
-			if (!ssk)
-				goto out;
-
-			/* Need to lock the new subflow only if different
-			 * from the previous one, otherwise we are still
-			 * helding the relevant lock
-			 */
-			if (ssk != prev_ssk)
-				lock_sock(ssk);
-
 			ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
 			if (ret <= 0) {
 				if (ret == -EAGAIN)
@@ -1581,9 +1572,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 		WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
 	}
 
-	/* at this point we held the socket lock for the last subflow we used */
-	if (ssk)
-		mptcp_push_release(ssk, &info);
+	mptcp_push_release(ssk, &info);
 
 out:
 	/* ensure the rtx timer is running */
-- 
2.35.3
Re: [PATCH mptcp-next 1/4] mptcp: update __mptcp_push_pending
Posted by Paolo Abeni 1 year, 9 months ago
On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
> To support redundant package schedulers more easily, this patch moves the
> packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoke
> mptcp_sched_get_send() only once. 

I fear the above will make cause hitting HoL blocking [more]
frequently/easily. I'm not sure if the simult_flows.sh self test will
catch that, but you should see a measurably worse running time for such
test - in non debug build, as the average of multiple runs.

I'm sorry, I can't see a feasible way out here :(

/P

Re: [PATCH mptcp-next 1/4] mptcp: update __mptcp_push_pending
Posted by Geliang Tang 1 year, 9 months ago
On Wed, Sep 28, 2022 at 01:02:07PM +0200, Paolo Abeni wrote:
> On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
> > To support redundant package schedulers more easily, this patch moves the
> > packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoke
> > mptcp_sched_get_send() only once. 
> 
> I fear the above will make cause hitting HoL blocking [more]
> frequently/easily. I'm not sure if the simult_flows.sh self test will
> catch that, but you should see a measurably worse running time for such
> test - in non debug build, as the average of multiple runs.

Yes, simult_flows.sh self test failed in my tests. I guess this problem
also exists in "BPF redundant scheduler" v12, because it also uses the
same logic in __mptcp_push_pending(), move the packet scheduler out of
the dfrags loop.

> 
> I'm sorry, I can't see a feasible way out here :(
> 
> /P
> 
Re: [PATCH mptcp-next 1/4] mptcp: update __mptcp_push_pending
Posted by Mat Martineau 1 year, 9 months ago
On Wed, 28 Sep 2022, Geliang Tang wrote:

> On Wed, Sep 28, 2022 at 01:02:07PM +0200, Paolo Abeni wrote:
>> On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
>>> To support redundant package schedulers more easily, this patch moves the
>>> packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoke
>>> mptcp_sched_get_send() only once.
>>
>> I fear the above will make cause hitting HoL blocking [more]
>> frequently/easily. I'm not sure if the simult_flows.sh self test will
>> catch that, but you should see a measurably worse running time for such
>> test - in non debug build, as the average of multiple runs.
>
> Yes, simult_flows.sh self test failed in my tests. I guess this problem
> also exists in "BPF redundant scheduler" v12, because it also uses the
> same logic in __mptcp_push_pending(), move the packet scheduler out of
> the dfrags loop.
>

Hi Geliang -


Thanks for posting the __mptcp_push_pending() changes as a separate 
series, that's the main area of complexity in the BPF redundant scheduler 
series. I was trying to come up with some more detailed proposals for this 
part of the code, but it will be easier to discuss it in this 4-patch 
series.


We had talked about the need to change the __mptcp_push_pending() loops to 
support the redundant scheduler, since the existing structure and locking 
optimizations don't fit well with trying to send the same dfrags across 
multiple subflows. A simplified view of the existing code is this:

/* [1] Existing send loops */
For each dfrag:
 	While sends succeed:
 		Call the scheduler (selects subflow and msk->snd_burst)
 		Update subflow locks (push/release/acquire as needed)
 		Send the dfrag data with mptcp_sendmsg_frag()
 		Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
 	Update msk->first_pending
Push/release on final subflow


With the redundant scheduler, it's necessary to send each dfrag on 
multiple subflows. If the top-level loop is "For each dfrag", then 
redundant sends have to lock/unlock every subflow on every iteration of 
the inner loop.

Because of that, I proposed changing the outer loop to be based on the 
scheduler:


/* [2] Proposed send loops */
While the scheduler selects one or more subflows:
 	For each subflow:
 		Lock the subflow
 		For each pending dfrag:
 			Send the dfrag data with mptcp_sendmsg_frag()
 			Break if required by msk->snd_burst / etc
 		Push and release the subflow
 	Update dfrag metadata and msk->first_pending

>>
>> I'm sorry, I can't see a feasible way out here :(
>>

I'm still optimistic here :)

Whatever the outer loop is, I think we can still make basically the same 
sequence of calls to the subflow locks, mptcp_sendmsg_frag(), and 
tcp_push(). And fewer scheduler calls too! That would maintain the 
blest-like behavior Paolo implemented.

It seems to me that this v1 iteration of the patches is missing one 
important detail compared to the proposed pseudocode [2]: There's no outer 
loop to call the scheduler again after msk->snd_burst is exhausted. If 
this block of code is moved out of mptcp_subflow_get_send():

 	/* re-use last subflow, if the burst allow that */
 	if (msk->last_snd && msk->snd_burst > 0 &&
 	    sk_stream_memory_free(msk->last_snd) &&
 	    mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
 		mptcp_set_timeout(sk);
 		return msk->last_snd;
 	}

and those conditions are instead checked in the inner "For each pending 
dfrag" loop of my proposed pseudocode above, I think the single subflow 
case is feasible. Once that is working then redundant subflow support can 
be added in later patches.


--
Mat Martineau
Intel