To support redundant packet schedulers more easily, this patch moves the
packet scheduler call out of the dfrags loop in __mptcp_push_pending(), so
that mptcp_sched_get_send() is invoked only once, and updates the socket
locking accordingly.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
net/mptcp/protocol.c | 33 +++++++++++----------------------
1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7ca21915b3d1..fc972d56926e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1530,15 +1530,24 @@ void mptcp_check_and_set_pending(struct sock *sk)
 
 void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 {
-	struct sock *prev_ssk = NULL, *ssk = NULL;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_sendmsg_info info = {
 		.flags = flags,
 	};
 	bool do_check_data_fin = false;
 	struct mptcp_data_frag *dfrag;
+	struct sock *ssk;
 	int len;
 
+	ssk = mptcp_sched_get_send(msk);
+	if (!ssk)
+		goto out;
+
+	if (!mptcp_send_head(sk))
+		goto out;
+
+	lock_sock(ssk);
+
 	while ((dfrag = mptcp_send_head(sk))) {
 		info.sent = dfrag->already_sent;
 		info.limit = dfrag->data_len;
@@ -1546,24 +1555,6 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 		while (len > 0) {
 			int ret = 0;
 
-			prev_ssk = ssk;
-			ssk = mptcp_sched_get_send(msk);
-
-			/* First check. If the ssk has changed since
-			 * the last round, release prev_ssk
-			 */
-			if (ssk != prev_ssk && prev_ssk)
-				mptcp_push_release(prev_ssk, &info);
-			if (!ssk)
-				goto out;
-
-			/* Need to lock the new subflow only if different
-			 * from the previous one, otherwise we are still
-			 * helding the relevant lock
-			 */
-			if (ssk != prev_ssk)
-				lock_sock(ssk);
-
 			ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
 			if (ret <= 0) {
 				if (ret == -EAGAIN)
@@ -1581,9 +1572,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 		WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
 	}
 
-	/* at this point we held the socket lock for the last subflow we used */
-	if (ssk)
-		mptcp_push_release(ssk, &info);
+	mptcp_push_release(ssk, &info);
 
 out:
 	/* ensure the rtx timer is running */
--
2.35.3
On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
> To support redundant packet schedulers more easily, this patch moves the
> packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoking
> mptcp_sched_get_send() only once.

I fear the above will cause hitting HoL blocking [more]
frequently/easily. I'm not sure if the simult_flows.sh self test will
catch that, but you should see a measurably worse running time for such
a test - in a non-debug build, as the average of multiple runs.

I'm sorry, I can't see a feasible way out here :(

/P
On Wed, Sep 28, 2022 at 01:02:07PM +0200, Paolo Abeni wrote:
> On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
> > To support redundant packet schedulers more easily, this patch moves the
> > packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoking
> > mptcp_sched_get_send() only once.
>
> I fear the above will cause hitting HoL blocking [more]
> frequently/easily. I'm not sure if the simult_flows.sh self test will
> catch that, but you should see a measurably worse running time for such
> a test - in a non-debug build, as the average of multiple runs.

Yes, the simult_flows.sh self test failed in my tests. I guess this problem
also exists in "BPF redundant scheduler" v12, because it uses the same
logic in __mptcp_push_pending(), moving the packet scheduler out of
the dfrags loop.

> I'm sorry, I can't see a feasible way out here :(
>
> /P
On Wed, 28 Sep 2022, Geliang Tang wrote:
> On Wed, Sep 28, 2022 at 01:02:07PM +0200, Paolo Abeni wrote:
>> On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
>>> To support redundant packet schedulers more easily, this patch moves the
>>> packet scheduler out of the dfrags loop in __mptcp_push_pending(), invoking
>>> mptcp_sched_get_send() only once.
>>
>> I fear the above will cause hitting HoL blocking [more]
>> frequently/easily. I'm not sure if the simult_flows.sh self test will
>> catch that, but you should see a measurably worse running time for such
>> a test - in a non-debug build, as the average of multiple runs.
>
> Yes, the simult_flows.sh self test failed in my tests. I guess this problem
> also exists in "BPF redundant scheduler" v12, because it uses the same
> logic in __mptcp_push_pending(), moving the packet scheduler out of
> the dfrags loop.
>
Hi Geliang -
Thanks for posting the __mptcp_push_pending() changes as a separate
series; that's the main area of complexity in the BPF redundant scheduler
series. I was trying to come up with some more detailed proposals for this
part of the code, but it will be easier to discuss it in this 4-patch
series.
We had talked about the need to change the __mptcp_push_pending() loops to
support the redundant scheduler, since the existing structure and locking
optimizations don't fit well with trying to send the same dfrags across
multiple subflows. A simplified view of the existing code is this:
/* [1] Existing send loops */
For each dfrag:
    While sends succeed:
        Call the scheduler (selects subflow and msk->snd_burst)
        Update subflow locks (push/release/acquire as needed)
        Send the dfrag data with mptcp_sendmsg_frag()
        Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
    Update msk->first_pending
Push/release on final subflow
With the redundant scheduler, it's necessary to send each dfrag on
multiple subflows. If the top-level loop is "For each dfrag", then
redundant sends have to lock/unlock every subflow on every iteration of
the inner loop.
Because of that, I proposed changing the outer loop to be based on the
scheduler:
/* [2] Proposed send loops */
While the scheduler selects one or more subflows:
    For each subflow:
        Lock the subflow
        For each pending dfrag:
            Send the dfrag data with mptcp_sendmsg_frag()
            Break if required by msk->snd_burst / etc
        Push and release the subflow
    Update dfrag metadata and msk->first_pending
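
For concreteness, a rough kernel-style sketch of that proposed structure
could look something like the code below. This is only an illustration of
the loop nesting, not a tested patch: the sched_pick_subflows() helper and
the subflow->scheduled flag are stand-ins I made up for whatever interface
the scheduler series ends up with, and the burst/memory accounting and
most error handling are elided.

/* Illustrative sketch only: proposed loop structure [2].
 * sched_pick_subflows() and subflow->scheduled are hypothetical,
 * standing in for the real scheduler interface.
 */
static void sketch_push_pending(struct sock *sk, unsigned int flags)
{
	struct mptcp_sock *msk = mptcp_sk(sk);
	struct mptcp_sendmsg_info info = {
		.flags = flags,
	};
	struct mptcp_subflow_context *subflow;
	struct mptcp_data_frag *dfrag;
	bool progress = true;

	/* outer loop: keep going while there is pending data, the
	 * (hypothetical) scheduler picks at least one subflow and the
	 * previous pass actually sent something
	 */
	while (progress && mptcp_send_head(sk) && sched_pick_subflows(msk) > 0) {
		progress = false;

		mptcp_for_each_subflow(msk, subflow) {
			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);

			if (!subflow->scheduled)	/* hypothetical flag */
				continue;

			lock_sock(ssk);

			/* inner loop: push pending dfrags on this subflow */
			while ((dfrag = mptcp_send_head(sk))) {
				int len = dfrag->data_len - dfrag->already_sent;

				info.sent = dfrag->already_sent;
				info.limit = dfrag->data_len;

				while (len > 0) {
					int ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);

					if (ret <= 0)
						goto push;
					progress = true;
					info.sent += ret;
					len -= ret;
				}

				/* single-subflow case: advance first_pending here;
				 * for redundant sends this bookkeeping would move
				 * after the subflow loop, as in [2]
				 */
				WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
			}
push:
			mptcp_push_release(ssk, &info);
		}
	}
}

The point is that the per-subflow lock/push/release pattern and the
mptcp_sendmsg_frag() calls stay essentially the same as today; only the
scheduler call moves out to the outer loop.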
>>
>> I'm sorry, I can't see a feasible way out here :(
>>
I'm still optimistic here :)
Whatever the outer loop is, I think we can still make basically the same
sequence of calls to the subflow locks, mptcp_sendmsg_frag(), and
tcp_push(). And fewer scheduler calls too! That would maintain the
blest-like behavior Paolo implemented.
It seems to me that this v1 iteration of the patches is missing one
important detail compared to the proposed pseudocode [2]: There's no outer
loop to call the scheduler again after msk->snd_burst is exhausted. If
this block of code is moved out of mptcp_subflow_get_send():
	/* re-use last subflow, if the burst allow that */
	if (msk->last_snd && msk->snd_burst > 0 &&
	    sk_stream_memory_free(msk->last_snd) &&
	    mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
		mptcp_set_timeout(sk);
		return msk->last_snd;
	}
and those conditions are instead checked in the inner "For each pending
dfrag" loop of my proposed pseudocode above, I think the single subflow
case is feasible. Once that is working then redundant subflow support can
be added in later patches.
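
To make that concrete, one option is a small helper used at the bottom of
the inner dfrag loop, along the lines of the sketch below. The helper name
and its exact placement are my guesses; the conditions are simply the
burst-reuse test quoted above, minus the msk->last_snd checks, since at
that point the subflow is already known.

/* Sketch: can the inner dfrag loop keep using this subflow, or should
 * it break back out so the outer loop calls the scheduler again?
 * Hypothetical helper; conditions mirror the quoted burst-reuse test.
 */
static bool sketch_can_keep_sending(struct mptcp_sock *msk, struct sock *ssk)
{
	return msk->snd_burst > 0 &&
	       sk_stream_memory_free(ssk) &&
	       mptcp_subflow_active(mptcp_subflow_ctx(ssk));
}

When this returns false the inner loop would break, the subflow would be
pushed/released, and the outer loop would invoke the scheduler again,
keeping the blest-like burst behavior without a scheduler call per dfrag.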
--
Mat Martineau
Intel