To support redundant packet schedulers more easily, this patch moves the
packet scheduler out of the dfrags loop in __mptcp_push_pending(), so that
mptcp_sched_get_send() is invoked only once, and updates the socket locking
accordingly.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
net/mptcp/protocol.c | 33 +++++++++++----------------------
1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 7ca21915b3d1..fc972d56926e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1530,15 +1530,24 @@ void mptcp_check_and_set_pending(struct sock *sk)
void __mptcp_push_pending(struct sock *sk, unsigned int flags)
{
- struct sock *prev_ssk = NULL, *ssk = NULL;
struct mptcp_sock *msk = mptcp_sk(sk);
struct mptcp_sendmsg_info info = {
.flags = flags,
};
bool do_check_data_fin = false;
struct mptcp_data_frag *dfrag;
+ struct sock *ssk;
int len;
+ ssk = mptcp_sched_get_send(msk);
+ if (!ssk)
+ goto out;
+
+ if (!mptcp_send_head(sk))
+ goto out;
+
+ lock_sock(ssk);
+
while ((dfrag = mptcp_send_head(sk))) {
info.sent = dfrag->already_sent;
info.limit = dfrag->data_len;
@@ -1546,24 +1555,6 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
while (len > 0) {
int ret = 0;
- prev_ssk = ssk;
- ssk = mptcp_sched_get_send(msk);
-
- /* First check. If the ssk has changed since
- * the last round, release prev_ssk
- */
- if (ssk != prev_ssk && prev_ssk)
- mptcp_push_release(prev_ssk, &info);
- if (!ssk)
- goto out;
-
- /* Need to lock the new subflow only if different
- * from the previous one, otherwise we are still
- * helding the relevant lock
- */
- if (ssk != prev_ssk)
- lock_sock(ssk);
-
ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
if (ret <= 0) {
if (ret == -EAGAIN)
@@ -1581,9 +1572,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
}
- /* at this point we held the socket lock for the last subflow we used */
- if (ssk)
- mptcp_push_release(ssk, &info);
+ mptcp_push_release(ssk, &info);
out:
/* ensure the rtx timer is running */
--
2.35.3
On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
> To support redundant packet schedulers more easily, this patch moves the
> packet scheduler out of the dfrags loop in __mptcp_push_pending(), so that
> mptcp_sched_get_send() is invoked only once.

I fear the above will cause hitting HoL blocking [more]
frequently/easily. I'm not sure if the simult_flows.sh self test will
catch that, but you should see a measurably worse running time for such
a test - in a non-debug build, as the average of multiple runs.

I'm sorry, I can't see a feasible way out here :(

/P
On Wed, Sep 28, 2022 at 01:02:07PM +0200, Paolo Abeni wrote:
> On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
> > To support redundant packet schedulers more easily, this patch moves the
> > packet scheduler out of the dfrags loop in __mptcp_push_pending(), so that
> > mptcp_sched_get_send() is invoked only once.
>
> I fear the above will cause hitting HoL blocking [more]
> frequently/easily. I'm not sure if the simult_flows.sh self test will
> catch that, but you should see a measurably worse running time for such
> a test - in a non-debug build, as the average of multiple runs.

Yes, the simult_flows.sh self test failed in my tests. I guess this
problem also exists in the "BPF redundant scheduler" v12 series, because
it uses the same logic in __mptcp_push_pending(), moving the packet
scheduler out of the dfrags loop.

> I'm sorry, I can't see a feasible way out here :(
>
> /P
On Wed, 28 Sep 2022, Geliang Tang wrote:
> On Wed, Sep 28, 2022 at 01:02:07PM +0200, Paolo Abeni wrote:
>> On Wed, 2022-09-28 at 17:48 +0800, Geliang Tang wrote:
>>> To support redundant packet schedulers more easily, this patch moves
>>> the packet scheduler out of the dfrags loop in __mptcp_push_pending(),
>>> so that mptcp_sched_get_send() is invoked only once.
>>
>> I fear the above will cause hitting HoL blocking [more]
>> frequently/easily. I'm not sure if the simult_flows.sh self test will
>> catch that, but you should see a measurably worse running time for such
>> a test - in a non-debug build, as the average of multiple runs.
>
> Yes, the simult_flows.sh self test failed in my tests. I guess this
> problem also exists in the "BPF redundant scheduler" v12 series, because
> it uses the same logic in __mptcp_push_pending(), moving the packet
> scheduler out of the dfrags loop.
>

Hi Geliang -

Thanks for posting the __mptcp_push_pending() changes as a separate
series, that's the main area of complexity in the BPF redundant scheduler
series. I was trying to come up with some more detailed proposals for
this part of the code, but it will be easier to discuss it in this
4-patch series.

We had talked about the need to change the __mptcp_push_pending() loops
to support the redundant scheduler, since the existing structure and
locking optimizations don't fit well with trying to send the same dfrags
across multiple subflows. A simplified view of the existing code is this:

/* [1] Existing send loops */
For each dfrag:
	While sends succeed:
		Call the scheduler (selects subflow and msk->snd_burst)
		Update subflow locks (push/release/acquire as needed)
		Send the dfrag data with mptcp_sendmsg_frag()
		Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
	Update msk->first_pending
Push/release on final subflow

With the redundant scheduler, it's necessary to send each dfrag on
multiple subflows. If the top-level loop is "For each dfrag", then
redundant sends have to lock/unlock every subflow on every iteration of
the inner loop. Because of that, I proposed changing the outer loop to be
based on the scheduler:

/* [2] Proposed send loops */
While the scheduler selects one or more subflows:
	For each subflow:
		Lock the subflow
		For each pending dfrag:
			Send the dfrag data with mptcp_sendmsg_frag()
			Break if required by msk->snd_burst / etc
		Push and release the subflow
	Update dfrag metadata and msk->first_pending

>>
>> I'm sorry, I can't see a feasible way out here :(
>>

I'm still optimistic here :) Whatever the outer loop is, I think we can
still make basically the same sequence of calls to the subflow locks,
mptcp_sendmsg_frag(), and tcp_push(). And fewer scheduler calls too! That
would maintain the blest-like behavior Paolo implemented.

It seems to me that this v1 iteration of the patches is missing one
important detail compared to the proposed pseudocode [2]: there's no
outer loop to call the scheduler again after msk->snd_burst is exhausted.
If this block of code is moved out of mptcp_subflow_get_send():

	/* re-use last subflow, if the burst allow that */
	if (msk->last_snd && msk->snd_burst > 0 &&
	    sk_stream_memory_free(msk->last_snd) &&
	    mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
		mptcp_set_timeout(sk);
		return msk->last_snd;
	}

and those conditions are instead checked in the inner "For each pending
dfrag" loop of my proposed pseudocode above, I think the single subflow
case is feasible. Once that is working then redundant subflow support can
be added in later patches.

--
Mat Martineau
Intel