[PATCH mptcp-net 1/6] mptcp: sched: check both directions for backup

Matthieu Baerts (NGI0) posted 6 patches 2 months, 1 week ago
There is a newer version of this series
[PATCH mptcp-net 1/6] mptcp: sched: check both directions for backup
Posted by Matthieu Baerts (NGI0) 2 months, 1 week ago
The 'mptcp_subflow_context' structure has two items related to the
backup flags:

 - 'backup': the subflow has been marked as backup by the other peer

- 'request_bkup': the backup flag has been set by the host

Before this patch, the scheduler was only looking at the 'backup' flag.
That can make sense in some cases, but it looks like that's not what we
wanted for the general use, because either the path-manager was setting
both of them when sending an MP_PRIO, or the receiver was duplicating
the 'backup' flag in the subflow request.

Note that the use of these two flags in the path-manager are going to be
fixed in the next commits, but this change here is needed not to modify
the behaviour.

Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 include/trace/events/mptcp.h |  2 +-
 net/mptcp/protocol.c         | 10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 09e72215b9f9..085b749cdd97 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -34,7 +34,7 @@ TRACE_EVENT(mptcp_subflow_get_send,
 		struct sock *ssk;
 
 		__entry->active = mptcp_subflow_active(subflow);
-		__entry->backup = subflow->backup;
+		__entry->backup = subflow->backup || subflow->request_bkup;
 
 		if (subflow->tcp_sock && sk_fullsock(subflow->tcp_sock))
 			__entry->free = sk_stream_memory_free(subflow->tcp_sock);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ac94225489f8..b3a48d97f009 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1422,13 +1422,15 @@ struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 	}
 
 	mptcp_for_each_subflow(msk, subflow) {
+		bool backup = subflow->backup || subflow->request_bkup;
+
 		trace_mptcp_subflow_get_send(subflow);
 		ssk =  mptcp_subflow_tcp_sock(subflow);
 		if (!mptcp_subflow_active(subflow))
 			continue;
 
 		tout = max(tout, mptcp_timeout_from_subflow(subflow));
-		nr_active += !subflow->backup;
+		nr_active += !backup;
 		pace = subflow->avg_pacing_rate;
 		if (unlikely(!pace)) {
 			/* init pacing rate from socket */
@@ -1439,9 +1441,9 @@ struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 		}
 
 		linger_time = div_u64((u64)READ_ONCE(ssk->sk_wmem_queued) << 32, pace);
-		if (linger_time < send_info[subflow->backup].linger_time) {
-			send_info[subflow->backup].ssk = ssk;
-			send_info[subflow->backup].linger_time = linger_time;
+		if (linger_time < send_info[backup].linger_time) {
+			send_info[backup].ssk = ssk;
+			send_info[backup].linger_time = linger_time;
 		}
 	}
 	__mptcp_set_timeout(sk, tout);

-- 
2.45.2
Re: [PATCH mptcp-net 1/6] mptcp: sched: check both directions for backup
Posted by Paolo Abeni 2 months ago
On Thu, 2024-07-11 at 17:38 +0200, Matthieu Baerts (NGI0) wrote:
> The 'mptcp_subflow_context' structure has two items related to the
> backup flags:
> 
>  - 'backup': the subflow has been marked as backup by the other peer
> 
> - 'request_bkup': the backup flag has been set by the host
> 
> Before this patch, the scheduler was only looking at the 'backup' flag.
> That can make sense in some cases, but it looks like that's not what we
> wanted for the general use, because either the path-manager was setting
> both of them when sending an MP_PRIO, or the receiver was duplicating
> the 'backup' flag in the subflow request.

According to rfc 8684 section 3.3.8, setting the B (backup) bit in the
MPJ header """indicate to its peer that this path should be treated as
a backup path to use only in the event of failure of other working
subflows (i.e., a subflow where the receiver has indicated that B=1
SHOULD NOT be used to send data unless there are no usable subflows
where B=0)."""

I read the above as the path manager should only look at the 'backup'
field?!?

Thanks,

Paolo
Re: [PATCH mptcp-net 1/6] mptcp: sched: check both directions for backup
Posted by Matthieu Baerts 2 months ago
Hi Paolo,

Thank you for your review!

On 12/07/2024 16:53, Paolo Abeni wrote:
> On Thu, 2024-07-11 at 17:38 +0200, Matthieu Baerts (NGI0) wrote:
>> The 'mptcp_subflow_context' structure has two items related to the
>> backup flags:
>>
>>  - 'backup': the subflow has been marked as backup by the other peer
>>
>> - 'request_bkup': the backup flag has been set by the host
>>
>> Before this patch, the scheduler was only looking at the 'backup' flag.
>> That can make sense in some cases, but it looks like that's not what we
>> wanted for the general use, because either the path-manager was setting
>> both of them when sending an MP_PRIO, or the receiver was duplicating
>> the 'backup' flag in the subflow request.
> 
> According to rfc 8684 section 3.3.8, setting the B (backup) bit in the
> MPJ header """indicate to its peer that this path should be treated as
> a backup path to use only in the event of failure of other working
> subflows (i.e., a subflow where the receiver has indicated that B=1
> SHOULD NOT be used to send data unless there are no usable subflows
> where B=0)."""
> 
> I read the above as the path manager should only look at the 'backup'
> field?!?

(I guess you wanted to talk about the packet scheduler that is being
modified here, right?)

That's two different things to me: the RFC says here that in the MPJ,
the B flag is to tell the other peer that this subflow should be
considered as a backup one, but it doesn't say anything about how the
current host and its packet scheduler should consider this subflow.

Typically, one side knows that one path is more "costly", and should be
used only if needed. Then, if this hosts told the other peer to consider
this path as a backup one, it makes sense to do the same, no? (anyway,
that's what we always did apparently :) )

If someone wants a different behaviour for a specific use-case (backup
path only on one direction), it can use a different packet scheduler :)

PS: note that the ip-mptcp man page is saying this about this subject:

> backup: If this is a subflow endpoint, the subflows created using this
> endpoint will have the backup flag set during the connection process.
> This flag instructs the peer to only send data on a given subflow when
> all non-backup subflows are unavailable. This does not affect outgoing
> data, where subflow priority is determined by the backup/non-backup
> flag received from the peer

Which was not the current behaviour, and doesn't make sense for
"typical" use-cases where the two peers are not controlled by the same
user. That's how I started to investigate, found out that if the backup
flag was set only from one side -- what we do in our selftests -- the
scheduler on both side were treating the path as backup as I was hoping.
Then I looked at the packet scheduler code, and was surprised to see it
was only looking at the 'backup' bit, and not 'request_bkup', which led
me to the other patches :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-net 1/6] mptcp: sched: check both directions for backup
Posted by Mat Martineau 2 months ago
On Fri, 12 Jul 2024, Matthieu Baerts wrote:

> Hi Paolo,
>
> Thank you for your review!
>
> On 12/07/2024 16:53, Paolo Abeni wrote:
>> On Thu, 2024-07-11 at 17:38 +0200, Matthieu Baerts (NGI0) wrote:
>>> The 'mptcp_subflow_context' structure has two items related to the
>>> backup flags:
>>>
>>>  - 'backup': the subflow has been marked as backup by the other peer
>>>
>>> - 'request_bkup': the backup flag has been set by the host
>>>
>>> Before this patch, the scheduler was only looking at the 'backup' flag.
>>> That can make sense in some cases, but it looks like that's not what we
>>> wanted for the general use, because either the path-manager was setting
>>> both of them when sending an MP_PRIO, or the receiver was duplicating
>>> the 'backup' flag in the subflow request.
>>
>> According to rfc 8684 section 3.3.8, setting the B (backup) bit in the
>> MPJ header """indicate to its peer that this path should be treated as
>> a backup path to use only in the event of failure of other working
>> subflows (i.e., a subflow where the receiver has indicated that B=1
>> SHOULD NOT be used to send data unless there are no usable subflows
>> where B=0)."""
>>
>> I read the above as the path manager should only look at the 'backup'
>> field?!?
>
> (I guess you wanted to talk about the packet scheduler that is being
> modified here, right?)
>
> That's two different things to me: the RFC says here that in the MPJ,
> the B flag is to tell the other peer that this subflow should be
> considered as a backup one, but it doesn't say anything about how the
> current host and its packet scheduler should consider this subflow.
>

Hi Matthieu -

I agree with your interpretation here. The RFC is talking about the 
meaning of the B bit in the headers, but that's not the only consideration 
for the packet scheduler and for our netlink API.

> Typically, one side knows that one path is more "costly", and should be
> used only if needed. Then, if this hosts told the other peer to consider
> this path as a backup one, it makes sense to do the same, no? (anyway,
> that's what we always did apparently :) )
>
> If someone wants a different behaviour for a specific use-case (backup
> path only on one direction), it can use a different packet scheduler :)
>

I suppose someone could be interested in a "half backup" use case if one 
or more links have asymmetrical data rates (like many DOCSIS connections). 
To properly support that we would want to be able to set the "backup" and 
"request_bkup" flags separately using 'ip mptcp' and our netlink API, 
but the netlink API doesn't make that distinction.

It is important to be able to change the behavior of the outgoing packet 
scheduler by the local host, so I think your fixes are the correct thing 
to do.

- Mat

> PS: note that the ip-mptcp man page is saying this about this subject:
>
>> backup: If this is a subflow endpoint, the subflows created using this
>> endpoint will have the backup flag set during the connection process.
>> This flag instructs the peer to only send data on a given subflow when
>> all non-backup subflows are unavailable. This does not affect outgoing
>> data, where subflow priority is determined by the backup/non-backup
>> flag received from the peer
>
> Which was not the current behaviour, and doesn't make sense for
> "typical" use-cases where the two peers are not controlled by the same
> user. That's how I started to investigate, found out that if the backup
> flag was set only from one side -- what we do in our selftests -- the
> scheduler on both side were treating the path as backup as I was hoping.
> Then I looked at the packet scheduler code, and was surprised to see it
> was only looking at the 'backup' bit, and not 'request_bkup', which led
> me to the other patches :)
>
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
>
>
>
Re: [PATCH mptcp-net 1/6] mptcp: sched: check both directions for backup
Posted by Matthieu Baerts 2 months ago
Hi Mat,

Thank you for the review!

On 16/07/2024 07:37, Mat Martineau wrote:
> On Fri, 12 Jul 2024, Matthieu Baerts wrote:
> 
>> Hi Paolo,
>>
>> Thank you for your review!
>>
>> On 12/07/2024 16:53, Paolo Abeni wrote:
>>> On Thu, 2024-07-11 at 17:38 +0200, Matthieu Baerts (NGI0) wrote:
>>>> The 'mptcp_subflow_context' structure has two items related to the
>>>> backup flags:
>>>>
>>>>  - 'backup': the subflow has been marked as backup by the other peer
>>>>
>>>> - 'request_bkup': the backup flag has been set by the host
>>>>
>>>> Before this patch, the scheduler was only looking at the 'backup' flag.
>>>> That can make sense in some cases, but it looks like that's not what we
>>>> wanted for the general use, because either the path-manager was setting
>>>> both of them when sending an MP_PRIO, or the receiver was duplicating
>>>> the 'backup' flag in the subflow request.
>>>
>>> According to rfc 8684 section 3.3.8, setting the B (backup) bit in the
>>> MPJ header """indicate to its peer that this path should be treated as
>>> a backup path to use only in the event of failure of other working
>>> subflows (i.e., a subflow where the receiver has indicated that B=1
>>> SHOULD NOT be used to send data unless there are no usable subflows
>>> where B=0)."""
>>>
>>> I read the above as the path manager should only look at the 'backup'
>>> field?!?
>>
>> (I guess you wanted to talk about the packet scheduler that is being
>> modified here, right?)
>>
>> That's two different things to me: the RFC says here that in the MPJ,
>> the B flag is to tell the other peer that this subflow should be
>> considered as a backup one, but it doesn't say anything about how the
>> current host and its packet scheduler should consider this subflow.
>>
> 
> Hi Matthieu -
> 
> I agree with your interpretation here. The RFC is talking about the
> meaning of the B bit in the headers, but that's not the only
> consideration for the packet scheduler and for our netlink API.
> 
>> Typically, one side knows that one path is more "costly", and should be
>> used only if needed. Then, if this hosts told the other peer to consider
>> this path as a backup one, it makes sense to do the same, no? (anyway,
>> that's what we always did apparently :) )
>>
>> If someone wants a different behaviour for a specific use-case (backup
>> path only on one direction), it can use a different packet scheduler :)
>>
> 
> I suppose someone could be interested in a "half backup" use case if one
> or more links have asymmetrical data rates (like many DOCSIS
> connections). To properly support that we would want to be able to set
> the "backup" and "request_bkup" flags separately using 'ip mptcp' and
> our netlink API, but the netlink API doesn't make that distinction.

Yes, if there is a use-case, the scheduler can be modified to support
this behaviour. This control can be also done via a new sysctl knob,
similar to other scheduler options we have, or via BPF when this feature
will be supported.

> It is important to be able to change the behavior of the outgoing packet
> scheduler by the local host, so I think your fixes are the correct thing
> to do.

Thanks!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.