[PATCH mptcp-next] mptcp: record subflows in RPS table

Christoph Paasch via B4 Relay posted 1 patch 1 month, 1 week ago
From: Christoph Paasch <cpaasch@openai.com>

Accelerated Receive Flow Steering (aRFS) relies on sockets recording
their RX flow hash into the rps_sock_flow_table so that incoming packets
are steered to the CPU where the application runs.

With MPTCP, the application interacts with the parent MPTCP socket while
data is carried over per-subflow TCP sockets. Without recording these
subflows, aRFS cannot steer interrupts and RX processing for the flows
to the desired CPU.

Record all subflows in the RPS table by calling sock_rps_record_flow()
for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().

It does not by itself improve throughput, but ensures that IRQ and RX
processing are directed to the right CPU, which is a
prerequisite for effective aRFS.

Signed-off-by: Christoph Paasch <cpaasch@openai.com>
---
 net/mptcp/protocol.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -12,6 +12,7 @@
 #include <linux/sched/signal.h>
 #include <linux/atomic.h>
 #include <net/aligned_data.h>
+#include <net/rps.h>
 #include <net/sock.h>
 #include <net/inet_common.h>
 #include <net/inet_hashtables.h>
@@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
 	return limit - not_sent;
 }
 
+static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		sock_rps_record_flow(ssk);
+	}
+}
+
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -1753,6 +1765,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	lock_sock(sk);
 
+	mptcp_rps_record_subflows(msk);
+
 	if (unlikely(inet_test_bit(DEFER_CONNECT, sk) ||
 		     msg->msg_flags & MSG_FASTOPEN)) {
 		int copied_syn = 0;
@@ -2131,6 +2145,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		goto out_err;
 	}
 
+	mptcp_rps_record_subflows(msk);
+
 	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 
 	len = min_t(size_t, len, INT_MAX);

---
base-commit: f756e5bb29fa59378c111a2401fc64e44cd53442
change-id: 20250731-b4-mptcp_perf-02912dd4f65b

Best regards,
-- 
Christoph Paasch <cpaasch@openai.com>
Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Posted by Matthieu Baerts 1 month ago
Hi Christoph,

On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote:
> From: Christoph Paasch <cpaasch@openai.com>
> 
> Accelerated Receive Flow Steering (aRFS) relies on sockets recording
> their RX flow hash into the rps_sock_flow_table so that incoming packets
> are steered to the CPU where the application runs.
> 
> With MPTCP, the application interacts with the parent MPTCP socket while
> data is carried over per-subflow TCP sockets. Without recording these
> subflows, aRFS cannot steer interrupts and RX processing for the flows
> to the desired CPU.
> 
> Record all subflows in the RPS table by calling sock_rps_record_flow()
> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().
> 
> It does not by itself improve throughput, but ensures that IRQ and RX
> processing are directed to the right CPU, which is a
> prerequisite for effective aRFS.

Thank you for looking at that!

> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
> ---
>  net/mptcp/protocol.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -12,6 +12,7 @@
>  #include <linux/sched/signal.h>
>  #include <linux/atomic.h>
>  #include <net/aligned_data.h>
> +#include <net/rps.h>
>  #include <net/sock.h>
>  #include <net/inet_common.h>
>  #include <net/inet_hashtables.h>
> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
>  	return limit - not_sent;
>  }
>  
> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
> +{
> +	struct mptcp_subflow_context *subflow;
> +
> +	mptcp_for_each_subflow(msk, subflow) {
> +		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> +		sock_rps_record_flow(ssk);

Should we eventually open code this RPS function and directly call
sock_rps_record_flow_hash()? If RPS is not needed, no need to go through
the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed
before the for_each, then check the state for each subflow, no?

Or maybe such optimisation is not worth it?

While at it, I see that sock_rps_record_flow() is called in
__inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk.
I guess that's OK, right?
Also, is sock_rps_record_flow() called when a subflow is "accept()ed",
or is there something else to do there for this subflow to copy TCP's
behaviour?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Posted by Christoph Paasch 1 month ago
On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Christoph,
>
> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote:
> > From: Christoph Paasch <cpaasch@openai.com>
> >
> > Accelerated Receive Flow Steering (aRFS) relies on sockets recording
> > their RX flow hash into the rps_sock_flow_table so that incoming packets
> > are steered to the CPU where the application runs.
> >
> > With MPTCP, the application interacts with the parent MPTCP socket while
> > data is carried over per-subflow TCP sockets. Without recording these
> > subflows, aRFS cannot steer interrupts and RX processing for the flows
> > to the desired CPU.
> >
> > Record all subflows in the RPS table by calling sock_rps_record_flow()
> > for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().
> >
> > It does not by itself improve throughput, but ensures that IRQ and RX
> > processing are directed to the right CPU, which is a
> > prerequisite for effective aRFS.
>
> Thank you for looking at that!
>
> > Signed-off-by: Christoph Paasch <cpaasch@openai.com>
> > ---
> >  net/mptcp/protocol.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/sched/signal.h>
> >  #include <linux/atomic.h>
> >  #include <net/aligned_data.h>
> > +#include <net/rps.h>
> >  #include <net/sock.h>
> >  #include <net/inet_common.h>
> >  #include <net/inet_hashtables.h>
> > @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
> >       return limit - not_sent;
> >  }
> >
> > +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
> > +{
> > +     struct mptcp_subflow_context *subflow;
> > +
> > +     mptcp_for_each_subflow(msk, subflow) {
> > +             struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> > +
> > +             sock_rps_record_flow(ssk);
>
> Should we eventually open code this RPS function and directly call
> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through
> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed
> before the for_each, then check the state for each subflow, no?
>
> Or maybe such optimisation is not worth it?

IMO it's not worth it. Because it would also mean that we pull in more
of non-MPTCP things over to MPTCP files.

> While at it, I see that sock_rps_record_flow() is called in
> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk.
> I guess that's OK, right?

AFAICS, that should be alright. sk_rxhash should not be set on an msk.
Even on the passive opener side, we clone msk from the ssk, but at
that point sk_rxhash is not set yet. (I may be wrong of course, I'm
still not super familiar with the source-code...)

> Also, is sock_rps_record_flow() called when a subflow is "accept()ed",
> or is there something else to do there for this subflow to copy TCP's
> behaviour?

For new subflows, I don't think we can easily do that. Because
sock_rps_record_flow() records the CPU on which it is running. From
what I see, when new subflows get accepted, that is not happening in
the application's context necessarily. So, we have to wait for a
recvmsg/sendmsg.


Christoph

>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>
Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Posted by Matthieu Baerts 1 month ago
Hi Christoph,

On 14/08/2025 05:31, Christoph Paasch wrote:
> On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Christoph,
>>
>> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote:
>>> From: Christoph Paasch <cpaasch@openai.com>
>>>
>>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording
>>> their RX flow hash into the rps_sock_flow_table so that incoming packets
>>> are steered to the CPU where the application runs.
>>>
>>> With MPTCP, the application interacts with the parent MPTCP socket while
>>> data is carried over per-subflow TCP sockets. Without recording these
>>> subflows, aRFS cannot steer interrupts and RX processing for the flows
>>> to the desired CPU.
>>>
>>> Record all subflows in the RPS table by calling sock_rps_record_flow()
>>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().
>>>
>>> It does not by itself improve throughput, but ensures that IRQ and RX
>>> processing are directed to the right CPU, which is a
>>> prerequisite for effective aRFS.
>>
>> Thank you for looking at that!
>>
>>> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
>>> ---
>>>  net/mptcp/protocol.c | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -12,6 +12,7 @@
>>>  #include <linux/sched/signal.h>
>>>  #include <linux/atomic.h>
>>>  #include <net/aligned_data.h>
>>> +#include <net/rps.h>
>>>  #include <net/sock.h>
>>>  #include <net/inet_common.h>
>>>  #include <net/inet_hashtables.h>
>>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
>>>       return limit - not_sent;
>>>  }
>>>
>>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
>>> +{
>>> +     struct mptcp_subflow_context *subflow;
>>> +
>>> +     mptcp_for_each_subflow(msk, subflow) {
>>> +             struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>>> +
>>> +             sock_rps_record_flow(ssk);
>>
>> Should we eventually open code this RPS function and directly call
>> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through
>> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed
>> before the for_each, then check the state for each subflow, no?
>>
>> Or maybe such optimisation is not worth it?
> 
> IMO it's not worth it. Because it would also mean that we pull in more
> of non-MPTCP things over to MPTCP files.

Good point for the maintenance!

But then, what about adding a small helper in include/net/rps.h? Here we
could have this at the top of mptcp_rps_record_subflows():

  if (!sock_rps_record_needed())   /* with likely()? */
      return;

I'm suggesting this because rfs is disabled by default, so it sounds
better to check that before iterating over different subflows for
nothing, no?

>> While at it, I see that sock_rps_record_flow() is called in
>> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk.
>> I guess that's OK, right?
> 
> AFAICS, that should be alright. sk_rxhash should not be set on an msk.
> Even on the passive opener side, we clone msk from the ssk, but at
> that point sk_rxhash is not set yet. (I may be wrong of course, I'm
> still not super familiar with the source-code...)

Thank you for having looked! I guess you would have seen issues when
testing if there was a problem with that, so no need to investigate more.

>> Also, is sock_rps_record_flow() called when a subflow is "accept()ed",
>> or is there something else to do there for this subflow to copy TCP's
>> behaviour?
> 
> For new subflows, I don't think we can easily do that. Because
> sock_rps_record_flow() records the CPU on which it is running. From
> what I see, when new subflows get accepted, that is not happening in
> the application's context necessarily. So, we have to wait for a
> recvmsg/sendmsg.

Ah yes, of course, we should not do it there for additional subflows!
And for the first subflow, I think that it will be done because when an
accept() is triggered from the userspace, mptcp_stream_accept(msk, arg)
will call inet_csk_accept(ssk, arg) that will end up doing that I think.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Posted by Matthieu Baerts 1 month ago
Hi Christoph,

On 14/08/2025 13:20, Matthieu Baerts wrote:
> Hi Christoph,
> 
> On 14/08/2025 05:31, Christoph Paasch wrote:
>> On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>>>
>>> Hi Christoph,
>>>
>>> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote:
>>>> From: Christoph Paasch <cpaasch@openai.com>
>>>>
>>>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording
>>>> their RX flow hash into the rps_sock_flow_table so that incoming packets
>>>> are steered to the CPU where the application runs.
>>>>
>>>> With MPTCP, the application interacts with the parent MPTCP socket while
>>>> data is carried over per-subflow TCP sockets. Without recording these
>>>> subflows, aRFS cannot steer interrupts and RX processing for the flows
>>>> to the desired CPU.
>>>>
>>>> Record all subflows in the RPS table by calling sock_rps_record_flow()
>>>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().
>>>>
>>>> It does not by itself improve throughput, but ensures that IRQ and RX
>>>> processing are directed to the right CPU, which is a
>>>> prerequisite for effective aRFS.
>>>
>>> Thank you for looking at that!
>>>
>>>> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
>>>> ---
>>>>  net/mptcp/protocol.c | 16 ++++++++++++++++
>>>>  1 file changed, 16 insertions(+)
>>>>
>>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
>>>> --- a/net/mptcp/protocol.c
>>>> +++ b/net/mptcp/protocol.c
>>>> @@ -12,6 +12,7 @@
>>>>  #include <linux/sched/signal.h>
>>>>  #include <linux/atomic.h>
>>>>  #include <net/aligned_data.h>
>>>> +#include <net/rps.h>
>>>>  #include <net/sock.h>
>>>>  #include <net/inet_common.h>
>>>>  #include <net/inet_hashtables.h>
>>>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
>>>>       return limit - not_sent;
>>>>  }
>>>>
>>>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
>>>> +{
>>>> +     struct mptcp_subflow_context *subflow;
>>>> +
>>>> +     mptcp_for_each_subflow(msk, subflow) {
>>>> +             struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>>>> +
>>>> +             sock_rps_record_flow(ssk);
>>>
>>> Should we eventually open code this RPS function and directly call
>>> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through
>>> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed
>>> before the for_each, then check the state for each subflow, no?
>>>
>>> Or maybe such optimisation is not worth it?
>>
>> IMO it's not worth it. Because it would also mean that we pull in more
>> of non-MPTCP things over to MPTCP files.
> 
> Good point for the maintenance!
> 
> But then, what about adding a small helper in include/net/rps.h? Here we
> could have this at the top of mptcp_rps_record_subflows():
> 
>   if (!sock_rps_record_needed())   /* with likely()? */
>       return;
> 
> I'm suggesting this because rfs is disabled by default, so it sounds
> better to check that before iterating over different subflows for
> nothing, no?
> 
>>> While at it, I see that sock_rps_record_flow() is called in
>>> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk.
>>> I guess that's OK, right?
>>
>> AFAICS, that should be alright. sk_rxhash should not be set on an msk.
>> Even on the passive opener side, we clone msk from the ssk, but at
>> that point sk_rxhash is not set yet. (I may be wrong of course, I'm
>> still not super familiar with the source-code...)
> 
> Thank you for having looked! I guess you would have seen issues when
> testing if there was a problem with that, so no need to investigate more.
> 
>>> Also, is sock_rps_record_flow() called when a subflow is "accept()ed",
>>> or is there something else to do there for this subflow to copy TCP's
>>> behaviour?
>>
>> For new subflows, I don't think we can easily do that. Because
>> sock_rps_record_flow() records the CPU on which it is running. From
>> what I see, when new subflows get accepted, that is not happening in
>> the application's context necessarily. So, we have to wait for a
>> recvmsg/sendmsg.
> 
> Ah yes, of course, we should not do it there for additional subflows!
> And for the first subflow, I think that it will be done because when an
> accept() is triggered from the userspace, mptcp_stream_accept(msk, arg)
> will call inet_csk_accept(ssk, arg) that will end up doing that I think.

My bad, I think __inet_accept will not be called from inet_csk_accept().
For TCP, I think the path is:

  inet_accept()
    --> inet_csk_accept()
    --> __inet_accept()

But with MPTCP, mptcp_stream_accept is called instead of inet_accept(),
and I don't think __inet_accept() will be called on a subflow. (I didn't
validate that.) Then, maybe we should call sock_rps_record_flow() on the
newsk in mptcp_stream_accept, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Posted by Christoph Paasch 3 weeks, 1 day ago
On Thu, Aug 14, 2025 at 4:47 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Christoph,
>
> On 14/08/2025 13:20, Matthieu Baerts wrote:
> > Hi Christoph,
> >
> > On 14/08/2025 05:31, Christoph Paasch wrote:
> >> On Wed, Aug 13, 2025 at 7:54 AM Matthieu Baerts <matttbe@kernel.org> wrote:
> >>>
> >>> Hi Christoph,
> >>>
> >>> On 07/08/2025 20:14, Christoph Paasch via B4 Relay wrote:
> >>>> From: Christoph Paasch <cpaasch@openai.com>
> >>>>
> >>>> Accelerated Receive Flow Steering (aRFS) relies on sockets recording
> >>>> their RX flow hash into the rps_sock_flow_table so that incoming packets
> >>>> are steered to the CPU where the application runs.
> >>>>
> >>>> With MPTCP, the application interacts with the parent MPTCP socket while
> >>>> data is carried over per-subflow TCP sockets. Without recording these
> >>>> subflows, aRFS cannot steer interrupts and RX processing for the flows
> >>>> to the desired CPU.
> >>>>
> >>>> Record all subflows in the RPS table by calling sock_rps_record_flow()
> >>>> for each subflow at the start of mptcp_sendmsg() and mptcp_recvmsg().
> >>>>
> >>>> It does not by itself improve throughput, but ensures that IRQ and RX
> >>>> processing are directed to the right CPU, which is a
> >>>> prerequisite for effective aRFS.
> >>>
> >>> Thank you for looking at that!
> >>>
> >>>> Signed-off-by: Christoph Paasch <cpaasch@openai.com>
> >>>> ---
> >>>>  net/mptcp/protocol.c | 16 ++++++++++++++++
> >>>>  1 file changed, 16 insertions(+)
> >>>>
> >>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> >>>> index 4b510e04724f5c8be08a00a1cc03093bcd031905..86f1fc5be47bd5f6dc8bf40a488de32f253c1beb 100644
> >>>> --- a/net/mptcp/protocol.c
> >>>> +++ b/net/mptcp/protocol.c
> >>>> @@ -12,6 +12,7 @@
> >>>>  #include <linux/sched/signal.h>
> >>>>  #include <linux/atomic.h>
> >>>>  #include <net/aligned_data.h>
> >>>> +#include <net/rps.h>
> >>>>  #include <net/sock.h>
> >>>>  #include <net/inet_common.h>
> >>>>  #include <net/inet_hashtables.h>
> >>>> @@ -1740,6 +1741,17 @@ static u32 mptcp_send_limit(const struct sock *sk)
> >>>>       return limit - not_sent;
> >>>>  }
> >>>>
> >>>> +static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
> >>>> +{
> >>>> +     struct mptcp_subflow_context *subflow;
> >>>> +
> >>>> +     mptcp_for_each_subflow(msk, subflow) {
> >>>> +             struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> >>>> +
> >>>> +             sock_rps_record_flow(ssk);
> >>>
> >>> Should we eventually open code this RPS function and directly call
> >>> sock_rps_record_flow_hash()? If RPS is not needed, no need to go through
> >>> the subflow list. So we could add: #ifdef CONFIG_RPS + &rfs_needed
> >>> before the for_each, then check the state for each subflow, no?
> >>>
> >>> Or maybe such optimisation is not worth it?
> >>
> >> IMO it's not worth it. Because it would also mean that we pull in more
> >> of non-MPTCP things over to MPTCP files.
> >
> > Good point for the maintenance!
> >
> > But then, what about adding a small helper in include/net/rps.h? Here we
> > could have this at the top of mptcp_rps_record_subflows():
> >
> >   if (!sock_rps_record_needed())   /* with likely()? */
> >       return;
> >
> > I'm suggesting this because rfs is disabled by default, so it sounds
> > better to check that before iterating over different subflows for
> > nothing, no?

I see! Yes, that may be a good addition. I am doing that.

> >>> While at it, I see that sock_rps_record_flow() is called in
> >>> __inet_accept(), inet(6)_sendmsg and inet(6)_recvmsg, but with the msk.
> >>> I guess that's OK, right?
> >>
> >> AFAICS, that should be alright. sk_rxhash should not be set on an msk.
> >> Even on the passive opener side, we clone msk from the ssk, but at
> >> that point sk_rxhash is not set yet. (I may be wrong of course, I'm
> >> still not super familiar with the source-code...)
> >
> > Thank you for having looked! I guess you would have seen issues when
> > testing if there was a problem with that, so no need to investigate more.
> >
> >>> Also, is sock_rps_record_flow() called when a subflow is "accept()ed",
> >>> or is there something else to do there for this subflow to copy TCP's
> >>> behaviour?
> >>
> >> For new subflows, I don't think we can easily do that. Because
> >> sock_rps_record_flow() records the CPU on which it is running. From
> >> what I see, when new subflows get accepted, that is not happening in
> >> the application's context necessarily. So, we have to wait for a
> >> recvmsg/sendmsg.
> >
> > Ah yes, of course, we should not do it there for additional subflows!
> > And for the first subflow, I think that it will be done because when an
> > accept() is triggered from the userspace, mptcp_stream_accept(msk, arg)
> > will call inet_csk_accept(ssk, arg) that will end up doing that I think.
>
> My bad, I think __inet_accept will not be called from inet_csk_accept().
> For TCP, I think the path is:
>
>   inet_accept()
>     --> inet_csk_accept()
>     --> __inet_accept()
>
> But with MPTCP, mptcp_stream_accept is called instead of inet_accept(),
> and I don't think __inet_accept() will be called on a subflow. (I didn't
> validate that.) Then, maybe we should call sock_rps_record_flow() on the
> newsk in mptcp_stream_accept, no?

Indeed! I will iterate over the subflows there as well and take care of them.


Christoph
Re: [PATCH mptcp-next] mptcp: record subflows in RPS table
Posted by MPTCP CI 1 month, 1 week ago
Hi Christoph,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16812537653

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8452b3707158
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=989208


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)