[PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable

Paolo Abeni posted 1 patch 3 years, 4 months ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/26445210b10b18b39129c4ede9d7fde0e37fe21f.1651253087.git.pabeni@redhat.com
Maintainers: Jamal Hadi Salim <jhs@mojatatu.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Cong Wang <xiyou.wangcong@gmail.com>, Jiri Pirko <jiri@resnulli.us>, Paolo Abeni <pabeni@redhat.com>
There is a newer version of this series
[PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable
Posted by Paolo Abeni 3 years, 4 months ago
Currently pedit tries to ensure that the accessed skb offset
is writable via skb_unclone(). The action potentially allows
touching any skb bytes, so it may end up modifying shared data.

The above causes some sporadic MPTCP self-test failures.

Address the issue by keeping track of a rough over-estimate of the
highest skb offset accessed by the action, and ensure that such an
offset is really writable.

Note that this may cause performance regressions in some scenarios,
but hopefully pedit is not in the critical path.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - fix build issue
 - account for the skb hdr offset, too

This almost solves issues/265 here. I'm still getting some rare
failures with MPTcpExtMPFailTx==0: sometimes the transfer completes
before we are able to use the 2nd/failing link. The relevant fix
is a purely self-test one.

Note that a much simpler alternative would be simply replacing
skb_unclone() with skb_ensure_writable(skb, skb->len), but that
really could cause more visible regressions.
---
 include/net/tc_act/tc_pedit.h |  1 +
 net/sched/act_pedit.c         | 23 +++++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h
index 748cf87a4d7e..3e02709a1df6 100644
--- a/include/net/tc_act/tc_pedit.h
+++ b/include/net/tc_act/tc_pedit.h
@@ -14,6 +14,7 @@ struct tcf_pedit {
 	struct tc_action	common;
 	unsigned char		tcfp_nkeys;
 	unsigned char		tcfp_flags;
+	u32			tcfp_off_max_hint;
 	struct tc_pedit_key	*tcfp_keys;
 	struct tcf_pedit_key_ex	*tcfp_keys_ex;
 };
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index e01ef7f109f4..301ad7f19da9 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -149,7 +149,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 	struct nlattr *pattr;
 	struct tcf_pedit *p;
 	int ret = 0, err;
-	int ksize;
+	int i, ksize;
 	u32 index;
 
 	if (!nla) {
@@ -228,6 +228,20 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 		p->tcfp_nkeys = parm->nkeys;
 	}
 	memcpy(p->tcfp_keys, parm->keys, ksize);
+	p->tcfp_off_max_hint = 0;
+	for (i = 0; i < p->tcfp_nkeys; ++i) {
+		u32 cur = p->tcfp_keys[i].off;
+
+		/* The AT option can read a single byte, we can bound the actual
+		 * value with uchar max. Each key touches 4 bytes starting from
+		 * the computed offset
+		 */
+		if (p->tcfp_keys[i].offmask) {
+			cur += 255 >> p->tcfp_keys[i].shift;
+			cur = max(p->tcfp_keys[i].at, cur);
+		}
+		p->tcfp_off_max_hint = max(p->tcfp_off_max_hint, cur + 4);
+	}
 
 	p->tcfp_flags = parm->flags;
 	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
@@ -308,9 +322,14 @@ static int tcf_pedit_act(struct sk_buff *skb, const struct tc_action *a,
 			 struct tcf_result *res)
 {
 	struct tcf_pedit *p = to_pedit(a);
+	u32 max_offset;
 	int i;
 
-	if (skb_unclone(skb, GFP_ATOMIC))
+	max_offset = (skb_transport_header_was_set(skb) ?
+		      skb_transport_offset(skb) :
+		      skb_network_offset(skb)) +
+		     p->tcfp_off_max_hint;
+	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
 		return p->tcf_action;
 
 	spin_lock(&p->tcf_lock);
-- 
2.35.1
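
The hint computation added to tcf_pedit_init() above can be tried out in
user space. What follows is only a minimal sketch, not the kernel code:
struct pedit_key_model and pedit_off_max_hint() are hypothetical names
introduced for illustration, the structure carries just the four key
fields the loop reads, and the model additionally assumes shift < 32
(the patch itself does not check this).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct tc_pedit_key: only the
 * fields read by the hint computation are kept.
 */
struct pedit_key_model {
	uint32_t off;      /* static offset of the edited 32-bit word */
	uint32_t at;       /* offset of the byte holding the dynamic part */
	uint32_t offmask;  /* non-zero when the offset is computed at run time */
	uint32_t shift;    /* right shift applied to the byte read at 'at' */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Over-estimate of the highest offset (relative to the header the keys
 * refer to) that any key may touch, mirroring the loop in the patch.
 */
uint32_t pedit_off_max_hint(const struct pedit_key_model *keys, size_t nkeys)
{
	uint32_t hint = 0;
	size_t i;

	for (i = 0; i < nkeys; i++) {
		uint32_t cur = keys[i].off;

		if (keys[i].offmask) {
			/* Assumption of this model only: keep the shift defined. */
			assert(keys[i].shift < 32);
			/* A single byte is read, so 255 bounds the dynamic part. */
			cur += 255u >> keys[i].shift;
			/* The byte at 'at' is accessed as well. */
			cur = max_u32(keys[i].at, cur);
		}
		/* Each key touches 4 bytes starting from the computed offset. */
		hint = max_u32(hint, cur + 4);
	}
	return hint;
}
```

With a single static key at offset 12 the hint is 16; a key with a
non-zero offmask and shift 0 adds the full 255-byte bound, which is why
the commit message calls the value a rough over-estimate.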


Re: [PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable
Posted by Matthieu Baerts 3 years, 4 months ago
Hi Paolo, Mat, Geliang,

On 29/04/2022 19:29, Paolo Abeni wrote:
> Currently pedit tries to ensure that the accessed skb offset
> is writable via skb_unclone(). The action potentially allows
> touching any skb bytes, so it may end up modifying shared data.
> 
> The above causes some sporadic MPTCP self-test failures.
> 
> Address the issue by keeping track of a rough over-estimate of the
> highest skb offset accessed by the action, and ensure that such an
> offset is really writable.
> 
> Note that this may cause performance regressions in some scenarios,
> but hopefully pedit is not in the critical path.

Thank you for the patch, review and tests!

Now in our tree (fixes for -net) with Mat's ACK and Geliang's Test tags:

New patches for t/upstream:
- b841c3f765af: net/sched: act_pedit: really ensure the skb is writable
- Results: 2767792d035c..8ae619c5e009 (export)

New patches for t/upstream-net:
- b841c3f765af: net/sched: act_pedit: really ensure the skb is writable
- Results: eefc441cb5ab..8b01b4ca3343 (export-net)

Builds and tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20220502T202900
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20220502T202900
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export-net

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: [PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable
Posted by Mat Martineau 3 years, 4 months ago
On Fri, 29 Apr 2022, Paolo Abeni wrote:

> Currently pedit tries to ensure that the accessed skb offset
> is writable via skb_unclone(). The action potentially allows
> touching any skb bytes, so it may end up modifying shared data.
>
> The above causes some sporadic MPTCP self-test failures.
>
> Address the issue by keeping track of a rough over-estimate of the
> highest skb offset accessed by the action, and ensure that such an
> offset is really writable.
>
> Note that this may cause performance regressions in some scenarios,
> but hopefully pedit is not in the critical path.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
> - fix build issue
> - account for the skb hdr offset, too
>
> This almost solves issues/265 here. I'm still getting some rare
> failures with MPTcpExtMPFailTx==0: sometimes the transfer completes
> before we are able to use the 2nd/failing link. The relevant fix
> is a purely self-test one.
>
> Note that a much simpler alternative would be simply replacing
> skb_unclone() with skb_ensure_writable(skb, skb->len), but that
> really could cause more visible regressions.

To make sure I'm understanding correctly: skb_ensure_writable(skb, 
skb->len) would copy the entire packet payload on every edited 
packet, but this patch will only copy the part that might be modified (and 
maybe a little extra). Seems like the full copy is worth avoiding, and 
that users shouldn't be depending on pedit modifying shared data.
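
The difference between the two options can be sketched with a small
user-space model of the length passed to skb_ensure_writable() in
tcf_pedit_act(). This is an illustration under assumptions, not the
kernel code: pedit_write_len() is a hypothetical helper, and the header
offsets are passed in as plain non-negative integers instead of being
read from a struct sk_buff.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the bound computed in tcf_pedit_act(): start from the transport
 * header offset when it was set, otherwise from the network header offset,
 * add the precomputed hint and never exceed the packet length.
 */
uint32_t pedit_write_len(uint32_t skb_len, int transport_header_was_set,
			 uint32_t transport_off, uint32_t network_off,
			 uint32_t off_max_hint)
{
	uint32_t base = transport_header_was_set ? transport_off : network_off;
	uint32_t max_offset = base + off_max_hint;

	/* min(skb->len, max_offset) in the patch */
	return skb_len < max_offset ? skb_len : max_offset;
}
```

For a 1500-byte packet with the transport header at offset 34 and a hint
of 16, the model asks for 50 writable bytes, whereas the simpler
alternative would always ask for all 1500.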

I did run the associated test for a while (with the other patches for 
#265) and the changes look good from a MPTCP perspective:

Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

Do you plan to upstream this one yourself, or should I include it with the 
other mptcp-net patches?


Thanks,
Mat

> ---
> include/net/tc_act/tc_pedit.h |  1 +
> net/sched/act_pedit.c         | 23 +++++++++++++++++++++--
> 2 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h
> index 748cf87a4d7e..3e02709a1df6 100644
> --- a/include/net/tc_act/tc_pedit.h
> +++ b/include/net/tc_act/tc_pedit.h
> @@ -14,6 +14,7 @@ struct tcf_pedit {
> 	struct tc_action	common;
> 	unsigned char		tcfp_nkeys;
> 	unsigned char		tcfp_flags;
> +	u32			tcfp_off_max_hint;
> 	struct tc_pedit_key	*tcfp_keys;
> 	struct tcf_pedit_key_ex	*tcfp_keys_ex;
> };
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index e01ef7f109f4..301ad7f19da9 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -149,7 +149,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
> 	struct nlattr *pattr;
> 	struct tcf_pedit *p;
> 	int ret = 0, err;
> -	int ksize;
> +	int i, ksize;
> 	u32 index;
>
> 	if (!nla) {
> @@ -228,6 +228,20 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
> 		p->tcfp_nkeys = parm->nkeys;
> 	}
> 	memcpy(p->tcfp_keys, parm->keys, ksize);
> +	p->tcfp_off_max_hint = 0;
> +	for (i = 0; i < p->tcfp_nkeys; ++i) {
> +		u32 cur = p->tcfp_keys[i].off;
> +
> +		/* The AT option can read a single byte, we can bound the actual
> +		 * value with uchar max. Each key touches 4 bytes starting from
> +		 * the computed offset
> +		 */
> +		if (p->tcfp_keys[i].offmask) {
> +			cur += 255 >> p->tcfp_keys[i].shift;
> +			cur = max(p->tcfp_keys[i].at, cur);
> +		}
> +		p->tcfp_off_max_hint = max(p->tcfp_off_max_hint, cur + 4);
> +	}
>
> 	p->tcfp_flags = parm->flags;
> 	goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
> @@ -308,9 +322,14 @@ static int tcf_pedit_act(struct sk_buff *skb, const struct tc_action *a,
> 			 struct tcf_result *res)
> {
> 	struct tcf_pedit *p = to_pedit(a);
> +	u32 max_offset;
> 	int i;
>
> -	if (skb_unclone(skb, GFP_ATOMIC))
> +	max_offset = (skb_transport_header_was_set(skb) ?
> +		      skb_transport_offset(skb) :
> +		      skb_network_offset(skb)) +
> +		     p->tcfp_off_max_hint;
> +	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> 		return p->tcf_action;
>
> 	spin_lock(&p->tcf_lock);
> -- 
> 2.35.1
>
>
>

--
Mat Martineau
Intel

Re: [PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable
Posted by Paolo Abeni 3 years, 4 months ago
On Fri, 2022-04-29 at 13:56 -0700, Mat Martineau wrote:
> On Fri, 29 Apr 2022, Paolo Abeni wrote:
> 
> > Currently pedit tries to ensure that the accessed skb offset
> > is writable via skb_unclone(). The action potentially allows
> > touching any skb bytes, so it may end up modifying shared data.
> > 
> > The above causes some sporadic MPTCP self-test failures.
> > 
> > Address the issue by keeping track of a rough over-estimate of the
> > highest skb offset accessed by the action, and ensure that such an
> > offset is really writable.
> > 
> > Note that this may cause performance regressions in some scenarios,
> > but hopefully pedit is not in the critical path.
> > 
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v1 -> v2:
> > - fix build issue
> > - account for the skb hdr offset, too
> > 
> > This almost solves issues/265 here. I'm still getting some rare
> > failures with MPTcpExtMPFailTx==0: sometimes the transfer completes
> > before we are able to use the 2nd/failing link. The relevant fix
> > is a purely self-test one.
> > 
> > Note that a much simpler alternative would be simply replacing
> > skb_unclone() with skb_ensure_writable(skb, skb->len), but that
> > really could cause more visible regressions.
> 
> To make sure I'm understanding correctly: skb_ensure_writable(skb, 
> skb->len) would copy the entire packet payload on every edited 
> packet, but this patch will only copy the part that might be modified (and 
> maybe a little extra). 
> 
Yes, exactly. All of the above applies only when the relevant packet is
cloned.

> Seems like the full copy is worth avoiding, and 
> that users shouldn't be depending on pedit modifying shared data.
> 
> I did run the associated test for a while (with the other patches for 
> #265) and the changes look good from a MPTCP perspective:
> 
> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> 
> Do you plan to upstream this one yourself, or should I include it with the 
> other mptcp-net patches?

I think it can (and should, to avoid blocking mptcp patches with
net/sched discussion and vice-versa) go on its own. I can send it.

Thanks!

Paolo


Re: [PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable
Posted by Geliang Tang 3 years, 4 months ago
Hi Paolo,

Thank you so much for this patch. It fixed the issue that bothered me
for a long time. With this change, it seems the bad data in the
MP_FAIL multiple subflows test case has been dropped correctly now:

> sudo ./mptcp_join.sh -F
Created /tmp/tmp.lqIytszwoM (size 1 KB) containing data sent by client
Created /tmp/tmp.zChji3JlEJ (size 1 KB) containing data sent by server
Created /tmp/tmp.iTuFZCucnH (size 128 KB) containing data sent by client
Created /tmp/tmp.YTN6JELVhq (size 128 KB) containing data sent by server
file received by server has inverted byte at 169
001 Infinite map: 5 corrupted pkts       syn[ ok ] - synack[ ok ] - ack[ ok ]
                                         sum[ ok ] - csum  [ ok ]
                                         ftx[ ok ] - failrx[ ok ]
                                         rtx[ ok ] - rstrx [ ok ]
                                         itx[ ok ] - infirx[ ok ]
                                         ftx[ ok ] - failrx[ ok ] invert
Created /tmp/tmp.iTuFZCucnH (size 1024 KB) containing data sent by client
Created /tmp/tmp.YTN6JELVhq (size 1024 KB) containing data sent by server
002 MP_FAIL MP_RST: 1 corrupted pkts     syn[ ok ] - synack[ ok ] - ack[ ok ]
                                         sum[ ok ] - csum  [ ok ]
                                         ftx[ ok ] - failrx[ ok ]
                                         rtx[ ok ] - rstrx [ ok ]
                                         itx[ ok ] - infirx[ ok ]

No inverted byte is received in the MP_FAIL MP_RST test.

Do I understand this correctly?

Anyway, please add my tested-by tag for this patch:

Tested-by: Geliang Tang <geliang.tang@suse.com>

Thanks,
-Geliang

On Mon, 2 May 2022 at 15:55, Paolo Abeni <pabeni@redhat.com> wrote:

>
> On Fri, 2022-04-29 at 13:56 -0700, Mat Martineau wrote:
> > On Fri, 29 Apr 2022, Paolo Abeni wrote:
> >
> > > Currently pedit tries to ensure that the accessed skb offset
> > > is writable via skb_unclone(). The action potentially allows
> > > touching any skb bytes, so it may end up modifying shared data.
> > >
> > > The above causes some sporadic MPTCP self-test failures.
> > >
> > > Address the issue by keeping track of a rough over-estimate of the
> > > highest skb offset accessed by the action, and ensure that such an
> > > offset is really writable.
> > >
> > > Note that this may cause performance regressions in some scenarios,
> > > but hopefully pedit is not in the critical path.
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > v1 -> v2:
> > > - fix build issue
> > > - account for the skb hdr offset, too
> > >
> > > This almost solves issues/265 here. I'm still getting some rare
> > > failures with MPTcpExtMPFailTx==0: sometimes the transfer completes
> > > before we are able to use the 2nd/failing link. The relevant fix
> > > is a purely self-test one.
> > >
> > > Note that a much simpler alternative would be simply replacing
> > > skb_unclone() with skb_ensure_writable(skb, skb->len), but that
> > > really could cause more visible regressions.
> >
> > To make sure I'm understanding correctly: skb_ensure_writable(skb,
> > skb->len) would copy the entire packet payload on every edited
> > packet, but this patch will only copy the part that might be modified (and
> > maybe a little extra).
> >
> Yes, exactly. All of the above applies only when the relevant packet is
> cloned.
>
> > Seems like the full copy is worth avoiding, and
> > that users shouldn't be depending on pedit modifying shared data.
> >
> > I did run the associated test for a while (with the other patches for
> > #265) and the changes look good from a MPTCP perspective:
> >
> > Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> >
> > Do you plan to upstream this one yourself, or should I include it with the
> > other mptcp-net patches?
>
> I think it can (and should, to avoid blocking mptcp patches with
> net/sched discussion and vice-versa) go on its own. I can send it.
>
> Thanks!
>
> Paolo
>
>

Re: [PATCH mptcp-net v2] net/sched: act_pedit: really ensure the skb is writable
Posted by Paolo Abeni 3 years, 4 months ago
On Mon, 2022-05-02 at 23:52 +0800, Geliang Tang wrote:
> Hi Paolo,
> 
> Thank you so much for this patch. It fixed the issue that bothered me
> for a long time. With this change, it seems the bad data in the
> MP_FAIL multiple subflows test case has been dropped correctly now:
> 
> > sudo ./mptcp_join.sh -F
> Created /tmp/tmp.lqIytszwoM (size 1 KB) containing data sent by client
> Created /tmp/tmp.zChji3JlEJ (size 1 KB) containing data sent by server
> Created /tmp/tmp.iTuFZCucnH (size 128 KB) containing data sent by client
> Created /tmp/tmp.YTN6JELVhq (size 128 KB) containing data sent by server
> file received by server has inverted byte at 169
> 001 Infinite map: 5 corrupted pkts       syn[ ok ] - synack[ ok ] - ack[ ok ]
>                                          sum[ ok ] - csum  [ ok ]
>                                          ftx[ ok ] - failrx[ ok ]
>                                          rtx[ ok ] - rstrx [ ok ]
>                                          itx[ ok ] - infirx[ ok ]
>                                          ftx[ ok ] - failrx[ ok ] invert
> Created /tmp/tmp.iTuFZCucnH (size 1024 KB) containing data sent by client
> Created /tmp/tmp.YTN6JELVhq (size 1024 KB) containing data sent by server
> 002 MP_FAIL MP_RST: 1 corrupted pkts     syn[ ok ] - synack[ ok ] - ack[ ok ]
>                                          sum[ ok ] - csum  [ ok ]
>                                          ftx[ ok ] - failrx[ ok ]
>                                          rtx[ ok ] - rstrx [ ok ]
>                                          itx[ ok ] - infirx[ ok ]
> 
> No inverted byte is received in the MP_FAIL MP_RST test.
> 
> Do I understand this correctly?

The issue was/is really a TC/act_pedit one. That action modifies the
packet data even when the packet is shared (cloned). Any retransmission
(or reinjection) of such a packet is thus corrupted.

The test failed due to a reinjection of the corrupted packet on the
supposedly-non-corrupting link.

Not sure if the above clarifies the scenario ;)
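
The scenario described above can also be modelled in user space. The
sketch below is a deliberately simplified illustration and does not
reproduce sk_buff internals: struct buf_model and the buf_*() helpers
are hypothetical names, and a single reference-counted byte array stands
in for the packet data shared between an skb and its clone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: two handles may point at the same data, as a packet
 * and its clone do; 'users' counts the handles sharing 'data'.
 */
struct buf_model {
	unsigned char *data;
	size_t len;
	int *users;
};

int buf_init(struct buf_model *b, const unsigned char *src, size_t len)
{
	assert(len > 0);
	b->data = malloc(len);
	b->users = malloc(sizeof(*b->users));
	if (!b->data || !b->users) {
		free(b->data);
		free(b->users);
		return -1;
	}
	memcpy(b->data, src, len);
	b->len = len;
	*b->users = 1;
	return 0;
}

/* Cloning shares the data instead of copying it. */
void buf_clone(struct buf_model *dst, const struct buf_model *src)
{
	*dst = *src;
	(*dst->users)++;
}

/* Give 'b' a private copy when the data is shared, so that a later edit
 * cannot be observed through the other handle.
 */
int buf_make_writable(struct buf_model *b)
{
	unsigned char *copy;
	int *users;

	if (*b->users == 1)
		return 0;	/* already private, nothing to copy */

	copy = malloc(b->len);
	users = malloc(sizeof(*users));
	if (!copy || !users) {
		free(copy);
		free(users);
		return -1;
	}
	memcpy(copy, b->data, b->len);
	(*b->users)--;		/* drop the reference to the shared data */
	*users = 1;
	b->data = copy;
	b->users = users;
	return 0;
}

void buf_put(struct buf_model *b)
{
	if (--(*b->users) == 0) {
		free(b->data);
		free(b->users);
	}
	b->data = NULL;
	b->users = NULL;
}
```

In this model, editing a byte through the clone without calling
buf_make_writable() first also changes what the original handle sees,
which corresponds to a later retransmission or reinjection carrying the
corrupted byte; after buf_make_writable() the edit stays private.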

> Anyway, please add my tested-by tag for this patch:
> 
> Tested-by: Geliang Tang <geliang.tang@suse.com>
> 
Thanks!

Paolo