[v1] net: Expand headroom to send fragmented packets in bridge fragment forward

[PATCH] net: Expand headroom to send fragmented packets in bridge fragment forward

Posted by Huajian Yang 10 months ago

The NF_CONNTRACK_BRIDGE config option changes the way fragments are
processed. Without it, the bridge does not know that a packet is
fragmented and forwards it directly. Once NF_CONNTRACK_BRIDGE is enabled,
nf_br_ip_fragment() checks the packet and handles its frag list.

Some network devices cannot ping with large packets across a bridge,
although the same ping succeeds when NF_CONNTRACK_BRIDGE is not enabled.

Checking the headroom in nf_br_ip_fragment() before sending is clearly
necessary, but dropping an skb outright because its headroom is
insufficient is unreasonable.

Use skb_copy_expand() to expand the headroom of the skb instead of
dropping it.

Signed-off-by: Huajian Yang <huajianyang@asrmicro.com>
---
 net/bridge/netfilter/nf_conntrack_bridge.c | 14 ++++++++++++--
 net/ipv6/netfilter.c                       | 14 ++++++++++++--
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 816bb0fde718..b8fb81a49377 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -62,7 +62,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 
 		if (first_len - hlen > mtu ||
 		    skb_headroom(skb) < ll_rs)
-			goto blackhole;
+			goto expand_headroom;
 
 		if (skb_cloned(skb))
 			goto slow_path;
@@ -70,7 +70,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 		skb_walk_frags(skb, frag) {
 			if (frag->len > mtu ||
 			    skb_headroom(frag) < hlen + ll_rs)
-				goto blackhole;
+				goto expand_headroom;
 
 			if (skb_shared(frag))
 				goto slow_path;
@@ -97,6 +97,16 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 
 		return err;
 	}
+
+expand_headroom:
+	struct sk_buff *expand_skb;
+
+	expand_skb = skb_copy_expand(skb, ll_rs, skb_tailroom(skb), GFP_ATOMIC);
+	if (unlikely(!expand_skb))
+		goto blackhole;
+	kfree_skb(skb);
+	skb = expand_skb;
+
 slow_path:
 	/* This is a linearized skbuff, the original geometry is lost for us.
 	 * This may also be a clone skbuff, we could preserve the geometry for
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 581ce055bf52..619d4b97581b 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -166,7 +166,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 
 		if (first_len - hlen > mtu ||
 		    skb_headroom(skb) < (hroom + sizeof(struct frag_hdr)))
-			goto blackhole;
+			goto expand_headroom;
 
 		if (skb_cloned(skb))
 			goto slow_path;
@@ -174,7 +174,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		skb_walk_frags(skb, frag2) {
 			if (frag2->len > mtu ||
 			    skb_headroom(frag2) < (hlen + hroom + sizeof(struct frag_hdr)))
-				goto blackhole;
+				goto expand_headroom;
 
 			/* Partially cloned skb? */
 			if (skb_shared(frag2))
@@ -208,6 +208,16 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		kfree_skb_list(iter.frag);
 		return err;
 	}
+
+expand_headroom:
+	struct sk_buff *expand_skb;
+
+	expand_skb = skb_copy_expand(skb, ll_rs, skb_tailroom(skb), GFP_ATOMIC);
+	if (unlikely(!expand_skb))
+		goto blackhole;
+	kfree_skb(skb);
+	skb = expand_skb;
+
 slow_path:
 	/* This is a linearized skbuff, the original geometry is lost for us.
 	 * This may also be a clone skbuff, we could preserve the geometry for
-- 
2.48.1

Re: [PATCH] net: Expand headroom to send fragmented packets in bridge fragment forward

Posted by Florian Westphal 10 months ago

Huajian Yang <huajianyang@asrmicro.com> wrote:
> The NF_CONNTRACK_BRIDGE config option changes the way fragments are
> processed. Without it, the bridge does not know that a packet is
> fragmented and forwards it directly. Once NF_CONNTRACK_BRIDGE is enabled,
> nf_br_ip_fragment() checks the packet and handles its frag list.
> 
> Some network devices cannot ping with large packets across a bridge,
> although the same ping succeeds when NF_CONNTRACK_BRIDGE is not enabled.

Can you add a new test to tools/testing/selftests/net/netfilter/ that
demonstrates this problem?

> Checking the headroom in nf_br_ip_fragment() before sending is clearly
> necessary, but dropping an skb outright because its headroom is
> insufficient is unreasonable.

Are we talking about
if (first_len - hlen > mtu
  or
skb_headroom(skb) < ll_rs)

?

>  
>  		if (first_len - hlen > mtu ||
>  		    skb_headroom(skb) < ll_rs)
> -			goto blackhole;
> +			goto expand_headroom;

I guess this should be

if (first_len - hlen > mtu)
	goto blackhole;
if (skb_headroom(skb) < ll_rs)
	goto expand_headroom;

... but I'm not sure what the actual problem is.

> +expand_headroom:
> +	struct sk_buff *expand_skb;
> +
> +	expand_skb = skb_copy_expand(skb, ll_rs, skb_tailroom(skb), GFP_ATOMIC);
> +	if (unlikely(!expand_skb))
> +		goto blackhole;

Why does this need to make a full skb copy?
Should that be using skb_expand_head()?

>  slow_path:

Actually, can't you just (re)use the slowpath for the skb_headroom < ll_rs
case instead of adding headroom expansion?
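
The split check suggested above, combined with reusing the slow path, can be sketched as a small userspace model. This is only an illustration of the control flow for the head skb: `struct model_skb`, `enum frag_path`, and `model_pick_path()` are hypothetical names that do not exist in the kernel, and the per-fragment walk is left out.

```c
#include <assert.h>

/* Possible outcomes of the pre-fragmentation check (hypothetical names). */
enum frag_path { FRAG_FAST_PATH, FRAG_SLOW_PATH, FRAG_BLACKHOLE };

/* Hypothetical, simplified view of the head skb for this sketch. */
struct model_skb {
	unsigned int first_len;	/* length of the head fragment, header included */
	unsigned int hlen;	/* IP header length */
	unsigned int headroom;	/* bytes available in front of the data */
	int cloned;		/* non-zero if the data is shared with a clone */
};

static enum frag_path model_pick_path(const struct model_skb *skb,
				      unsigned int mtu, unsigned int ll_rs)
{
	assert(skb && skb->first_len >= skb->hlen);

	/* An oversized fragment cannot be repaired by reallocating: drop it. */
	if (skb->first_len - skb->hlen > mtu)
		return FRAG_BLACKHOLE;

	/* Too little headroom, or shared data: rebuild in the slow path. */
	if (skb->headroom < ll_rs || skb->cloned)
		return FRAG_SLOW_PATH;

	return FRAG_FAST_PATH;
}
```

With an MTU of 1480 and ll_rs of 96, a head skb with only 16 bytes of headroom would be routed to the slow path in this model rather than dropped.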

Re: [PATCH] net: Expand headroom to send fragmented packets in bridge fragment forward

Posted by Yang Huajian（杨华健） 10 months ago

Thank you for your reply!

> > Some network devices cannot ping with large packets across a bridge,
> > although the same ping succeeds when NF_CONNTRACK_BRIDGE is not enabled.

> Can you add a new test to tools/testing/selftests/net/netfilter/ that demonstrates this problem?

I probably can't demonstrate this problem with a shell script.
I actually discovered it while debugging a wifi network device.
That netdevice sets a large needed_headroom (80), so ll_rs exceeds the skb headroom and the code jumps to blackhole.

It is easy to reproduce by configuring needed_headroom on a netdevice,
adding that netdevice to a bridge, and testing bridge forwarding.

Pinging with large packets reproduces the failure (the ping succeeds if NF_CONNTRACK_BRIDGE is not enabled).
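
To see why a needed_headroom of 80 matters, here is a minimal userspace model of how ll_rs is derived. It assumes the kernel's LL_RESERVED_SPACE() rounding with HH_DATA_MOD equal to 16, and an Ethernet-style hard_header_len of 14 for the wifi netdevice (an assumption, the thread does not state it). `struct model_dev` and the `model_*` helpers are hypothetical names.

```c
#include <assert.h>

/* Assumed to match the kernel's HH_DATA_MOD alignment constant. */
#define MODEL_HH_DATA_MOD 16u

/* Hypothetical stand-in for the two net_device fields that matter here. */
struct model_dev {
	unsigned int hard_header_len;	/* link-layer header size, 14 for Ethernet */
	unsigned int needed_headroom;	/* extra headroom requested by the driver */
};

/* Models LL_RESERVED_SPACE(): round down to a multiple of 16, then add 16. */
static unsigned int model_ll_reserved_space(const struct model_dev *dev)
{
	unsigned int want;

	assert(dev);
	want = dev->hard_header_len + dev->needed_headroom;
	return (want & ~(MODEL_HH_DATA_MOD - 1)) + MODEL_HH_DATA_MOD;
}

/* Mirrors the headroom half of the fast-path check in nf_br_ip_fragment(). */
static int model_headroom_too_small(unsigned int skb_headroom,
				    const struct model_dev *dev)
{
	return skb_headroom < model_ll_reserved_space(dev);
}
```

Under these assumptions a plain Ethernet device needs 16 bytes of reserved space, while the device with needed_headroom set to 80 needs 96, so an skb that arrived with, say, 64 bytes of headroom passes the check for the former and fails it for the latter.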

> I guess this should be
> 
> if (first_len - hlen > mtu)
>	goto blackhole;
> if (skb_headroom(skb) < ll_rs)
>	goto expand_headroom;

> ... but I'm not sure what the actual problem is.

Yes, your guess is correct!

The actual problem: I think it is unreasonable to drop an skb outright just because its headroom is insufficient.

> Why does this need to make a full skb copy?
> Should that be using skb_expand_head()?

Using skb_expand_head has the same effect.

> Actually, can't you just (re)use the slowpath for the skb_headroom < ll_rs case instead of adding headroom expansion?

I tested it just now; reusing the slowpath works.
But this change may not cover all cases if the netdevice really needs this headroom.

Best Regards,
Huajian

-----Original Message-----
From: Florian Westphal [mailto:fw@strlen.de] 
Sent: April 9, 2025 17:18
To: Yang Huajian（杨华健） <huajianyang@asrmicro.com>
Cc: pablo@netfilter.org; kadlec@netfilter.org; razor@blackwall.org; idosch@nvidia.com; davem@davemloft.net; dsahern@kernel.org; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; netfilter-devel@vger.kernel.org; coreteam@netfilter.org; bridge@lists.linux.dev; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] net: Expand headroom to send fragmented packets in bridge fragment forward

Re: Re: [PATCH] net: Expand headroom to send fragmented packets in bridge fragment forward

Posted by Florian Westphal 10 months ago

Yang Huajian（杨华健） <huajianyang@asrmicro.com> wrote:
> > if (skb_headroom(skb) < ll_rs)
> >	goto expand_headroom;
> 
> > ... but I'm not sure what the actual problem is.
> 
> Yes, your guess is correct!
> 
> The actual problem: I think it is unreasonable to drop an skb outright just because its headroom is insufficient.
> 
> > Why does this need to make a full skb copy?
> > Should that be using skb_expand_head()?
> 
> Using skb_expand_head has the same effect.
 
> > Actually, can't you just (re)use the slowpath for the skb_headroom < ll_rs case instead of adding headroom expansion?
> 
> I tested it just now; reusing the slowpath works.
> But this change may not cover all cases if the netdevice really needs this headroom.

The slowpath considers headroom requirements, see ip_frag_next():

        skb2 = alloc_skb(len + state->hlen + state->ll_rs, GFP_ATOMIC);

You should wait for more feedback and then send a v2 tomorrow.

Thanks!
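
The point about the slow path honoring headroom requirements can be illustrated with a userspace model of the allocation quoted from ip_frag_next(). The buffer is sized as len + hlen + ll_rs, and the first ll_rs bytes are then set aside as headroom, which in the kernel is done, as far as I understand it, by a skb_reserve() call following the alloc_skb(). `struct model_buf` and the `model_*` helpers are hypothetical and use malloc() in place of alloc_skb().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of an sk_buff relevant to headroom. */
struct model_buf {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of packet data, after the headroom */
	size_t size;		/* total bytes allocated */
};

/* Allocate room for payload, IP header and link-layer reserve. */
static int model_frag_alloc(struct model_buf *b, size_t len, size_t hlen,
			    size_t ll_rs)
{
	assert(b);
	b->size = len + hlen + ll_rs;
	b->head = malloc(b->size);
	if (!b->head)
		return -1;
	b->data = b->head + ll_rs;	/* analogous to reserving ll_rs bytes */
	return 0;
}

/* Bytes available in front of the packet data. */
static size_t model_headroom(const struct model_buf *b)
{
	return (size_t)(b->data - b->head);
}

static void model_frag_free(struct model_buf *b)
{
	free(b->head);
	b->head = NULL;
	b->data = NULL;
}
```

Because every fragment built this way starts with ll_rs bytes of headroom, a device with a large needed_headroom is served by the slow path without a separate expansion step.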