Enabling CONFIG_NF_CONNTRACK_BRIDGE changes the way fragments are
processed. Without it, the bridge does not know that a packet is
fragmented and forwards it directly. With it enabled, nf_br_ip_fragment()
checks the packet and fragments it via the frag list.

Some network devices cannot pass large ping packets across a bridge when
NF_CONNTRACK_BRIDGE is enabled, although the same pings succeed when it
is disabled.

Checking the headroom in nf_br_ip_fragment() before sending is necessary,
but dropping an skb outright because its headroom is insufficient is
unreasonable.

Use skb_copy_expand() to expand the headroom of the skb instead of
dropping it.

Signed-off-by: Huajian Yang <huajianyang@asrmicro.com>
---
net/bridge/netfilter/nf_conntrack_bridge.c | 14 ++++++++++++--
net/ipv6/netfilter.c | 14 ++++++++++++--
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index 816bb0fde718..b8fb81a49377 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -62,7 +62,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 		if (first_len - hlen > mtu ||
 		    skb_headroom(skb) < ll_rs)
-			goto blackhole;
+			goto expand_headroom;
 		if (skb_cloned(skb))
 			goto slow_path;
@@ -70,7 +70,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 		skb_walk_frags(skb, frag) {
 			if (frag->len > mtu ||
 			    skb_headroom(frag) < hlen + ll_rs)
-				goto blackhole;
+				goto expand_headroom;
 			if (skb_shared(frag))
 				goto slow_path;
@@ -97,6 +97,16 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
 		return err;
 	}
+
+expand_headroom:
+	struct sk_buff *expand_skb;
+
+	expand_skb = skb_copy_expand(skb, ll_rs, skb_tailroom(skb), GFP_ATOMIC);
+	if (unlikely(!expand_skb))
+		goto blackhole;
+	kfree_skb(skb);
+	skb = expand_skb;
+
 slow_path:
 	/* This is a linearized skbuff, the original geometry is lost for us.
 	 * This may also be a clone skbuff, we could preserve the geometry for
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 581ce055bf52..619d4b97581b 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -166,7 +166,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		if (first_len - hlen > mtu ||
 		    skb_headroom(skb) < (hroom + sizeof(struct frag_hdr)))
-			goto blackhole;
+			goto expand_headroom;
 		if (skb_cloned(skb))
 			goto slow_path;
@@ -174,7 +174,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		skb_walk_frags(skb, frag2) {
 			if (frag2->len > mtu ||
 			    skb_headroom(frag2) < (hlen + hroom + sizeof(struct frag_hdr)))
-				goto blackhole;
+				goto expand_headroom;
 			/* Partially cloned skb? */
 			if (skb_shared(frag2))
@@ -208,6 +208,16 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 		kfree_skb_list(iter.frag);
 		return err;
 	}
+
+expand_headroom:
+	struct sk_buff *expand_skb;
+
+	expand_skb = skb_copy_expand(skb, ll_rs, skb_tailroom(skb), GFP_ATOMIC);
+	if (unlikely(!expand_skb))
+		goto blackhole;
+	kfree_skb(skb);
+	skb = expand_skb;
+
 slow_path:
 	/* This is a linearized skbuff, the original geometry is lost for us.
 	 * This may also be a clone skbuff, we could preserve the geometry for
--
2.48.1

Huajian Yang <huajianyang@asrmicro.com> wrote:
> The config NF_CONNTRACK_BRIDGE will change the way fragments are processed.
> Bridge does not know that it is a fragmented packet and forwards it
> directly, after NF_CONNTRACK_BRIDGE is enabled, function nf_br_ip_fragment
> will check and fraglist this packet.
>
> Some network devices that would not able to ping large packet under bridge,
> but large packet ping is successful if not enable NF_CONNTRACK_BRIDGE.

Can you add a new test to tools/testing/selftests/net/netfilter/
that demonstrates this problem?

> In function nf_br_ip_fragment, checking the headroom before sending is
> undoubted, but it is unreasonable to directly drop skb with insufficient
> headroom.

Are we talking about
  if (first_len - hlen > mtu
  or
  skb_headroom(skb) < ll_rs)

?

>
> 		if (first_len - hlen > mtu ||
> 		    skb_headroom(skb) < ll_rs)
> -			goto blackhole;
> +			goto expand_headroom;

I guess this should be

	if (first_len - hlen > mtu)
		goto blackhole;
	if (skb_headroom(skb) < ll_rs)
		goto expand_headroom;

... but I'm not sure what the actual problem is.

> +expand_headroom:
> +	struct sk_buff *expand_skb;
> +
> +	expand_skb = skb_copy_expand(skb, ll_rs, skb_tailroom(skb), GFP_ATOMIC);
> +	if (unlikely(!expand_skb))
> +		goto blackhole;

Why does this need to make a full skb copy?
Should that be using skb_expand_head()?

> slow_path:

Actually, can't you just (re)use the slowpath for the skb_headroom < ll_rs
case instead of adding headroom expansion?

Thank you for your reply!

> Some network devices that would not able to ping large packet under
> bridge, but large packet ping is successful if not enable NF_CONNTRACK_BRIDGE.

> Can you add a new test to tools/testing/selftests/net/netfilter/ that demonstrates this problem?

I may not be able to demonstrate this problem with a shell script. I
actually discovered it while debugging a Wi-Fi network device.

That netdevice sets a large needed_headroom (80), so ll_rs exceeds the
available headroom and the code jumps to blackhole.

The problem is easy to reproduce: configure needed_headroom on a
netdevice, add that netdevice to a bridge, and test bridge forwarding.
Pinging with large packets reproduces the failure (the ping succeeds when
NF_CONNTRACK_BRIDGE is not enabled).

> I guess this should be
>
> 	if (first_len - hlen > mtu)
> 		goto blackhole;
> 	if (skb_headroom(skb) < ll_rs)
> 		goto expand_headroom;

> ... but I'm not sure what the actual problem is.

Yes, your guess is correct!

The actual problem: I think it is unreasonable to drop an skb outright
because its headroom is insufficient.

> Why does this need to make a full skb copy?
> Should that be using skb_expand_head()?

Using skb_expand_head() has the same effect.

> Actually, can't you just (re)use the slowpath for the skb_headroom < ll_rs case instead of adding headroom expansion?

I tested it just now: reusing the slow path works.
But this change may not cover all cases if the netdevice really needs
this headroom.

Best Regards,
Huajian

-----Original Message-----
From: Florian Westphal [mailto:fw@strlen.de]
Sent: April 9, 2025 17:18
To: Yang Huajian(杨华健) <huajianyang@asrmicro.com>
Cc: pablo@netfilter.org; kadlec@netfilter.org; razor@blackwall.org; idosch@nvidia.com; davem@davemloft.net; dsahern@kernel.org; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; netfilter-devel@vger.kernel.org; coreteam@netfilter.org; bridge@lists.linux.dev; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] net: Expand headroom to send fragmented packets in bridge fragment forward

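For illustration, the effect of a large needed_headroom on ll_rs can be modeled in userspace. The sketch below mirrors the arithmetic of the kernel's LL_RESERVED_SPACE() macro, assuming HH_DATA_MOD is 16 and that ll_rs is computed as LL_RESERVED_SPACE(skb->dev); the helper name ll_reserved_space() and its parameters are hypothetical and not part of the patch.

```c
/* Alignment unit assumed for LL_RESERVED_SPACE() in this sketch. */
#define HH_DATA_MOD 16u

/*
 * Userspace model of LL_RESERVED_SPACE(dev): round the sum of the
 * link-layer header length and the driver's needed_headroom down to a
 * multiple of HH_DATA_MOD, then add one more HH_DATA_MOD.
 * The function and parameter names are hypothetical.
 */
static inline unsigned int ll_reserved_space(unsigned int hard_header_len,
					     unsigned int needed_headroom)
{
	unsigned int sum = hard_header_len + needed_headroom;

	return (sum & ~(HH_DATA_MOD - 1u)) + HH_DATA_MOD;
}
```

Under these assumptions, an Ethernet device (hard_header_len of 14) with needed_headroom 0 yields 16 bytes, while needed_headroom 80 yields 96 bytes, so an skb that arrives with less than 96 bytes of headroom would fail the skb_headroom(skb) < ll_rs check.
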
Yang Huajian(杨华健) <huajianyang@asrmicro.com> wrote:
> > if (skb_headroom(skb) < ll_rs)
> > goto expand_headroom;
>
> > ... but I'm not sure what the actual problem is.
>
> Yes, your guess is correct!
>
> Actual problem: I think it is unreasonable to directly drop skb with insufficient headroom.
>
> > Why does this need to make a full skb copy?
> > Should that be using skb_expand_head()?
>
> Using skb_expand_head has the same effect.
> > Actually, can't you just (re)use the slowpath for the skb_headroom < ll_rs case instead of adding headroom expansion?
>
> I tested it just now, reuse the slowpath will successed.
> But maybe this change cannot resolve all cases if the netdevice really needs this headroom.

The slowpath considers headroom requirements, see ip_frag_next():

	skb2 = alloc_skb(len + state->hlen + state->ll_rs, GFP_ATOMIC);

You should wait for more feedback and then send a v2 tomorrow.
Thanks!
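
A userspace sketch of the control flow suggested in this thread, under stated assumptions: an oversized first fragment is still dropped, while insufficient headroom falls through to the slow path, which allocates each new fragment with ll_rs bytes of extra room. It models only the check on the head skb, not the per-fragment walk. The names frag_path, choose_frag_path() and slow_path_alloc_size() are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

/* Possible outcomes of the head-skb checks in this simplified model. */
enum frag_path { FRAG_FAST_PATH, FRAG_SLOW_PATH, FRAG_BLACKHOLE };

/*
 * Hypothetical model of the split checks: only an oversized first
 * fragment is dropped; too little headroom (or a cloned skb) takes the
 * slow path instead of the blackhole.
 */
static inline enum frag_path choose_frag_path(unsigned int first_len,
					      unsigned int hlen,
					      unsigned int mtu,
					      unsigned int headroom,
					      unsigned int ll_rs,
					      bool cloned)
{
	if (first_len - hlen > mtu)
		return FRAG_BLACKHOLE;
	if (headroom < ll_rs)
		return FRAG_SLOW_PATH;
	if (cloned)
		return FRAG_SLOW_PATH;
	return FRAG_FAST_PATH;
}

/*
 * Size passed to alloc_skb() for each slow-path fragment, following the
 * expression quoted from ip_frag_next(): payload + IP header + ll_rs.
 */
static inline unsigned int slow_path_alloc_size(unsigned int len,
						unsigned int hlen,
						unsigned int ll_rs)
{
	return len + hlen + ll_rs;
}
```

In this model, an skb with 16 bytes of headroom on a device whose ll_rs is 96 takes the slow path rather than being dropped, and each 1480-byte fragment with a 20-byte header is allocated with 1596 bytes.
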