bridge:fragmented packets dropped by bridge

Vyacheslav Salnikov posted 1 patch 3 years, 6 months ago
include/net/ip.h | 2 ++
1 file changed, 2 insertions(+)
bridge:fragmented packets dropped by bridge
Posted by Vyacheslav Salnikov 3 years, 6 months ago
Hi.

I switched from kernel versions 4.9 to 5.15 and found that the MTU on
the interfaces in the bridge does not change.
For example:
I have the following bridge:
bridge      interface
br0          sw1
               sw2
               sw3

And I change with ifconfig MTU.
I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.

But if i send a ping through these interfaces, I get 1500(I added
prints for output)
I investigated the code and found the reason:
The following commit came in the new kernel:
https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb

And the behavior of the MTU setting has changed:
>
> Kernel 4.9:
> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>    ip_mtu_locked(dst) ||
>    !forwarding)  <--- True
> return dst_mtu(dst) <--- 1982
>
>
> / 'forwarding = true' case should always honour route mtu /
> mtu = dst_metric_raw(dst, RTAX_MTU);
> if (mtu)
> return mtu;



Kernel 5.15:
>
> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>    ip_mtu_locked(dst) ||
>    !forwarding) { <--- True
> mtu = rt->rt_pmtu;  <--- 0
> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
> goto out;
> }
>
> / 'forwarding = true' case should always honour route mtu /
> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
> if (mtu) <--- True
> goto out;

As I see from the code in the end takes mtu from br_dst_default_metrics
> static const u32 br_dst_default_metrics[RTAX_MAX] = {
> [RTAX_MTU - 1] = 1500,
> };

Why is rt_pmtu now used instead of dst_mtu?
Why is forwarding = False called with dst_metric_raw?
Maybe we should add processing when mtu = rt->rt_pmtu == 0?
Could this be an error?


I found a thread discussing a similar problem. It suggested porting
the next patch:
Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
---
 include/net/ip.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 29d89de..0512de3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -450,6 +450,8 @@ static inline unsigned int
ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
    const struct sk_buff *skb)
 {
+ if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
+ return min(skb->dev->mtu, IP_MAX_MTU);
  if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
  bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;


Why was this patch not accepted in the end?
-- 
Best regards,
Slava.
Re: bridge:fragmented packets dropped by bridge
Posted by Vadim Fedorenko 3 years, 5 months ago
On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
> Hi.
> 
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge      interface
> br0          sw1
>                 sw2
>                 sw3
> 
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
> 
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb
> 
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding)  <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
> 
> 
> 
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding) { <--- True
>> mtu = rt->rt_pmtu;  <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
> 
> As I see from the code in the end takes mtu from br_dst_default_metrics
>> static const u32 br_dst_default_metrics[RTAX_MAX] = {
>> [RTAX_MTU - 1] = 1500,
>> };
> 
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
> 

Can you share kernel configs for both versions? Actually only one config value
is needed - CONFIG_BRIDGE_NETFILTER.

It will help me investigate the issue.

> 
> I found a thread discussing a similar problem. It suggested porting
> the next patch:
> Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> ---
>   include/net/ip.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int
> ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>   static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>      const struct sk_buff *skb)
>   {
> + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> + return min(skb->dev->mtu, IP_MAX_MTU);
>    if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>    bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> 
> 
> Why was this patch not accepted in the end?
Re: bridge:fragmented packets dropped by bridge
Posted by Vadim Fedorenko 3 years, 5 months ago
On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
> Hi.
> 
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge      interface
> br0          sw1
>                 sw2
>                 sw3
> 
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
> 
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb
> 
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding)  <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
> 
> 
> 
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding) { <--- True
>> mtu = rt->rt_pmtu;  <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
> 
> As I see from the code in the end takes mtu from br_dst_default_metrics
>> static const u32 br_dst_default_metrics[RTAX_MAX] = {
>> [RTAX_MTU - 1] = 1500,
>> };
> 
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
> 
If you compare ipv4_mtu code from 4.9 you will see that the very first mtu value 
is filled by rt->rt_pmtu value. I believe there were changes to the bridge code 
where rt_pmtu value got empty or cleared.

I'm still looking for the root cause of the problem, will update you once I find it.


> 
> I found a thread discussing a similar problem. It suggested porting
> the next patch:
> Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> ---
>   include/net/ip.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int
> ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>   static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>      const struct sk_buff *skb)
>   {
> + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> + return min(skb->dev->mtu, IP_MAX_MTU);
>    if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>    bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> 
> 
> Why was this patch not accepted in the end?
Re: bridge:fragmented packets dropped by bridge
Posted by Slade Watkins 3 years, 6 months ago
Hey there,

On 10/7/22 at 7:21 AM, Vyacheslav Salnikov wrote:
> Why was this patch not accepted in the end?

Huh...

I had to do just a little bit of digging to find the original thread,
but it really doesn't seem to me like there was a consensus on whether
or not to take the patch:
https://lore.kernel.org/lkml/20190730122534.30687-1-rdong.ge@gmail.com/T/#u

Reason I say that is that the thread is rather old, and died off quickly.

Someone involved with that patch may be able to offer the answers you're
looking for, if they haven't forgotten about it after all this time.

-srw