[PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs

Thomas Bogendoerfer posted 1 patch 11 months ago
Posted by Thomas Bogendoerfer 11 months ago
gro_cells_receive() passes a cloned skb directly up the stack, which
could re-order it against segments still held in GRO. To avoid this,
queue cloned skbs as well and use gro_normal_one() to pass them up
during normal NAPI work.

Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
--
v2: don't use skb_copy(), but make decision how to pass cloned skbs in
    napi poll function (suggested by Eric)
v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
  
 net/core/gro_cells.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index ff8e5b64bf6b..762746d18486 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -2,6 +2,7 @@
 #include <linux/skbuff.h>
 #include <linux/slab.h>
 #include <linux/netdevice.h>
+#include <net/gro.h>
 #include <net/gro_cells.h>
 #include <net/hotdata.h>
 
@@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto drop;
 
-	if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
+	if (!gcells->cells || netif_elide_gro(dev)) {
 		res = netif_rx(skb);
 		goto unlock;
 	}
@@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
 		skb = __skb_dequeue(&cell->napi_skbs);
 		if (!skb)
 			break;
-		napi_gro_receive(napi, skb);
+		/* Core GRO stack does not play well with clones. */
+		if (skb_cloned(skb))
+			gro_normal_one(napi, skb, 1);
+		else
+			napi_gro_receive(napi, skb);
 		work_done++;
 	}
 
-- 
2.35.3
Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Paolo Abeni 11 months ago
On 1/21/25 12:50 PM, Thomas Bogendoerfer wrote:
> gro_cells_receive() passes a cloned skb directly up the stack, which
> could re-order it against segments still held in GRO. To avoid this,
> queue cloned skbs as well and use gro_normal_one() to pass them up
> during normal NAPI work.
> 
> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> --
> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
>     napi poll function (suggested by Eric)
> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
>   
>  net/core/gro_cells.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
> index ff8e5b64bf6b..762746d18486 100644
> --- a/net/core/gro_cells.c
> +++ b/net/core/gro_cells.c
> @@ -2,6 +2,7 @@
>  #include <linux/skbuff.h>
>  #include <linux/slab.h>
>  #include <linux/netdevice.h>
> +#include <net/gro.h>
>  #include <net/gro_cells.h>
>  #include <net/hotdata.h>
>  
> @@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
>  	if (unlikely(!(dev->flags & IFF_UP)))
>  		goto drop;
>  
> -	if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
> +	if (!gcells->cells || netif_elide_gro(dev)) {
>  		res = netif_rx(skb);
>  		goto unlock;
>  	}
> @@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
>  		skb = __skb_dequeue(&cell->napi_skbs);
>  		if (!skb)
>  			break;
> -		napi_gro_receive(napi, skb);
> +		/* Core GRO stack does not play well with clones. */
> +		if (skb_cloned(skb))
> +			gro_normal_one(napi, skb, 1);
> +		else
> +			napi_gro_receive(napi, skb);

I must admit it's not clear to me how/why the above will avoid
out-of-order (OoO) delivery. I assume OoO happens when we observe both
cloned and uncloned packets belonging to the same connection/flow.

What if we have an (uncloned) packet for the relevant flow in GRO,
'rx_count - 1' packets already sitting in 'rx_list', and a cloned packet
for the critical flow reaches gro_cells_receive()?

Don't we need to unconditionally flush any packets belonging to the same
flow?

Thanks!

Paolo
Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Eric Dumazet 11 months ago
On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> I must admit it's not clear to me how/why the above will avoid OoO. I
> assume OoO happens when we observe both cloned and uncloned packets
> belonging to the same connection/flow.
>
> What if we have a (uncloned) packet for the relevant flow in the GRO,
> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
> for the critical flow reaches gro_cells_receive()?
>
> Don't we need to unconditionally flush any packets belonging to the same
> flow?

It would only matter if we had 2 or more segments belonging to the
same flow and packet train (a potential 'GRO super packet'), with the
'cloned' status differing between segments.

In practice, the cloned status will be the same for all segments.

The same issue would happen when/if NETIF_F_GRO in dev->features is
flipped back and forth: we do not really care.
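
A toy userspace model may make the argument above concrete. It is only a
sketch and not kernel code: the toy_* names are hypothetical, "held"
stands in for segments parked in GRO, and "out" for the order in which
segments reach the stack. It assumes a single flow and a single flush at
the end of the poll.

```c
/* Toy userspace model of the ordering argument; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGS 16

struct toy_napi {
	int held[MAX_SEGS];	/* segments parked in the "GRO" stage */
	size_t nheld;
	int out[MAX_SEGS];	/* order in which segments reach the stack */
	size_t nout;
};

/* Uncloned segments are parked; cloned ones skip the parking stage. */
static void toy_receive(struct toy_napi *n, int seq, bool cloned)
{
	assert(n->nout + n->nheld < MAX_SEGS);
	if (cloned)
		n->out[n->nout++] = seq;
	else
		n->held[n->nheld++] = seq;
}

/* End of poll: everything still parked is handed to the stack. */
static void toy_flush(struct toy_napi *n)
{
	for (size_t i = 0; i < n->nheld; i++)
		n->out[n->nout++] = n->held[i];
	n->nheld = 0;
}

/* Feed segments 0..count-1 of one flow; report whether they come out in order. */
bool toy_in_order(const bool *cloned, size_t count)
{
	struct toy_napi n = { .nheld = 0, .nout = 0 };

	for (size_t i = 0; i < count; i++)
		toy_receive(&n, (int)i, cloned[i]);
	toy_flush(&n);

	for (size_t i = 0; i < n.nout; i++)
		if (n.out[i] != (int)i)
			return false;
	return true;
}
```

With four segments, a uniform cloned status (all false or all true)
comes out in order, while the mixed pattern {false, false, true, false}
comes out as 2, 0, 1, 3. Only the mixed case re-orders in this model.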
Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Paolo Abeni 11 months ago
On 1/23/25 11:07 AM, Eric Dumazet wrote:
> On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> I must admit it's not clear to me how/why the above will avoid OoO. I
>> assume OoO happens when we observe both cloned and uncloned packets
>> belonging to the same connection/flow.
>>
>> What if we have a (uncloned) packet for the relevant flow in the GRO,
>> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
>> for the critical flow reaches gro_cells_receive()?
>>
>> Don't we need to unconditionally flush any packets belonging to the same
>> flow?
> 
> It would only matter if we had 2 or more segments that would belong
> to the same flow and packet train (potential 'GRO super packet'), with
> the 'cloned'
> status being of mixed value on various segments.
> 
> In practice, the cloned status will be the same for all segments.

I agree with the above, but my doubt is: does it also mean that, in
practice, there is no OoO to deal with even without this patch?

To rephrase my doubt: which scenario addressed by this patch would
lead to OoO without it?

Thanks,

Paolo

Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Eric Dumazet 11 months ago
On Thu, Jan 23, 2025 at 11:42 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 1/23/25 11:07 AM, Eric Dumazet wrote:
> > On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >> I must admit it's not clear to me how/why the above will avoid OoO. I
> >> assume OoO happens when we observe both cloned and uncloned packets
> >> belonging to the same connection/flow.
> >>
> >> What if we have a (uncloned) packet for the relevant flow in the GRO,
> >> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
> >> for the critical flow reaches gro_cells_receive()?
> >>
> >> Don't we need to unconditionally flush any packets belonging to the same
> >> flow?
> >
> > It would only matter if we had 2 or more segments that would belong
> > to the same flow and packet train (potential 'GRO super packet'), with
> > the 'cloned'
> > status being of mixed value on various segments.
> >
> > In practice, the cloned status will be the same for all segments.
>
> I agree with the above, but my doubt is: does the above also mean that
> in practice there are no OoO to deal with, even without this patch?
>
> To rephrase my doubt: which scenario is addressed by this patch that
> would lead to OoO without it?

Fair point, a detailed changelog would be really nice.
Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Thomas Bogendoerfer 10 months, 3 weeks ago
On Thu, 23 Jan 2025 11:43:05 +0100
Eric Dumazet <edumazet@google.com> wrote:

> On Thu, Jan 23, 2025 at 11:42 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On 1/23/25 11:07 AM, Eric Dumazet wrote:  
> > > On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:  
> > >> I must admit it's not clear to me how/why the above will avoid OoO. I
> > >> assume OoO happens when we observe both cloned and uncloned packets
> > >> belonging to the same connection/flow.
> > >>
> > >> What if we have a (uncloned) packet for the relevant flow in the GRO,
> > >> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
> > >> for the critical flow reaches gro_cells_receive()?
> > >>
> > >> Don't we need to unconditionally flush any packets belonging to the same
> > >> flow?  
> > >
> > > It would only matter if we had 2 or more segments that would belong
> > > to the same flow and packet train (potential 'GRO super packet'), with
> > > the 'cloned'
> > > status being of mixed value on various segments.
> > >
> > > In practice, the cloned status will be the same for all segments.  
> >
> > I agree with the above, but my doubt is: does the above also mean that
> > in practice there are no OoO to deal with, even without this patch?
> >
> > To rephrase my doubt: which scenario is addressed by this patch that
> > would lead to OoO without it?  
> 
> Fair point, a detailed changelog would be really nice.

My test scenario is simple:

TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP receiver

Sender does continuous writes in 15k chunks, receiver reads data from socket in a loop.

And that is what I see:

   40   0.002766      1000::1 → 2000::1      TCP 15088 51238 → 5060 [PSH, ACK] Seq=2862576060 Ack=1152583678 Win=65536 Len=15000 TSval=3343493494 TSecr=629171944
   41   0.002844      1000::1 → 2000::1      TCP 9816 51238 → 5060 [PSH, ACK] Seq=2862591060 Ack=1152583678 Win=65536 Len=9728 TSval=3343493494 TSecr=629171944
   42   0.004122      1000::1 → 2000::1      TCP 1468 [TCP Previous segment not captured] 51238 → 5060 [ACK] Seq=2862642188 Ack=1152583678 Win=65536 Len=1380 TSval=3343493496 TSecr=629171946
   43   0.004128      1000::1 → 2000::1      TCP 20788 [TCP Out-Of-Order] 51238 → 5060 [PSH, ACK] Seq=2862600788 Ack=1152583678 Win=65536 Len=20700 TSval=3343493496 TSecr=629171946
   44   0.004133      1000::1 → 2000::1      TCP 20788 [TCP Out-Of-Order] 51238 → 5060 [PSH, ACK] Seq=2862621488 Ack=1152583678 Win=65536 Len=20700 TSval=3343493496 TSecr=629171946
   45   0.004169      1000::1 → 2000::1      TCP 500 [TCP Previous segment not captured] 51238 → 5060 [PSH, ACK] Seq=2862665648 Ack=1152583678 Win=65536 Len=412 TSval=3343493496 TSecr=629171946
   46   0.004180      1000::1 → 2000::1      TCP 22168 [TCP Out-Of-Order] 51238 → 5060 [PSH, ACK] Seq=2862643568 Ack=1152583678 Win=65536 Len=22080 TSval=3343493496 TSecr=629171946
   47   0.004187      1000::1 → 2000::1      TCP 13888 51238 → 5060 [PSH, ACK] Seq=2862666060 Ack=1152583678 Win=65536 Len=13800 TSval=3343493496 TSecr=629171946
   48   0.004201      1000::1 → 2000::1      TCP 1288 51238 → 5060 [PSH, ACK] Seq=2862679860 Ack=1152583678 Win=65536 Len=1200 TSval=3343493496 TSecr=629171946
   49   0.004273      1000::1 → 2000::1      TCP 13888 51238 → 5060 [PSH, ACK] Seq=2862681060 Ack=1152583678 Win=65536 Len=13800 TSval=3343493496 TSecr=629171946
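
The ordering in the capture above can be checked mechanically from the
Seq and Len columns. The following is a minimal sketch, not part of any
patch: the type and function names are hypothetical, and the table just
copies frames 40-49. A frame is "ahead" if it starts beyond the highest
byte seen so far and "behind" if it starts before it.

```c
/* Classify frames of one TCP stream by sequence number; userspace sketch. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame {
	unsigned int no;	/* frame number in the capture */
	uint32_t seq;		/* Seq= column */
	uint32_t len;		/* Len= column */
};

/* Frames 40-49 of the capture (Seq and Len columns). */
const struct frame capture[] = {
	{ 40, 2862576060u, 15000 }, { 41, 2862591060u,  9728 },
	{ 42, 2862642188u,  1380 }, { 43, 2862600788u, 20700 },
	{ 44, 2862621488u, 20700 }, { 45, 2862665648u,   412 },
	{ 46, 2862643568u, 22080 }, { 47, 2862666060u, 13800 },
	{ 48, 2862679860u,  1200 }, { 49, 2862681060u, 13800 },
};

struct tally {
	unsigned int in_order, ahead, behind;
};

struct tally tally_capture(const struct frame *f, size_t n)
{
	struct tally t = { 0, 0, 0 };
	uint32_t next;

	assert(n > 0);
	next = f[0].seq;	/* the first frame defines the starting point */

	for (size_t i = 0; i < n; i++) {
		uint32_t end = f[i].seq + f[i].len;
		/* Signed difference keeps the comparison safe across wraparound. */
		int32_t delta = (int32_t)(f[i].seq - next);

		if (delta > 0)
			t.ahead++;	/* jumped over data not yet seen */
		else if (delta < 0)
			t.behind++;	/* fills a hole left by an earlier jump */
		else
			t.in_order++;

		if ((int32_t)(end - next) > 0)
			next = end;	/* track the highest byte seen so far */
	}
	return t;
}
```

For these ten frames the tally is 5 in order, 2 ahead (frames 42 and 45)
and 3 behind (frames 43, 44 and 46). The two frames that jump ahead carry
1380 and 412 bytes, while the three late ones carry 20700 or 22080 bytes,
i.e. 15 or 16 times 1380. That would fit small segments overtaking
aggregated ones, assuming 1380 is the segment size on this path.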

IMHO these OoO segments are retransmits for segments still waiting in GRO.
With the v2 patch applied, the trace looks like this:

 2856   9.526256      1000::1 → 2000::1      TCP 64948 50452 → 5060 [PSH, ACK] Seq=1871837193 Ack=209151777 Win=65536 Len=64860 TSval=2755210164 TSecr=2795235137
 2857   9.526258      1000::1 → 2000::1      TCP 5480 50452 → 5060 [PSH, ACK] Seq=1871902053 Ack=209151777 Win=65536 Len=5392 TSval=2755210164 TSecr=2795235137
 2858   9.535262      1000::1 → 2000::1      TCP 1340 [TCP Retransmission] 50452 → 5060 [ACK] Seq=1871906193 Ack=209151777 Win=65536 Len=1252 TSval=2755210174 TSecr=2795235137
 2859   9.585477      1000::1 → 2000::1      TCP 64948 50452 → 5060 [PSH, ACK] Seq=1871907445 Ack=209151777 Win=65536 Len=64860 TSval=2755210224 TSecr=2795235197
 2860   9.585486      1000::1 → 2000::1      TCP 64948 50452 → 5060 [PSH, ACK] Seq=1871972305 Ack=209151777 Win=65536 Len=64860 TSval=2755210224 TSecr=2795235197

Looks OK to me, but without a GRO flush there is still a chance of OoO packets.
I've worked on a new patch (below, as an RFC) which pushes the skb_cloned()
check into GRO. The result is comparable to the v2 patch:

  604   1.987863      1000::1 → 2000::1      TCP 64948 57278 → 5060 [PSH, ACK] Seq=1220895319 Ack=1484877190 Win=65536 Len=64860 TSval=646104760 TSecr=459787214
  605   1.987866      1000::1 → 2000::1      TCP 16488 57278 → 5060 [PSH, ACK] Seq=1220960179 Ack=1484877190 Win=65536 Len=16400 TSval=646104760 TSecr=459787214
  606   1.998231      1000::1 → 2000::1      TCP 1308 [TCP Retransmission] 57278 → 5060 [ACK] Seq=1220975359 Ack=1484877190 Win=65536 Len=1220 TSval=646104771 TSecr=459787214
  607   2.049288      1000::1 → 2000::1      TCP 64948 57278 → 5060 [PSH, ACK] Seq=1220976579 Ack=1484877190 Win=65536 Len=64860 TSval=646104822 TSecr=459787276
  608   2.049304      1000::1 → 2000::1      TCP 64948 57278 → 5060 [PSH, ACK] Seq=1221041439 Ack=1484877190 Win=65536 Len=64860 TSval=646104822 TSecr=459787276



diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index ff8e5b64bf6b..06e6889138ba 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -20,7 +20,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto drop;
 
-	if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
+	if (!gcells->cells || netif_elide_gro(dev)) {
 		res = netif_rx(skb);
 		goto unlock;
 	}
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2308665b51c5..66a2bb849e85 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -322,6 +322,12 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	if (!p)
 		goto out_check_final;
 
+	if (unlikely(skb_cloned(skb))) {
+		NAPI_GRO_CB(skb)->flush |= 1;
+		NAPI_GRO_CB(skb)->same_flow = 0;
+		return p;
+	}
+
 	th2 = tcp_hdr(p);
 	flush = (__force int)(flags & TCP_FLAG_CWR);
 	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index a5be6e4ed326..a9c85b0556ce 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -647,6 +647,11 @@ struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 	struct sock *sk = NULL;
 	struct sk_buff *pp;
 
+	if (unlikely(skb_cloned(skb))) {
+		NAPI_GRO_CB(skb)->same_flow = 0;
+		goto flush;
+	}
+
 	if (unlikely(!uh))
 		goto flush;
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index b41152dd4246..b754747e3e8a 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -134,6 +134,11 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 	struct sock *sk = NULL;
 	struct sk_buff *pp;
 
+	if (unlikely(skb_cloned(skb))) {
+		NAPI_GRO_CB(skb)->same_flow = 0;
+		goto flush;
+	}
+
 	if (unlikely(!uh))
 		goto flush;
 

What do you think about this approach?

Thomas.

-- 
SUSE Software Solutions Germany GmbH
HRB 36809 (AG Nürnberg)
Geschäftsführer: Ivo Totev, Andrew McDonald, Werner Knoblich
Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Thomas Bogendoerfer 10 months, 3 weeks ago
On Wed, 29 Jan 2025 12:31:29 +0100
Thomas Bogendoerfer <tbogendoerfer@suse.de> wrote:

> My test scenario is simple:
> 
> TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP receiver

Sorry, I messed that up. It looks like this:

<-        Namespace A           ->    <-        Namespace B             ->
TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver

Thomas.

Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Eric Dumazet 10 months, 3 weeks ago
On Wed, Jan 29, 2025 at 12:57 PM Thomas Bogendoerfer
<tbogendoerfer@suse.de> wrote:
>
> On Wed, 29 Jan 2025 12:31:29 +0100
> Thomas Bogendoerfer <tbogendoerfer@suse.de> wrote:
>
> > My test scenario is simple:
> >
> > TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP receiver
>
> Sorry, I messed that up. It looks like this:
>
> <-        Namespace A           ->    <-        Namespace B             ->
> TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver
>


We are trying to avoid adding costs to the GRO layer (a critical piece
of software for high-speed flows) for a doubtful, possibly obsolete,
use case.

BTW, I am still unsure about the skb_cloned() test vs
skb_header_cloned(), which would solve this case just fine, because the
TCP sender is OK with some layer changing the headers, thanks to the
__skb_header_release() call from tcp_skb_entail().
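
The difference between the two tests can be sketched in userspace. This
is a model under assumptions, not the kernel implementation: it assumes
the shared reference count keeps payload references in its low 16 bits
and header-released references in its high 16 bits, and all toy_* names
and the struct layout are hypothetical. The real helpers should be
checked in include/linux/skbuff.h.

```c
/* Userspace model of "cloned" vs "header cloned"; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DATAREF_SHIFT 16
#define TOY_DATAREF_MASK  ((1u << TOY_DATAREF_SHIFT) - 1)

/* One object stands in for a buffer and its clones, which share dataref. */
struct toy_skb {
	bool cloned;
	uint32_t dataref;	/* low half: all refs, high half: payload-only refs */
};

/* True if anyone else references the data at all. */
bool toy_skb_cloned(const struct toy_skb *skb)
{
	return skb->cloned && (skb->dataref & TOY_DATAREF_MASK) != 1;
}

/* True only if someone else may still look at the headers. */
bool toy_skb_header_cloned(const struct toy_skb *skb)
{
	uint32_t refs;

	if (!skb->cloned)
		return false;
	refs = (skb->dataref & TOY_DATAREF_MASK) -
	       (skb->dataref >> TOY_DATAREF_SHIFT);
	return refs != 1;
}

/* The owner declares it only cares about the payload from now on. */
void toy_header_release(struct toy_skb *skb)
{
	skb->dataref = 1 + (1u << TOY_DATAREF_SHIFT);
}

/* Taking a clone adds one full reference. */
void toy_clone(struct toy_skb *skb)
{
	assert((skb->dataref & TOY_DATAREF_MASK) < TOY_DATAREF_MASK);
	skb->cloned = true;
	skb->dataref++;
}
```

In this model a buffer that went through toy_header_release() and was
then cloned once reports toy_skb_cloned() as true but
toy_skb_header_cloned() as false, whereas a plain buffer cloned once
reports true for both.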

"TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan ->
ip6_tunnel -> TCP receiver"
or
" TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver"

In this case, GRO in ip6_tunnel is not needed at all, since proper TSO
packets should already be cooked by the TCP sender and carried to the
receiver as plain GRO packets.

gro_cells was added at a time when the GRO layer only supported native
encapsulations: IPv4 + TCP or IPv6 + TCP.

Nowadays, GRO supports encapsulated traffic just fine; the same goes
for TSO packets encapsulated in ip6_tunnel.

Maybe it is time to remove gro_cells from net/ipv6/ip6_tunnel.c

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 48fd53b9897265338086136e96ea8e8c6ec3cac..b91c253dc4f1998f8df74251a93e29d00c03db5 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -246,7 +246,6 @@ static void ip6_dev_free(struct net_device *dev)
 {
        struct ip6_tnl *t = netdev_priv(dev);

-       gro_cells_destroy(&t->gro_cells);
        dst_cache_destroy(&t->dst_cache);
 }

@@ -877,7 +876,7 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
        if (tun_dst)
                skb_dst_set(skb, (struct dst_entry *)tun_dst);

-       gro_cells_receive(&tunnel->gro_cells, skb);
+       netif_rx(skb);
        return 0;

 drop:
@@ -1884,10 +1883,6 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
        if (ret)
                return ret;

-       ret = gro_cells_init(&t->gro_cells, dev);
-       if (ret)
-               goto destroy_dst;
-
        t->tun_hlen = 0;
        t->hlen = t->encap_hlen + t->tun_hlen;
        t_hlen = t->hlen + sizeof(struct ipv6hdr);
@@ -1902,11 +1897,6 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
        netdev_hold(dev, &t->dev_tracker, GFP_KERNEL);
        netdev_lockdep_set_classes(dev);
        return 0;
-
-destroy_dst:
-       dst_cache_destroy(&t->dst_cache);
-
-       return ret;
 }

 /**
Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Thomas Bogendoerfer 10 months, 3 weeks ago
On Wed, 29 Jan 2025 13:06:49 +0100
Eric Dumazet <edumazet@google.com> wrote:

> "TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan ->
> ip6_tunnel -> TCP receiver"
> or
> " TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver"
> 
> In this case, GRO in ip6_tunnel is not needed at all, since proper TSO
> packets should already be cooked by the TCP sender and carried to the
> receiver as plain GRO packets.
> 
> gro_cells was added at a time when the GRO layer only supported native
> encapsulations: IPv4 + TCP or IPv6 + TCP.
> 
> Nowadays, GRO supports encapsulated traffic just fine; the same goes
> for TSO packets encapsulated in ip6_tunnel.
> 
> Maybe it is time to remove gro_cells from net/ipv6/ip6_tunnel.c
> 
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index 48fd53b9897265338086136e96ea8e8c6ec3cac..b91c253dc4f1998f8df74251a93e29d00c03db5 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> [...]

This patch works for my test case. So the same thing should probably be
done for net/ipv4/ip_tunnel.c and net/ipv6/ip6_gre.c, too?

Thomas.

Re: [PATCH v2 net] gro_cells: Avoid packet re-ordering for cloned skbs
Posted by Eric Dumazet 11 months ago
On Tue, Jan 21, 2025 at 12:50 PM Thomas Bogendoerfer
<tbogendoerfer@suse.de> wrote:
>
> gro_cells_receive() passes a cloned skb directly up the stack, which
> could re-order it against segments still held in GRO. To avoid this,
> queue cloned skbs as well and use gro_normal_one() to pass them up
> during normal NAPI work.
>
> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> --
> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
>     napi poll function (suggested by Eric)
> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks.