gro_cells_receive() passes a cloned skb directly up the stack and
could cause re-ordering against segments still in GRO. To avoid
this, queue cloned skbs and use gro_normal_one() to pass them during
normal NAPI work.
Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
--
v2: don't use skb_copy(), but make decision how to pass cloned skbs in
napi poll function (suggested by Eric)
v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
net/core/gro_cells.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index ff8e5b64bf6b..762746d18486 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -2,6 +2,7 @@
#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/netdevice.h>
+#include <net/gro.h>
#include <net/gro_cells.h>
#include <net/hotdata.h>
@@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
if (unlikely(!(dev->flags & IFF_UP)))
goto drop;
- if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
+ if (!gcells->cells || netif_elide_gro(dev)) {
res = netif_rx(skb);
goto unlock;
}
@@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
skb = __skb_dequeue(&cell->napi_skbs);
if (!skb)
break;
- napi_gro_receive(napi, skb);
+ /* Core GRO stack does not play well with clones. */
+ if (skb_cloned(skb))
+ gro_normal_one(napi, skb, 1);
+ else
+ napi_gro_receive(napi, skb);
work_done++;
}
--
2.35.3
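For reference, with the v2 change applied gro_cell_poll() ends up looking roughly as below. This is a sketch reconstructed from the hunk above plus the surrounding code of net/core/gro_cells.c as I remember it, not a verbatim copy of the tree:

static int gro_cell_poll(struct napi_struct *napi, int budget)
{
	struct gro_cell *cell = container_of(napi, struct gro_cell, napi);
	struct sk_buff *skb;
	int work_done = 0;

	while (work_done < budget) {
		skb = __skb_dequeue(&cell->napi_skbs);
		if (!skb)
			break;
		/* Core GRO stack does not play well with clones. */
		if (skb_cloned(skb))
			gro_normal_one(napi, skb, 1);	/* batch on napi->rx_list */
		else
			napi_gro_receive(napi, skb);	/* regular GRO path */
		work_done++;
	}

	if (work_done < budget)
		napi_complete_done(napi, work_done);
	return work_done;
}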
On 1/21/25 12:50 PM, Thomas Bogendoerfer wrote:
> gro_cells_receive() passes a cloned skb directly up the stack and
> could cause re-ordering against segments still in GRO. To avoid
> this queue cloned skbs and use gro_normal_one() to pass it during
> normal NAPI work.
>
> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> --
> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
> napi poll function (suggested by Eric)
> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
>
> net/core/gro_cells.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
> index ff8e5b64bf6b..762746d18486 100644
> --- a/net/core/gro_cells.c
> +++ b/net/core/gro_cells.c
> @@ -2,6 +2,7 @@
> #include <linux/skbuff.h>
> #include <linux/slab.h>
> #include <linux/netdevice.h>
> +#include <net/gro.h>
> #include <net/gro_cells.h>
> #include <net/hotdata.h>
>
> @@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
> if (unlikely(!(dev->flags & IFF_UP)))
> goto drop;
>
> - if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
> + if (!gcells->cells || netif_elide_gro(dev)) {
> res = netif_rx(skb);
> goto unlock;
> }
> @@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
> skb = __skb_dequeue(&cell->napi_skbs);
> if (!skb)
> break;
> - napi_gro_receive(napi, skb);
> + /* Core GRO stack does not play well with clones. */
> + if (skb_cloned(skb))
> + gro_normal_one(napi, skb, 1);
> + else
> + napi_gro_receive(napi, skb);
I must admit it's not clear to me how/why the above will avoid OoO. I
assume OoO happens when we observe both cloned and uncloned packets
belonging to the same connection/flow.
What if we have a (uncloned) packet for the relevant flow in the GRO,
'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
for the critical flow reaches gro_cells_receive()?
Don't we need to unconditionally flush any packets belonging to the same
flow?
Thanks!
Paolo
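(For context on the rx_list/rx_count batching Paolo refers to: gro_normal_one() only appends the skb to the per-NAPI rx_list and flushes the whole list once the batch limit is reached, roughly as below. Shown from memory as a sketch of include/net/gro.h, not a verbatim copy:

static inline void gro_normal_one(struct napi_struct *napi,
				  struct sk_buff *skb, int segs)
{
	list_add_tail(&skb->list, &napi->rx_list);
	napi->rx_count += segs;
	if (napi->rx_count >= READ_ONCE(net_hotdata.gro_normal_batch))
		gro_normal_list(napi);	/* deliver the whole batch up the stack */
}

The rx_list is also flushed when the NAPI poll completes.)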
On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 1/21/25 12:50 PM, Thomas Bogendoerfer wrote:
> > gro_cells_receive() passes a cloned skb directly up the stack and
> > could cause re-ordering against segments still in GRO. To avoid
> > this queue cloned skbs and use gro_normal_one() to pass it during
> > normal NAPI work.
> >
> > Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> > --
> > v2: don't use skb_copy(), but make decision how to pass cloned skbs in
> > napi poll function (suggested by Eric)
> > v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
> >
> > net/core/gro_cells.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
> > index ff8e5b64bf6b..762746d18486 100644
> > --- a/net/core/gro_cells.c
> > +++ b/net/core/gro_cells.c
> > @@ -2,6 +2,7 @@
> > #include <linux/skbuff.h>
> > #include <linux/slab.h>
> > #include <linux/netdevice.h>
> > +#include <net/gro.h>
> > #include <net/gro_cells.h>
> > #include <net/hotdata.h>
> >
> > @@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
> > if (unlikely(!(dev->flags & IFF_UP)))
> > goto drop;
> >
> > - if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
> > + if (!gcells->cells || netif_elide_gro(dev)) {
> > res = netif_rx(skb);
> > goto unlock;
> > }
> > @@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
> > skb = __skb_dequeue(&cell->napi_skbs);
> > if (!skb)
> > break;
> > - napi_gro_receive(napi, skb);
> > + /* Core GRO stack does not play well with clones. */
> > + if (skb_cloned(skb))
> > + gro_normal_one(napi, skb, 1);
> > + else
> > + napi_gro_receive(napi, skb);
>
> I must admit it's not clear to me how/why the above will avoid OoO. I
> assume OoO happens when we observe both cloned and uncloned packets
> belonging to the same connection/flow.
>
> What if we have a (uncloned) packet for the relevant flow in the GRO,
> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
> for the critical flow reaches gro_cells_receive()?
>
> Don't we need to unconditionally flush any packets belonging to the same
> flow?
It would only matter if we had 2 or more segments that would belong
to the same flow and packet train (potential 'GRO super packet'), with
the 'cloned'
status being of mixed value on various segments.
In practice, the cloned status will be the same for all segments.
Same issue would happen when/if dev->features NETIF_F_GRO is flipped
back and forth : We do not really care.
On 1/23/25 11:07 AM, Eric Dumazet wrote:
> On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 1/21/25 12:50 PM, Thomas Bogendoerfer wrote:
>>> gro_cells_receive() passes a cloned skb directly up the stack and
>>> could cause re-ordering against segments still in GRO. To avoid
>>> this queue cloned skbs and use gro_normal_one() to pass it during
>>> normal NAPI work.
>>>
>>> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
>>> Suggested-by: Eric Dumazet <edumazet@google.com>
>>> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
>>> --
>>> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
>>> napi poll function (suggested by Eric)
>>> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
>>>
>>> net/core/gro_cells.c | 9 +++++++--
>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
>>> index ff8e5b64bf6b..762746d18486 100644
>>> --- a/net/core/gro_cells.c
>>> +++ b/net/core/gro_cells.c
>>> @@ -2,6 +2,7 @@
>>> #include <linux/skbuff.h>
>>> #include <linux/slab.h>
>>> #include <linux/netdevice.h>
>>> +#include <net/gro.h>
>>> #include <net/gro_cells.h>
>>> #include <net/hotdata.h>
>>>
>>> @@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
>>> if (unlikely(!(dev->flags & IFF_UP)))
>>> goto drop;
>>>
>>> - if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
>>> + if (!gcells->cells || netif_elide_gro(dev)) {
>>> res = netif_rx(skb);
>>> goto unlock;
>>> }
>>> @@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
>>> skb = __skb_dequeue(&cell->napi_skbs);
>>> if (!skb)
>>> break;
>>> - napi_gro_receive(napi, skb);
>>> + /* Core GRO stack does not play well with clones. */
>>> + if (skb_cloned(skb))
>>> + gro_normal_one(napi, skb, 1);
>>> + else
>>> + napi_gro_receive(napi, skb);
>>
>> I must admit it's not clear to me how/why the above will avoid OoO. I
>> assume OoO happens when we observe both cloned and uncloned packets
>> belonging to the same connection/flow.
>>
>> What if we have a (uncloned) packet for the relevant flow in the GRO,
>> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
>> for the critical flow reaches gro_cells_receive()?
>>
>> Don't we need to unconditionally flush any packets belonging to the same
>> flow?
>
> It would only matter if we had 2 or more segments that would belong
> to the same flow and packet train (potential 'GRO super packet'), with
> the 'cloned'
> status being of mixed value on various segments.
>
> In practice, the cloned status will be the same for all segments.
I agree with the above, but my doubt is: does the above also mean that
in practice there are no OoO to deal with, even without this patch?
To rephrase my doubt: which scenario is addressed by this patch that
would lead to OoO without it?
Thanks,
Paolo
On Thu, Jan 23, 2025 at 11:42 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 1/23/25 11:07 AM, Eric Dumazet wrote:
> > On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 1/21/25 12:50 PM, Thomas Bogendoerfer wrote:
> >>> gro_cells_receive() passes a cloned skb directly up the stack and
> >>> could cause re-ordering against segments still in GRO. To avoid
> >>> this queue cloned skbs and use gro_normal_one() to pass it during
> >>> normal NAPI work.
> >>>
> >>> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> >>> Suggested-by: Eric Dumazet <edumazet@google.com>
> >>> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> >>> --
> >>> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
> >>> napi poll function (suggested by Eric)
> >>> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
> >>>
> >>> net/core/gro_cells.c | 9 +++++++--
> >>> 1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
> >>> index ff8e5b64bf6b..762746d18486 100644
> >>> --- a/net/core/gro_cells.c
> >>> +++ b/net/core/gro_cells.c
> >>> @@ -2,6 +2,7 @@
> >>> #include <linux/skbuff.h>
> >>> #include <linux/slab.h>
> >>> #include <linux/netdevice.h>
> >>> +#include <net/gro.h>
> >>> #include <net/gro_cells.h>
> >>> #include <net/hotdata.h>
> >>>
> >>> @@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
> >>> if (unlikely(!(dev->flags & IFF_UP)))
> >>> goto drop;
> >>>
> >>> - if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
> >>> + if (!gcells->cells || netif_elide_gro(dev)) {
> >>> res = netif_rx(skb);
> >>> goto unlock;
> >>> }
> >>> @@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
> >>> skb = __skb_dequeue(&cell->napi_skbs);
> >>> if (!skb)
> >>> break;
> >>> - napi_gro_receive(napi, skb);
> >>> + /* Core GRO stack does not play well with clones. */
> >>> + if (skb_cloned(skb))
> >>> + gro_normal_one(napi, skb, 1);
> >>> + else
> >>> + napi_gro_receive(napi, skb);
> >>
> >> I must admit it's not clear to me how/why the above will avoid OoO. I
> >> assume OoO happens when we observe both cloned and uncloned packets
> >> belonging to the same connection/flow.
> >>
> >> What if we have a (uncloned) packet for the relevant flow in the GRO,
> >> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
> >> for the critical flow reaches gro_cells_receive()?
> >>
> >> Don't we need to unconditionally flush any packets belonging to the same
> >> flow?
> >
> > It would only matter if we had 2 or more segments that would belong
> > to the same flow and packet train (potential 'GRO super packet'), with
> > the 'cloned'
> > status being of mixed value on various segments.
> >
> > In practice, the cloned status will be the same for all segments.
>
> I agree with the above, but my doubt is: does the above also mean that
> in practice there are no OoO to deal with, even without this patch?
>
> To rephrase my doubt: which scenario is addressed by this patch that
> would lead to OoO without it?
Fair point, a detailed changelog would be really nice.
On Thu, 23 Jan 2025 11:43:05 +0100
Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Jan 23, 2025 at 11:42 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On 1/23/25 11:07 AM, Eric Dumazet wrote:
> > > On Thu, Jan 23, 2025 at 9:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >> On 1/21/25 12:50 PM, Thomas Bogendoerfer wrote:
> > >>> gro_cells_receive() passes a cloned skb directly up the stack and
> > >>> could cause re-ordering against segments still in GRO. To avoid
> > >>> this queue cloned skbs and use gro_normal_one() to pass it during
> > >>> normal NAPI work.
> > >>>
> > >>> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> > >>> Suggested-by: Eric Dumazet <edumazet@google.com>
> > >>> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> > >>> --
> > >>> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
> > >>> napi poll function (suggested by Eric)
> > >>> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
> > >>>
> > >>> net/core/gro_cells.c | 9 +++++++--
> > >>> 1 file changed, 7 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
> > >>> index ff8e5b64bf6b..762746d18486 100644
> > >>> --- a/net/core/gro_cells.c
> > >>> +++ b/net/core/gro_cells.c
> > >>> @@ -2,6 +2,7 @@
> > >>> #include <linux/skbuff.h>
> > >>> #include <linux/slab.h>
> > >>> #include <linux/netdevice.h>
> > >>> +#include <net/gro.h>
> > >>> #include <net/gro_cells.h>
> > >>> #include <net/hotdata.h>
> > >>>
> > >>> @@ -20,7 +21,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
> > >>> if (unlikely(!(dev->flags & IFF_UP)))
> > >>> goto drop;
> > >>>
> > >>> - if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
> > >>> + if (!gcells->cells || netif_elide_gro(dev)) {
> > >>> res = netif_rx(skb);
> > >>> goto unlock;
> > >>> }
> > >>> @@ -58,7 +59,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
> > >>> skb = __skb_dequeue(&cell->napi_skbs);
> > >>> if (!skb)
> > >>> break;
> > >>> - napi_gro_receive(napi, skb);
> > >>> + /* Core GRO stack does not play well with clones. */
> > >>> + if (skb_cloned(skb))
> > >>> + gro_normal_one(napi, skb, 1);
> > >>> + else
> > >>> + napi_gro_receive(napi, skb);
> > >>
> > >> I must admit it's not clear to me how/why the above will avoid OoO. I
> > >> assume OoO happens when we observe both cloned and uncloned packets
> > >> belonging to the same connection/flow.
> > >>
> > >> What if we have a (uncloned) packet for the relevant flow in the GRO,
> > >> 'rx_count - 1' packets already sitting in 'rx_list' and a cloned packet
> > >> for the critical flow reaches gro_cells_receive()?
> > >>
> > >> Don't we need to unconditionally flush any packets belonging to the same
> > >> flow?
> > >
> > > It would only matter if we had 2 or more segments that would belong
> > > to the same flow and packet train (potential 'GRO super packet'), with
> > > the 'cloned'
> > > status being of mixed value on various segments.
> > >
> > > In practice, the cloned status will be the same for all segments.
> >
> > I agree with the above, but my doubt is: does the above also mean that
> > in practice there are no OoO to deal with, even without this patch?
> >
> > To rephrase my doubt: which scenario is addressed by this patch that
> > would lead to OoO without it?
>
> Fair point, a detailed changelog would be really nice.
My test scenario is simple:
TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP receiver
Sender does continuous writes in 15k chunks, receiver reads data from socket in a loop.
And that is what I see:
40 0.002766 1000::1 → 2000::1 TCP 15088 51238 → 5060 [PSH, ACK] Seq=2862576060 Ack=1152583678 Win=65536 Len=15000 TSval=3343493494 TSecr=629171944
41 0.002844 1000::1 → 2000::1 TCP 9816 51238 → 5060 [PSH, ACK] Seq=2862591060 Ack=1152583678 Win=65536 Len=9728 TSval=3343493494 TSecr=629171944
42 0.004122 1000::1 → 2000::1 TCP 1468 [TCP Previous segment not captured] 51238 → 5060 [ACK] Seq=2862642188 Ack=1152583678 Win=65536 Len=1380 TSval=3343493496 TSecr=629171946
43 0.004128 1000::1 → 2000::1 TCP 20788 [TCP Out-Of-Order] 51238 → 5060 [PSH, ACK] Seq=2862600788 Ack=1152583678 Win=65536 Len=20700 TSval=3343493496 TSecr=629171946
44 0.004133 1000::1 → 2000::1 TCP 20788 [TCP Out-Of-Order] 51238 → 5060 [PSH, ACK] Seq=2862621488 Ack=1152583678 Win=65536 Len=20700 TSval=3343493496 TSecr=629171946
45 0.004169 1000::1 → 2000::1 TCP 500 [TCP Previous segment not captured] 51238 → 5060 [PSH, ACK] Seq=2862665648 Ack=1152583678 Win=65536 Len=412 TSval=3343493496 TSecr=629171946
46 0.004180 1000::1 → 2000::1 TCP 22168 [TCP Out-Of-Order] 51238 → 5060 [PSH, ACK] Seq=2862643568 Ack=1152583678 Win=65536 Len=22080 TSval=3343493496 TSecr=629171946
47 0.004187 1000::1 → 2000::1 TCP 13888 51238 → 5060 [PSH, ACK] Seq=2862666060 Ack=1152583678 Win=65536 Len=13800 TSval=3343493496 TSecr=629171946
48 0.004201 1000::1 → 2000::1 TCP 1288 51238 → 5060 [PSH, ACK] Seq=2862679860 Ack=1152583678 Win=65536 Len=1200 TSval=3343493496 TSecr=629171946
49 0.004273 1000::1 → 2000::1 TCP 13888 51238 → 5060 [PSH, ACK] Seq=2862681060 Ack=1152583678 Win=65536 Len=13800 TSval=3343493496 TSecr=629171946
IMHO these OoO packets are retransmits for segments still waiting in GRO. With the
v2 patch applied the trace looks like this:
2856 9.526256 1000::1 → 2000::1 TCP 64948 50452 → 5060 [PSH, ACK] Seq=1871837193 Ack=209151777 Win=65536 Len=64860 TSval=2755210164 TSecr=2795235137
2857 9.526258 1000::1 → 2000::1 TCP 5480 50452 → 5060 [PSH, ACK] Seq=1871902053 Ack=209151777 Win=65536 Len=5392 TSval=2755210164 TSecr=2795235137
2858 9.535262 1000::1 → 2000::1 TCP 1340 [TCP Retransmission] 50452 → 5060 [ACK] Seq=1871906193 Ack=209151777 Win=65536 Len=1252 TSval=2755210174 TSecr=2795235137
2859 9.585477 1000::1 → 2000::1 TCP 64948 50452 → 5060 [PSH, ACK] Seq=1871907445 Ack=209151777 Win=65536 Len=64860 TSval=2755210224 TSecr=2795235197
2860 9.585486 1000::1 → 2000::1 TCP 64948 50452 → 5060 [PSH, ACK] Seq=1871972305 Ack=209151777 Win=65536 Len=64860 TSval=2755210224 TSecr=2795235197
Looks ok to me, but without a GRO flush there is still a chance of OoO packets.
I've worked on a new patch (below as an RFC) which pushes the check for skb_cloned()
into GRO. The result is comparable to the v2 patch:
604 1.987863 1000::1 → 2000::1 TCP 64948 57278 → 5060 [PSH, ACK] Seq=1220895319 Ack=1484877190 Win=65536 Len=64860 TSval=646104760 TSecr=459787214
605 1.987866 1000::1 → 2000::1 TCP 16488 57278 → 5060 [PSH, ACK] Seq=1220960179 Ack=1484877190 Win=65536 Len=16400 TSval=646104760 TSecr=459787214
606 1.998231 1000::1 → 2000::1 TCP 1308 [TCP Retransmission] 57278 → 5060 [ACK] Seq=1220975359 Ack=1484877190 Win=65536 Len=1220 TSval=646104771 TSecr=459787214
607 2.049288 1000::1 → 2000::1 TCP 64948 57278 → 5060 [PSH, ACK] Seq=1220976579 Ack=1484877190 Win=65536 Len=64860 TSval=646104822 TSecr=459787276
608 2.049304 1000::1 → 2000::1 TCP 64948 57278 → 5060 [PSH, ACK] Seq=1221041439 Ack=1484877190 Win=65536 Len=64860 TSval=646104822 TSecr=459787276
diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index ff8e5b64bf6b..06e6889138ba 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -20,7 +20,7 @@ int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
if (unlikely(!(dev->flags & IFF_UP)))
goto drop;
- if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
+ if (!gcells->cells || netif_elide_gro(dev)) {
res = netif_rx(skb);
goto unlock;
}
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2308665b51c5..66a2bb849e85 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -322,6 +322,12 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
if (!p)
goto out_check_final;
+ if (unlikely(skb_cloned(skb))) {
+ NAPI_GRO_CB(skb)->flush |= 1;
+ NAPI_GRO_CB(skb)->same_flow = 0;
+ return p;
+ }
+
th2 = tcp_hdr(p);
flush = (__force int)(flags & TCP_FLAG_CWR);
flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index a5be6e4ed326..a9c85b0556ce 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -647,6 +647,11 @@ struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
struct sock *sk = NULL;
struct sk_buff *pp;
+ if (unlikely(skb_cloned(skb))) {
+ NAPI_GRO_CB(skb)->same_flow = 0;
+ goto flush;
+ }
+
if (unlikely(!uh))
goto flush;
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index b41152dd4246..b754747e3e8a 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -134,6 +134,11 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
struct sock *sk = NULL;
struct sk_buff *pp;
+ if (unlikely(skb_cloned(skb))) {
+ NAPI_GRO_CB(skb)->same_flow = 0;
+ goto flush;
+ }
+
if (unlikely(!uh))
goto flush;
What do you think about this approach ?
Thomas.
--
SUSE Software Solutions Germany GmbH
HRB 36809 (AG Nürnberg)
Geschäftsführer: Ivo Totev, Andrew McDonald, Werner Knoblich
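For readers less familiar with the GRO receive flags used in the RFC above: setting same_flow = 0 and flush = 1 in a ->gro_receive callback makes dev_gro_receive() deliver the skb as GRO_NORMAL instead of holding it, which ends up in the same gro_normal_one() batching as the v2 patch. A very rough, simplified sketch of that decision, not the verbatim code from net/core/gro.c:

	/* simplified sketch of the tail of dev_gro_receive() */
	if (NAPI_GRO_CB(skb)->same_flow)
		return GRO_MERGED;	/* merged into a packet already held by GRO */
	if (NAPI_GRO_CB(skb)->flush)
		return GRO_NORMAL;	/* caller hands the skb to gro_normal_one() */
	/* otherwise the skb is held in the GRO hash, waiting for more segments */
	return GRO_HELD;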
On Wed, 29 Jan 2025 12:31:29 +0100
Thomas Bogendoerfer <tbogendoerfer@suse.de> wrote:

> My test scenario is simple:
>
> TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP receiver

sorry, messed it up. It looks like this

<- Namespace A ->                          <- Namespace b ->
TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver

Thomas.

--
SUSE Software Solutions Germany GmbH
HRB 36809 (AG Nürnberg)
Geschäftsführer: Ivo Totev, Andrew McDonald, Werner Knoblich
On Wed, Jan 29, 2025 at 12:57 PM Thomas Bogendoerfer
<tbogendoerfer@suse.de> wrote:
>
> On Wed, 29 Jan 2025 12:31:29 +0100
> Thomas Bogendoerfer <tbogendoerfer@suse.de> wrote:
>
> > My test scenario is simple:
> >
> > TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP receiver
>
> sorry, messed it up. It looks like this
>
> <- Namespace A -> <- Namespace b ->
> TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver
>
We are trying to avoid adding costs in GRO layer (a critical piece of
software for high speed flows), for a doubtful use case,
possibly obsolete.
BTW I am still unsure about the skb_cloned() test vs
skb_header_cloned() which would solve this case just fine.
Because TCP sender is ok if some layer wants to change the headers,
thanks to __skb_header_release() call
from tcp_skb_entail()
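(From memory, the two helpers in include/linux/skbuff.h differ roughly as below. skb_header_cloned() only counts references that also cover the header area; __skb_header_release(), called via tcp_skb_entail(), marks the TCP sender's own reference as payload-only, so it no longer counts and the headers can be rewritten without a copy. Treat this as a sketch, not a verbatim copy:

static inline int skb_cloned(const struct sk_buff *skb)
{
	/* true if any clone still shares the data area */
	return skb->cloned &&
	       (atomic_read(&skb_shinfo(skb)->dataref) & SKB_DATAREF_MASK) != 1;
}

static inline int skb_header_cloned(const struct sk_buff *skb)
{
	int dataref;

	if (!skb->cloned)
		return 0;

	/* subtract payload-only references; only header-inclusive ones matter */
	dataref = atomic_read(&skb_shinfo(skb)->dataref);
	dataref = (dataref & SKB_DATAREF_MASK) - (dataref >> SKB_DATAREF_SHIFT);
	return dataref != 1;
}
)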
"TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan ->
ip6_tunnel -> TCP receiver"
or
" TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver"
In this case, GRO in ip6_tunnel is not needed at all, since proper TSO
packets should already be cooked by TCP sender and be carried
to the receiver as plain GRO packets.
gro_cells was added at a time GRO layer was only supporting native
encapsulations : IPv4 + TCP or IPv6 + TCP.
Nowadays, GRO supports encapsulated traffic just fine, same for TSO
packets encapsulated in ip6_tunnel
Maybe it is time to remove gro_cells from net/ipv6/ip6_tunnel.c
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 48fd53b9897265338086136e96ea8e8c6ec3cac..b91c253dc4f1998f8df74251a93e29d00c03db5 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -246,7 +246,6 @@ static void ip6_dev_free(struct net_device *dev)
{
struct ip6_tnl *t = netdev_priv(dev);
- gro_cells_destroy(&t->gro_cells);
dst_cache_destroy(&t->dst_cache);
}
@@ -877,7 +876,7 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel,
struct sk_buff *skb,
if (tun_dst)
skb_dst_set(skb, (struct dst_entry *)tun_dst);
- gro_cells_receive(&tunnel->gro_cells, skb);
+ netif_rx(skb);
return 0;
drop:
@@ -1884,10 +1883,6 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
if (ret)
return ret;
- ret = gro_cells_init(&t->gro_cells, dev);
- if (ret)
- goto destroy_dst;
-
t->tun_hlen = 0;
t->hlen = t->encap_hlen + t->tun_hlen;
t_hlen = t->hlen + sizeof(struct ipv6hdr);
@@ -1902,11 +1897,6 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
netdev_hold(dev, &t->dev_tracker, GFP_KERNEL);
netdev_lockdep_set_classes(dev);
return 0;
-
-destroy_dst:
- dst_cache_destroy(&t->dst_cache);
-
- return ret;
}
/**
On Wed, 29 Jan 2025 13:06:49 +0100
Eric Dumazet <edumazet@google.com> wrote:

> "TCP Sender in namespace A -> ip6_tunnel -> ipvlan -> ipvlan ->
> ip6_tunnel -> TCP receiver"
> or
> " TCP Sender -> ip6_tunnel -> ipvlan -> ipvlan -> ip6_tunnel -> TCP Receiver"
>
> In this case, GRO in ip6_tunnel is not needed at all, since proper TSO
> packets should already be cooked by TCP sender and be carried
> to the receiver as plain GRO packets.
>
> gro_cells was added at a time GRO layer was only supporting native
> encapsulations : IPv4 + TCP or IPv6 + TCP.
>
> Nowadays, GRO supports encapsulated traffic just fine, same for TSO
> packets encapsulated in ip6_tunnel
>
> Maybe it is time to remove gro_cells from net/ipv6/ip6_tunnel.c
>
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index 48fd53b9897265338086136e96ea8e8c6ec3cac..b91c253dc4f1998f8df74251a93e29d00c03db5 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> [...]

this patch works for my test case. So the same thing should probably be
done for net/ipv4/ip_tunnel.c and net/ipv6/ip6_gre.c, too?

Thomas.

--
SUSE Software Solutions Germany GmbH
HRB 36809 (AG Nürnberg)
Geschäftsführer: Ivo Totev, Andrew McDonald, Werner Knoblich
On Tue, Jan 21, 2025 at 12:50 PM Thomas Bogendoerfer
<tbogendoerfer@suse.de> wrote:
>
> gro_cells_receive() passes a cloned skb directly up the stack and
> could cause re-ordering against segments still in GRO. To avoid
> this queue cloned skbs and use gro_normal_one() to pass it during
> normal NAPI work.
>
> Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> --
> v2: don't use skb_copy(), but make decision how to pass cloned skbs in
> napi poll function (suggested by Eric)
> v1: https://lore.kernel.org/lkml/20250109142724.29228-1-tbogendoerfer@suse.de/
>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks.