[PATCH] xfrm: Add pre-encap fragmentation for packet offload

Ilia Lin posted 1 patch 1 year, 2 months ago
net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
net/ipv6/xfrm6_output.c |  8 ++++++--
2 files changed, 35 insertions(+), 4 deletions(-)
[PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Ilia Lin 1 year, 2 months ago
In packet offload mode the raw packets are sent to the NIC
and do not return to the network stack. If a packet exceeds
the MTU after encapsulation, the NIC HW may not be able to
fragment the final packet.
Add mandatory pre-encapsulation fragmentation for both IPv4
and IPv6 when tunnel mode with packet offload is configured
on the state.
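The IPv4 decision this patch adds to __xfrm4_output() can be modeled in user space as below. This is a minimal sketch, not kernel code: the enum, struct and function names are hypothetical, `mtu` stands for the value the patch obtains from xfrm_state_mtu(), and the booleans stand in for skb_is_gso(), skb->ignore_df and skb->sk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the pre-encap check (not kernel types). */
enum frag_action {
	FRAG_PASS,     /* hand to xfrm_output() unchanged (skip_frag) */
	FRAG_SPLIT,    /* ip_do_fragment() before encapsulation */
	FRAG_EMSGSIZE, /* xfrm_local_error() + kfree_skb() */
};

/* Hypothetical view of the skb/state fields the patch looks at. */
struct pkt_model {
	bool tunnel_mode;    /* x->props.mode == XFRM_MODE_TUNNEL */
	bool packet_offload; /* x->xso.type == XFRM_DEV_OFFLOAD_PACKET */
	unsigned int len;    /* skb->len */
	bool gso;            /* skb_is_gso(skb) */
	bool ignore_df;      /* skb->ignore_df */
	bool local;          /* skb->sk != NULL */
};

/* Mirrors the IPv4 hunk; mtu is the post-encap budget for the inner packet. */
static enum frag_action pre_encap_action(const struct pkt_model *p,
					 unsigned int mtu)
{
	bool toobig;

	assert(p != NULL);
	if (!p->tunnel_mode || !p->packet_offload)
		return FRAG_PASS;

	toobig = p->len > mtu && !p->gso;

	/* Only locally generated packets get an error back. */
	if (!p->ignore_df && toobig && p->local)
		return FRAG_EMSGSIZE;

	return toobig ? FRAG_SPLIT : FRAG_PASS;
}
```

In this model a too-big forwarded packet (`local == false`) always yields FRAG_SPLIT, because the DF flag in the packet's IP header is never consulted.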

Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
---
 net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
 net/ipv6/xfrm6_output.c |  8 ++++++--
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 3cff51ba72bb0..a4271e0dd51bb 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -14,17 +14,44 @@
 #include <net/xfrm.h>
 #include <net/icmp.h>
 
+static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	return xfrm_output(sk, skb);
+}
+
 static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-#ifdef CONFIG_NETFILTER
-	struct xfrm_state *x = skb_dst(skb)->xfrm;
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
+	unsigned int mtu;
+	bool toobig;
 
+#ifdef CONFIG_NETFILTER
 	if (!x) {
 		IPCB(skb)->flags |= IPSKB_REROUTED;
 		return dst_output(net, sk, skb);
 	}
 #endif
 
+	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+		goto skip_frag;
+
+	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
+
+	toobig = skb->len > mtu && !skb_is_gso(skb);
+
+	if (!skb->ignore_df && toobig && skb->sk) {
+		xfrm_local_error(skb, mtu);
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	if (toobig) {
+		IPCB(skb)->frag_max_size = mtu;
+		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
+	}
+
+skip_frag:
 	return xfrm_output(sk, skb);
 }
 
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 5f7b1fdbffe62..fdd2f2f5adc71 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -75,10 +75,14 @@ static int __xfrm6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (x->props.mode != XFRM_MODE_TUNNEL)
 		goto skip_frag;
 
-	if (skb->protocol == htons(ETH_P_IPV6))
+	if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET) {
+		mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
+		IP6CB(skb)->frag_max_size = mtu;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
 		mtu = ip6_skb_dst_mtu(skb);
-	else
+	} else {
 		mtu = dst_mtu(skb_dst(skb));
+	}
 
 	toobig = skb->len > mtu && !skb_is_gso(skb);
 
-- 
2.25.1
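When a packet is too big, the IPv4 hunk sets IPCB(skb)->frag_max_size to the state MTU and calls ip_do_fragment(), so each inner fragment must fit that budget before ESP is added. IPv4 fragment offsets are counted in 8-byte units (RFC 791), so every fragment except the last carries a payload that is a multiple of 8. The sketch below only does that arithmetic; it assumes an inner IPv4 header without options, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define INNER_IHL 20u /* assumption: inner IPv4 header without options */

/*
 * Split an inner IPv4 packet of total length tot_len into fragments that
 * each fit in mtu bytes. Writes payload sizes to sizes[] (up to max
 * entries) and returns the fragment count, or 0 if max is too small
 * (also 0 for an empty payload).
 */
static size_t ipv4_frag_sizes(unsigned int tot_len, unsigned int mtu,
			      unsigned int *sizes, size_t max)
{
	unsigned int left, chunk;
	size_t n = 0;

	assert(tot_len >= INNER_IHL && mtu >= INNER_IHL + 8);
	left = tot_len - INNER_IHL;
	/* Largest payload per fragment that keeps offsets 8-byte aligned. */
	chunk = (mtu - INNER_IHL) & ~7u;

	while (left > 0) {
		unsigned int part = left > chunk ? chunk : left;

		if (n == max)
			return 0;
		sizes[n++] = part;
		left -= part;
	}
	return n;
}
```

For a 2000-byte inner packet and a 1446-byte state MTU this gives payloads of 1424 and 556 bytes. Each fragment then travels as its own ESP packet and is normally reassembled by the inner packet's destination, not by the NIC.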
Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Steffen Klassert 1 year, 2 months ago
On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> In packet offload mode the raw packets will be sent to the NiC,
> and will not return to the Network Stack. In event of crossing
> the MTU size after the encapsulation, the NiC HW may not be
> able to fragment the final packet.
> Adding mandatory pre-encapsulation fragmentation for both
> IPv4 and IPv6, if tunnel mode with packet offload is configured
> on the state.
> 
> Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
> ---
>  net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
>  net/ipv6/xfrm6_output.c |  8 ++++++--
>  2 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
> index 3cff51ba72bb0..a4271e0dd51bb 100644
> --- a/net/ipv4/xfrm4_output.c
> +++ b/net/ipv4/xfrm4_output.c
> @@ -14,17 +14,44 @@
>  #include <net/xfrm.h>
>  #include <net/icmp.h>
>  
> +static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
> +{
> +	return xfrm_output(sk, skb);
> +}
> +
>  static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  {
> -#ifdef CONFIG_NETFILTER
> -	struct xfrm_state *x = skb_dst(skb)->xfrm;
> +	struct dst_entry *dst = skb_dst(skb);
> +	struct xfrm_state *x = dst->xfrm;
> +	unsigned int mtu;
> +	bool toobig;
>  
> +#ifdef CONFIG_NETFILTER
>  	if (!x) {
>  		IPCB(skb)->flags |= IPSKB_REROUTED;
>  		return dst_output(net, sk, skb);
>  	}
>  #endif
>  
> +	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
> +		goto skip_frag;
> +
> +	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
> +
> +	toobig = skb->len > mtu && !skb_is_gso(skb);
> +
> +	if (!skb->ignore_df && toobig && skb->sk) {
> +		xfrm_local_error(skb, mtu);
> +		kfree_skb(skb);
> +		return -EMSGSIZE;
> +	}
> +
> +	if (toobig) {
> +		IPCB(skb)->frag_max_size = mtu;
> +		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
> +	}

This would fragment the packet even if the DF bit is set.

Please no further packet offload stuff in generic code.
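The objection above is that the new check looks only at skb->ignore_df and skb->sk, never at the DF bit in the inner IPv4 header, so a forwarded DF packet would still be fragmented. A DF-respecting rule, roughly the one the regular IPv4 output path applies before fragmenting, refuses to fragment in that case and reports the MTU instead. The helper below is a hypothetical illustration over raw header bytes, not code from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_FLAG_DF 0x4000u /* DF bit in the flags/fragment-offset field */

/* Read the DF bit from a raw IPv4 header (bytes 6-7, network byte order). */
static bool ipv4_df_set(const uint8_t *iph)
{
	uint16_t frag_off = (uint16_t)((iph[6] << 8) | iph[7]);

	return (frag_off & IPV4_FLAG_DF) != 0;
}

enum df_verdict {
	DF_PASS,       /* fits, or GSO left to segmentation as in the patch */
	DF_FRAGMENT,   /* DF clear (or ignore_df): fragmenting is allowed */
	DF_REPORT_MTU, /* DF set: do not fragment, signal the MTU instead */
};

/* Hypothetical DF-aware variant of the "too big" decision. */
static enum df_verdict df_aware_verdict(const uint8_t *iph, unsigned int len,
					unsigned int mtu, bool gso,
					bool ignore_df)
{
	assert(iph != NULL);
	if (len <= mtu || gso)
		return DF_PASS;
	if (ipv4_df_set(iph) && !ignore_df)
		return DF_REPORT_MTU;
	return DF_FRAGMENT;
}
```

For a forwarded packet, DF_REPORT_MTU would translate into an ICMP "fragmentation needed" error rather than a local -EMSGSIZE.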
Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Leon Romanovsky 1 year, 2 months ago
On Tue, Nov 26, 2024 at 01:51:42PM +0100, Steffen Klassert wrote:
> On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > In packet offload mode the raw packets will be sent to the NiC,
> > and will not return to the Network Stack. In event of crossing
> > the MTU size after the encapsulation, the NiC HW may not be
> > able to fragment the final packet.
> > Adding mandatory pre-encapsulation fragmentation for both
> > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > on the state.
> > 
> > Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
> > ---
> >  net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
> >  net/ipv6/xfrm6_output.c |  8 ++++++--
> >  2 files changed, 35 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
> > index 3cff51ba72bb0..a4271e0dd51bb 100644
> > --- a/net/ipv4/xfrm4_output.c
> > +++ b/net/ipv4/xfrm4_output.c
> > @@ -14,17 +14,44 @@
> >  #include <net/xfrm.h>
> >  #include <net/icmp.h>
> >  
> > +static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
> > +{
> > +	return xfrm_output(sk, skb);
> > +}
> > +
> >  static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  {
> > -#ifdef CONFIG_NETFILTER
> > -	struct xfrm_state *x = skb_dst(skb)->xfrm;
> > +	struct dst_entry *dst = skb_dst(skb);
> > +	struct xfrm_state *x = dst->xfrm;
> > +	unsigned int mtu;
> > +	bool toobig;
> >  
> > +#ifdef CONFIG_NETFILTER
> >  	if (!x) {
> >  		IPCB(skb)->flags |= IPSKB_REROUTED;
> >  		return dst_output(net, sk, skb);
> >  	}
> >  #endif
> >  
> > +	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
> > +		goto skip_frag;
> > +
> > +	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
> > +
> > +	toobig = skb->len > mtu && !skb_is_gso(skb);
> > +
> > +	if (!skb->ignore_df && toobig && skb->sk) {
> > +		xfrm_local_error(skb, mtu);
> > +		kfree_skb(skb);
> > +		return -EMSGSIZE;
> > +	}
> > +
> > +	if (toobig) {
> > +		IPCB(skb)->frag_max_size = mtu;
> > +		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
> > +	}
> 
> This would fragment the packet even if the DF bit is set.
> 
> Please no further packet offload stuff in generic code.

+ 100000

Thanks

>
Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Leon Romanovsky 1 year, 2 months ago
On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> In packet offload mode the raw packets will be sent to the NiC,
> and will not return to the Network Stack. In event of crossing
> the MTU size after the encapsulation, the NiC HW may not be
> able to fragment the final packet.

Yes, HW doesn't know how to handle these packets.

> Adding mandatory pre-encapsulation fragmentation for both
> IPv4 and IPv6, if tunnel mode with packet offload is configured
> on the state.

I was under the impression that xfrm_dev_offload_ok() is responsible for
preventing fragmentation.
https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410

Thanks
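As a rough model of the length check referred to above: offload is acceptable only if the packet already fits the state MTU, or if it is a GSO packet whose segments will each fit. The real xfrm_dev_offload_ok() at the linked line has further conditions (offload type, egress device, a driver callback) that this hypothetical helper omits; `seg_len` is a stand-in for what skb_gso_validate_network_len() verifies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical simplification:
 *   len       packet length
 *   seg_len   network-layer length of each GSO segment (0 if not GSO)
 *   state_mtu result of an xfrm_state_mtu()-style computation
 */
static bool offload_len_ok(unsigned int len, unsigned int seg_len,
			   unsigned int state_mtu)
{
	if (len <= state_mtu)
		return true;  /* fits after encapsulation */
	if (seg_len != 0 && seg_len <= state_mtu)
		return true;  /* GSO: every segment will fit */
	return false;         /* would need fragmentation */
}
```

In this model, `false` simply means the packet cannot be offloaded as-is.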
Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Ilia Lin 1 year, 2 months ago
On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > In packet offload mode the raw packets will be sent to the NiC,
> > and will not return to the Network Stack. In event of crossing
> > the MTU size after the encapsulation, the NiC HW may not be
> > able to fragment the final packet.
>
> Yes, HW doesn't know how to handle these packets.
>
> > Adding mandatory pre-encapsulation fragmentation for both
> > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > on the state.
>
> I was under impression is that xfrm_dev_offload_ok() is responsible to
> prevent fragmentation.
> https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410

With my change we can either support inner fragmentation or prevent it,
depending on the network device driver implementation.

>
> Thanks
Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Leon Romanovsky 1 year, 2 months ago
On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > In packet offload mode the raw packets will be sent to the NiC,
> > > and will not return to the Network Stack. In event of crossing
> > > the MTU size after the encapsulation, the NiC HW may not be
> > > able to fragment the final packet.
> >
> > Yes, HW doesn't know how to handle these packets.
> >
> > > Adding mandatory pre-encapsulation fragmentation for both
> > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > on the state.
> >
> > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > prevent fragmentation.
> > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> 
> With my change we can both support inner fragmentation or prevent it,
> depending on the network device driver implementation.

The thing is that fragmentation isn't a desirable thing. Why doesn't PMTU
take the headers into account, so we can rely on existing code and not add
extra logic for packet offload?

Thanks

> 
> >
> > Thanks
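To make the header overhead concrete: in an ESP tunnel the usable inner MTU is the link MTU minus the outer IP header, ESP header and IV, minus the ICV, rounded down to the padding alignment, minus the 2-byte ESP trailer. The helper below is a simplified model of the tunnel-mode arithmetic assumed to match xfrm_state_mtu(); the parameter values in the examples (AES-GCM with an 8-byte IV and 16-byte ICV, 4-byte alignment) are illustrative assumptions.

```c
#include <assert.h>

/*
 * Simplified tunnel-mode model (assumed to match xfrm_state_mtu()):
 *   header_len: outer IP header + ESP header + IV
 *   authsize:   ICV length
 *   blksize:    padding alignment (cipher block size rounded up to 4)
 * Returns the largest inner packet that fits in link_mtu after ESP.
 */
static unsigned int esp_tunnel_inner_mtu(unsigned int link_mtu,
					 unsigned int header_len,
					 unsigned int authsize,
					 unsigned int blksize)
{
	/* blksize must be a power of two for the mask below. */
	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);
	assert(link_mtu > header_len + authsize + 2);

	/* Payload + padding + 2-byte trailer must be a multiple of blksize. */
	return ((link_mtu - header_len - authsize) & ~(blksize - 1)) - 2;
}
```

With an outer IPv4 header (20 + 8 + 8 = 36 bytes of headers) and a 1500-byte link, this gives 1446, so a host that assumes a 1500-byte path sends packets 54 bytes too large; with an outer IPv6 header (56 bytes) it gives 1426.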
[PATCH] xfrm: Add pre-encap fragmentation for packet offload
Posted by Ilia Lin 1 year, 2 months ago
On Mon, Nov 25, 2024 at 9:43 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> > On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > > In packet offload mode the raw packets will be sent to the NiC,
> > > > and will not return to the Network Stack. In event of crossing
> > > > the MTU size after the encapsulation, the NiC HW may not be
> > > > able to fragment the final packet.
> > >
> > > Yes, HW doesn't know how to handle these packets.
> > >
> > > > Adding mandatory pre-encapsulation fragmentation for both
> > > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > > on the state.
> > >
> > > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > > prevent fragmentation.
> > > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> >
> > With my change we can both support inner fragmentation or prevent it,
> > depending on the network device driver implementation.
>
> The thing is that fragmentation isn't desirable thing. Why didn't PMTU
> take into account headers so we can rely on existing code and do not add
> extra logic for packet offload?

I agree that PMTU is the preferred option, but the packets may be routed from
a host behind the VPN, which is unaware that it transmits into an IPsec tunnel
and therefore will not account for the extra headers.

>
> Thanks
>
> >
> > >
> > > Thanks
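For hosts behind the VPN, classic PMTU discovery (RFC 1191) still works if the gateway answers a too-big DF packet with an ICMP "fragmentation needed" error carrying the reduced next-hop MTU, which the host then adopts; the IPv6 counterpart is the ICMPv6 Packet Too Big message (RFC 8201). The hypothetical helper below builds only the 8-byte ICMPv4 header and checksums it. A real message also quotes the offending IP header plus 8 bytes of its payload, and the checksum covers those too, so this is a partial sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over an even-length buffer. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Fill an ICMPv4 Destination Unreachable / Fragmentation Needed header
 * (type 3, code 4) with the RFC 1191 next-hop MTU field. In this
 * simplified sketch the checksum covers only these 8 bytes.
 */
static void icmp_frag_needed_hdr(uint8_t hdr[8], uint16_t next_hop_mtu)
{
	uint16_t csum;

	hdr[0] = 3;            /* type: destination unreachable */
	hdr[1] = 4;            /* code: fragmentation needed and DF set */
	hdr[2] = hdr[3] = 0;   /* checksum placeholder */
	hdr[4] = hdr[5] = 0;   /* unused */
	hdr[6] = (uint8_t)(next_hop_mtu >> 8);
	hdr[7] = (uint8_t)(next_hop_mtu & 0xff);

	csum = inet_csum(hdr, 8);
	hdr[2] = (uint8_t)(csum >> 8);
	hdr[3] = (uint8_t)(csum & 0xff);
}
```

With a next-hop MTU of 1446 the header bytes are 03 04 F7 55 00 00 05 A6, and re-running the checksum over them yields 0.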