net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
We found an issue when using bpf_redirect with ipvs NAT mode after
commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
the same name space"). Particularly, we use bpf_redirect to return
the skb directly back to the netif it comes from, i.e., xnet is
false in skb_scrub_packet(), and then ipvs_property is preserved
and SNAT is skipped in the rx path.
ipvs_property has been already cleared when netns is changed in
commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
SKB net namespace changed"). This patch just clears it in spite of
netns.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
---
This is in fact a fix patch, and the issue was found after commit
ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
the same name space"). But I'm not sure if a "Fixes" tag should be
added to that commit.
---
net/core/skbuff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b03b64fdcb2..b1c81687e9d8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
skb->offload_fwd_mark = 0;
skb->offload_l3_fwd_mark = 0;
#endif
+ ipvs_reset(skb);
if (!xnet)
return;
- ipvs_reset(skb);
skb->mark = 0;
skb_clear_tstamp(skb);
}
--
2.32.0.3.g01195cf9f
Hello,
On Fri, 21 Feb 2025, Philo Lu wrote:
> We found an issue when using bpf_redirect with ipvs NAT mode after
> commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). Particularly, we use bpf_redirect to return
> the skb directly back to the netif it comes from, i.e., xnet is
> false in skb_scrub_packet(), and then ipvs_property is preserved
> and SNAT is skipped in the rx path.
>
> ipvs_property has been already cleared when netns is changed in
> commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
> SKB net namespace changed"). This patch just clears it in spite of
> netns.
>
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> This is in fact a fix patch, and the issue was found after commit
> ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). But I'm not sure if a "Fixes" tag should be
> added to that commit.
You can add 2b5ec1a5f973 as a Fixes tag in v2 and I'll ack it.
Nowadays, ipvs_property prevents unneeded connection
lookups while the packet traverses the hooks (different IPVS
hook handlers can see the same packet in same or next hook).
This includes the case where packet can be routed to local server,
eg. via loopback_xmit(), for example:
Packet from device:
LOCAL_IN: set ipvs_property=1
Reroute to local server ?
Then to hit the local server continue the NF hook
Otherwise, route via device:
LOCAL_OUT: use ipvs_property to avoid conn lookup
and rerouting
LOCAL_OUT:
Unexpected divert to local server by someone else?
=> LOCAL_IN: use ipvs_property to avoid conn
lookup and rerouting
Packet from local client:
LOCAL_OUT: set ipvs_property=1
Reroute to local server?
Then continue via loopback_xmit() without
skb_scrub_packet()
LOCAL_IN: use ipvs_property to avoid conn lookup and
rerouting
Hit the local server
Otherwise, route via device:
Others at LOCAL_OUT: Unexpected divert to local server by
someone else ?
=> LOCAL_IN: use ipvs_property to avoid conn
lookup and rerouting
So, we need ipvs_property set only during the normal
path where packet is received and sent, possibly to local
server via local route. We do not care if the packet leaves
this path. Currently, ipvs_property was cleared if netns
changes but it is a safe option. What matters is if packet
is routed via local route. Other paths can reset the flag
safely.
> ---
> net/core/skbuff.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7b03b64fdcb2..b1c81687e9d8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> skb->offload_fwd_mark = 0;
> skb->offload_l3_fwd_mark = 0;
> #endif
> + ipvs_reset(skb);
>
> if (!xnet)
> return;
>
> - ipvs_reset(skb);
> skb->mark = 0;
> skb_clear_tstamp(skb);
> }
> --
> 2.32.0.3.g01195cf9f
Regards
--
Julian Anastasov <ja@ssi.bg>
On 2025/2/21 19:42, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 21 Feb 2025, Philo Lu wrote:
>
>> We found an issue when using bpf_redirect with ipvs NAT mode after
>> commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
>> the same name space"). Particularly, we use bpf_redirect to return
>> the skb directly back to the netif it comes from, i.e., xnet is
>> false in skb_scrub_packet(), and then ipvs_property is preserved
>> and SNAT is skipped in the rx path.
>>
>> ipvs_property has been already cleared when netns is changed in
>> commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
>> SKB net namespace changed"). This patch just clears it in spite of
>> netns.
>>
>> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> ---
>> This is in fact a fix patch, and the issue was found after commit
>> ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
>> the same name space"). But I'm not sure if a "Fixes" tag should be
>> added to that commit.
>
> You can add 2b5ec1a5f973 as a Fixes tag in v2 and I'll ack it.
Thank you, Julian. You also solve my worries. I'll post v2 soon.
--
Philo
Le 21/02/2025 à 02:36, Philo Lu a écrit :
> We found an issue when using bpf_redirect with ipvs NAT mode after
> commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). Particularly, we use bpf_redirect to return
> the skb directly back to the netif it comes from, i.e., xnet is
> false in skb_scrub_packet(), and then ipvs_property is preserved
> and SNAT is skipped in the rx path.
>
> ipvs_property has been already cleared when netns is changed in
> commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
> SKB net namespace changed"). This patch just clears it in spite of
> netns.
>
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> This is in fact a fix patch, and the issue was found after commit
> ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> the same name space"). But I'm not sure if a "Fixes" tag should be
> added to that commit.
> ---
> net/core/skbuff.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7b03b64fdcb2..b1c81687e9d8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> skb->offload_fwd_mark = 0;
> skb->offload_l3_fwd_mark = 0;
> #endif
> + ipvs_reset(skb);
>
> if (!xnet)
> return;
>
> - ipvs_reset(skb);
I don't know IPVS, but I wonder if this patch will not introduce a regression
for other users. skb_scrub_packet() is used by a lot of tunnels, it's not
specific to bpf_redirect().
Regards,
Nicolas
Hello,
On Fri, 21 Feb 2025, Nicolas Dichtel wrote:
> Le 21/02/2025 à 02:36, Philo Lu a écrit :
> > We found an issue when using bpf_redirect with ipvs NAT mode after
> > commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> > the same name space"). Particularly, we use bpf_redirect to return
> > the skb directly back to the netif it comes from, i.e., xnet is
> > false in skb_scrub_packet(), and then ipvs_property is preserved
> > and SNAT is skipped in the rx path.
> >
> > ipvs_property has been already cleared when netns is changed in
> > commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
> > SKB net namespace changed"). This patch just clears it in spite of
> > netns.
> >
> > Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> > ---
> > This is in fact a fix patch, and the issue was found after commit
> > ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
> > the same name space"). But I'm not sure if a "Fixes" tag should be
> > added to that commit.
> > ---
> > net/core/skbuff.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 7b03b64fdcb2..b1c81687e9d8 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -6033,11 +6033,11 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> > skb->offload_fwd_mark = 0;
> > skb->offload_l3_fwd_mark = 0;
> > #endif
> > + ipvs_reset(skb);
> >
> > if (!xnet)
> > return;
> >
> > - ipvs_reset(skb);
> I don't know IPVS, but I wonder if this patch will not introduce a regression
> for other users. skb_scrub_packet() is used by a lot of tunnels, it's not
> specific to bpf_redirect().
Indeed, now we will start to clear the flag for tunnels
and it can cause IPVS to attempt rerouting for UDP tunnels, i.e.
once the packet is routed by IPVS to tunnel and second time later
after tunneling again when ip_local_out() is called and we see
ipvs_property=0 for the outer UDP header. Before such patch
ipvs_property remains 1 and we do not try to balance the UDP
traffic. So, for now, this patch may be can spend more cycles for
the traffic via UDP tunnels but this looks like a rare IPVS setup,
i.e. real servers reachable via UDP tunnels. Note that IPVS
has own support for UDP tunnels where it is set as forwarding
method + tunnel type GUE for the real server. It is not affected
by this patch.
Regards
--
Julian Anastasov <ja@ssi.bg>
© 2016 - 2025 Red Hat, Inc.