[PATCH net v3] xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path

Jiayuan Chen posted 1 patch 2 days, 1 hour ago
There is a newer version of this series
net/ipv4/icmp.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
[PATCH net v3] xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by Jiayuan Chen 2 days, 1 hour ago
From: Jiayuan Chen <jiayuan.chen@shopee.com>

icmp_route_lookup() performs multiple route lookups to find a suitable
route for sending ICMP error messages, with special handling for XFRM
(IPsec) policies.

The lookup sequence is:
1. First, look up the output route for the ICMP reply (dst = original src)
2. Pass the result through xfrm_lookup() for a policy check
3. If blocked (-EPERM) or dst is not local, enter the "reverse path"
4. In the reverse path, call xfrm_decode_session_reverse() to get fl4_dec,
   which reverses the original packet's flow (saddr<->daddr swapped)
5. If fl4_dec.saddr is local (we are the original destination), use
   __ip_route_output_key() for the output route lookup
6. If fl4_dec.saddr is NOT local (we are a forwarding node), use
   ip_route_input() to simulate the reverse packet's input path
7. Finally, pass rt2 through xfrm_lookup() with the XFRM_LOOKUP_ICMP flag
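The branch structure of steps 1-6 can be sketched as a pure decision function. This is a userspace model only, assuming the inputs are reduced to three values (the first xfrm_lookup() error and two locality checks); enum icmp_path and choose_path() are hypothetical names, and details the list does not describe (such as step 7) are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the outcomes of the lookup sequence. */
enum icmp_path {
	PATH_ERROR,          /* non-EPERM error from the first xfrm_lookup() */
	PATH_DIRECT,         /* steps 1-2: the first lookup result is used */
	PATH_REVERSE_OUTPUT, /* step 5: __ip_route_output_key() */
	PATH_REVERSE_INPUT,  /* step 6: ip_route_input(), where the race lives */
};

/*
 * xfrm_err:           0 or a negative errno from the first xfrm_lookup()
 * dst_is_local:       whether the reply destination (original source) is local
 * dec_saddr_is_local: whether fl4_dec.saddr is local
 */
static enum icmp_path choose_path(int xfrm_err, bool dst_is_local,
				  bool dec_saddr_is_local)
{
	assert(xfrm_err <= 0); /* kernel-style: 0 or negative errno */

	if (xfrm_err && xfrm_err != -EPERM)
		return PATH_ERROR;
	if (!xfrm_err && dst_is_local)
		return PATH_DIRECT;
	/* step 3: blocked by policy (-EPERM) or destination not local */
	return dec_saddr_is_local ? PATH_REVERSE_OUTPUT : PATH_REVERSE_INPUT;
}
```

For example, choose_path(0, false, false) selects the ip_route_input() branch, which is the only branch the fix below touches.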

The bug occurs in step 6: ip_route_input() is called with fl4_dec.daddr
(the original packet's source) as the destination. If this address becomes
local between the initial check and the ip_route_input() call (e.g., due to
a concurrent "ip addr add"), ip_route_input() returns a LOCAL route with
dst.output set to ip_rt_bug.

This route is then used for ICMP output, causing dst_output() to call
ip_rt_bug(), triggering a WARN_ON:

 ------------[ cut here ]------------
 WARNING: net/ipv4/route.c:1275 at ip_rt_bug+0x21/0x30, CPU#1
 Call Trace:
  <TASK>
  ip_push_pending_frames+0x202/0x240
  icmp_push_reply+0x30d/0x430
  __icmp_send+0x1149/0x24f0
  ip_options_compile+0xa2/0xd0
  ip_rcv_finish_core+0x829/0x1950
  ip_rcv+0x2d7/0x420
  __netif_receive_skb_one_core+0x185/0x1f0
  netif_receive_skb+0x90/0x450
  tun_get_user+0x3413/0x3fb0
  tun_chr_write_iter+0xe4/0x220
  ...
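The race is a check-then-use window: locality is decided from one snapshot of the address table, and ip_route_input() reads a later one. The sketch below models that interleaving deterministically in userspace. The toy address table, addr_add(), route_input() and reverse_input_type() are hypothetical stand-ins, not kernel APIs, and the race_add flag replaces real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's route types. */
enum rt_type { RT_UNICAST, RT_LOCAL };

#define MAX_ADDRS 8

/* Hypothetical toy model of the host's local address table. */
struct addr_table {
	uint32_t addrs[MAX_ADDRS];
	size_t n;
};

static bool addr_is_local(const struct addr_table *t, uint32_t a)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->addrs[i] == a)
			return true;
	return false;
}

/* Models a concurrent "ip addr add". */
static void addr_add(struct addr_table *t, uint32_t a)
{
	assert(t->n < MAX_ADDRS);
	t->addrs[t->n++] = a;
}

/* Toy ip_route_input(): a local destination yields a LOCAL route. */
static enum rt_type route_input(const struct addr_table *t, uint32_t daddr)
{
	return addr_is_local(t, daddr) ? RT_LOCAL : RT_UNICAST;
}

/*
 * Returns the route type the input lookup produces for the reverse packet,
 * or -1 if orig_src was already local at check time (reverse-input path not
 * taken). With race_add set, the address appears between check and use.
 */
static int reverse_input_type(struct addr_table *t, uint32_t orig_src,
			      bool race_add)
{
	if (addr_is_local(t, orig_src))   /* time of check */
		return -1;
	if (race_add)
		addr_add(t, orig_src);        /* concurrent address change */
	return route_input(t, orig_src);  /* time of use */
}
```

Without the interleaved addr_add() the lookup yields RT_UNICAST; with it, the same call sequence yields RT_LOCAL even though the check had passed.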

Fix this by checking rt2->rt_type after ip_route_input(). If it is
RTN_LOCAL, the route cannot be used for output, so log a rate-limited
warning, release the route and fail the lookup with -EINVAL.
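The shape of the added check can be exercised outside the kernel with toy types. In this sketch, struct toy_rtable, toy_dst_release() and reject_local_route() are hypothetical stand-ins for struct rtable, dst_release() and the inline check in the diff; the rate-limited warning is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum rt_type { RT_UNICAST, RT_LOCAL };

/* Hypothetical, heavily reduced stand-in for struct rtable. */
struct toy_rtable {
	enum rt_type rt_type;
	int refcnt;
};

/* Toy dst_release(): drop one reference. */
static void toy_dst_release(struct toy_rtable *rt)
{
	assert(rt->refcnt > 0);
	rt->refcnt--;
}

/*
 * After a successful input lookup, a LOCAL route must not be used for
 * output: drop our reference and turn the result into -EINVAL.
 * Earlier errors and non-local routes pass through unchanged.
 */
static int reject_local_route(int err, struct toy_rtable *rt2)
{
	if (!err && rt2 && rt2->rt_type == RT_LOCAL) {
		toy_dst_release(rt2);
		return -EINVAL;
	}
	return err;
}
```

Dropping the reference before returning the error matters: the caller only releases the route on success paths, so skipping toy_dst_release() here would model a leaked dst.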

The reproducer requires kernel modification to widen the race window,
making it unsuitable as a selftest. It is available at:

  https://gist.github.com/mrpre/eae853b72ac6a750f5d45d64ddac1e81

Reported-by: syzbot+e738404dcd14b620923c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000b1060905eada8881@google.com/T/
Closes: https://lore.kernel.org/r/20260128090523.356953-1-jiayuan.chen@linux.dev
Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

---
v1 -> v3:
    Suggested by Paolo Abeni:
      - Resend against the net tree with the xfrm prefix.
      - Fix a text string exceeding the 80-character limit.
      - Simplify commit message.
    v1: https://lore.kernel.org/r/20260128090523.356953-1-jiayuan.chen@linux.dev
    v2: https://lore.kernel.org/netdev/20260203063449.44737-1-jiayuan.chen@linux.dev/
---
 net/ipv4/icmp.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 4abbec2f47ef..35816ac749bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -554,6 +554,22 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
 		/* steal dst entry from skb_in, don't drop refcnt */
 		skb_dstref_steal(skb_in);
 		skb_dstref_restore(skb_in, orefdst);
+
+		/*
+		 * At this point, fl4_dec.daddr should NOT be local (we
+		 * checked fl4_dec.saddr above). However, a race condition
+		 * may occur if the address is added to the interface
+		 * concurrently. In that case, ip_route_input() returns a
+		 * LOCAL route with dst.output=ip_rt_bug, which must not
+		 * be used for output.
+		 */
+		if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
+			net_warn_ratelimited("detected local route for %pI4 "
+					     "during ICMP sending, src %pI4\n",
+					     &fl4_dec.daddr, &fl4_dec.saddr);
+			dst_release(&rt2->dst);
+			err = -EINVAL;
+		}
 	}
 
 	if (err)
-- 
2.43.0
Re: [PATCH net v3] xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by David Ahern 1 day, 17 hours ago
On 2/5/26 12:02 AM, Jiayuan Chen wrote:
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 4abbec2f47ef..35816ac749bc 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -554,6 +554,22 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
>  		/* steal dst entry from skb_in, don't drop refcnt */
>  		skb_dstref_steal(skb_in);
>  		skb_dstref_restore(skb_in, orefdst);
> +
> +		/*
> +		 * At this point, fl4_dec.daddr should NOT be local (we
> +		 * checked fl4_dec.saddr above). However, a race condition
> +		 * may occur if the address is added to the interface
> +		 * concurrently. In that case, ip_route_input() returns a
> +		 * LOCAL route with dst.output=ip_rt_bug, which must not
> +		 * be used for output.
> +		 */
> +		if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
> +			net_warn_ratelimited("detected local route for %pI4 "
> +					     "during ICMP sending, src %pI4\n",
> +					     &fl4_dec.daddr, &fl4_dec.saddr);

Per Paolo's comment on the previous revision of this patch, strings should
not be split across lines like this. It should be:

net_warn_ratelimited("detected local route for %pI4 during ICMP sending, src %pI4\n",

> +			dst_release(&rt2->dst);
> +			err = -EINVAL;
> +		}
>  	}
>  
>  	if (err)
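The kernel coding style asks that user-visible strings such as printk messages never be broken across lines, because that breaks the ability to grep for them. A small userspace illustration (plain C, not kernel code; split_fmt, whole_fmt and same_format() are hypothetical names, and %pI4 is just literal text here) shows that both spellings compile to the same string, so joining the line changes nothing at runtime:

```c
#include <assert.h>
#include <string.h>

/* Split form from v3: adjacent literals are joined at compile time,
 * so the full message text never appears whole in the source. */
static const char split_fmt[] = "detected local route for %pI4 "
				"during ICMP sending, src %pI4\n";

/* Single-line form requested in review: greppable as written. */
static const char whole_fmt[] =
	"detected local route for %pI4 during ICMP sending, src %pI4\n";

/* Both spellings produce arrays of identical size... */
static_assert(sizeof(split_fmt) == sizeof(whole_fmt),
	      "split and whole literals differ in length");

/* ...and identical contents. Returns 1 when they match. */
static int same_format(void)
{
	return strcmp(split_fmt, whole_fmt) == 0;
}
```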
Re: [PATCH net v3] xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by Jiayuan Chen 1 day, 17 hours ago
2026/2/5 23:17, "David Ahern" <dsahern@kernel.org> wrote:


> 
> On 2/5/26 12:02 AM, Jiayuan Chen wrote:
> 
> > 
> > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> > index 4abbec2f47ef..35816ac749bc 100644
> > --- a/net/ipv4/icmp.c
> > +++ b/net/ipv4/icmp.c
> > @@ -554,6 +554,22 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
> >  		/* steal dst entry from skb_in, don't drop refcnt */
> >  		skb_dstref_steal(skb_in);
> >  		skb_dstref_restore(skb_in, orefdst);
> > +
> > +		/*
> > +		 * At this point, fl4_dec.daddr should NOT be local (we
> > +		 * checked fl4_dec.saddr above). However, a race condition
> > +		 * may occur if the address is added to the interface
> > +		 * concurrently. In that case, ip_route_input() returns a
> > +		 * LOCAL route with dst.output=ip_rt_bug, which must not
> > +		 * be used for output.
> > +		 */
> > +		if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
> > +			net_warn_ratelimited("detected local route for %pI4 "
> > +					     "during ICMP sending, src %pI4\n",
> > +					     &fl4_dec.daddr, &fl4_dec.saddr);
> > 
> Per Paolo's comment on the previous revision of this patch, strings should
> not be split across lines like this. It should be:
> 
> net_warn_ratelimited("detected local route for %pI4 during ICMP sending, src %pI4\n",


Sorry about that. I totally misunderstood Paolo's comment.

pw-bot: cr
> > 
> > +			dst_release(&rt2->dst);
> > +			err = -EINVAL;
> > +		}
> >  	}
> >  
> >  	if (err)
> >
>