[PATCH net-next v1] icmp: fix ip_rt_bug race in icmp_route_lookup reverse path

Jiayuan Chen posted 1 patch 1 week, 2 days ago
net/ipv4/icmp.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
[PATCH net-next v1] icmp: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by Jiayuan Chen 1 week, 2 days ago
From: Jiayuan Chen <jiayuan.chen@shopee.com>

icmp_route_lookup() performs multiple route lookups to find a suitable
route for sending ICMP error messages, with special handling for XFRM
(IPsec) policies.

The lookup sequence is:
1. First, lookup output route for ICMP reply (dst = original src)
2. Pass through xfrm_lookup() for policy check
3. If blocked (-EPERM) or dst is not local, enter "reverse path"
4. In reverse path, call xfrm_decode_session_reverse() to get fl4_dec
   which reverses the original packet's flow (saddr<->daddr swapped)
5. If fl4_dec.saddr is local (we are the original destination), use
   __ip_route_output_key() for output route lookup
6. If fl4_dec.saddr is NOT local (we are a forwarding node), use
   ip_route_input() to simulate the reverse packet's input path
7. Finally, pass rt2 through xfrm_lookup() with XFRM_LOOKUP_ICMP flag

The bug occurs in step 6: ip_route_input() is called with fl4_dec.daddr
(original packet's source) as destination. If this address becomes local
between the initial check and ip_route_input() call (e.g., due to
concurrent "ip addr add"), ip_route_input() returns a LOCAL route with
dst.output set to ip_rt_bug.

This route is then used for ICMP output, causing dst_output() to call
ip_rt_bug(), triggering a WARN_ON:

 ------------[ cut here ]------------
 WARNING: net/ipv4/route.c:1275 at ip_rt_bug+0x21/0x30, CPU#1
 Call Trace:
  <TASK>
  ip_push_pending_frames+0x202/0x240
  icmp_push_reply+0x30d/0x430
  __icmp_send+0x1149/0x24f0
  ip_options_compile+0xa2/0xd0
  ip_rcv_finish_core+0x829/0x1950
  ip_rcv+0x2d7/0x420
  __netif_receive_skb_one_core+0x185/0x1f0
  netif_receive_skb+0x90/0x450
  tun_get_user+0x3413/0x3fb0
  tun_chr_write_iter+0xe4/0x220
  ...

Fix this by checking rt2->rt_type after ip_route_input(). If it's
RTN_LOCAL, the route cannot be used for output, so treat it as an error.

The reproducer requires kernel modification to widen the race window,
making it unsuitable as a selftest. It is available at:

  https://gist.github.com/mrpre/eae853b72ac6a750f5d45d64ddac1e81

Reported-by: syzbot+e738404dcd14b620923c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000b1060905eada8881@google.com/T/
Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/ipv4/icmp.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 19c9c838967f..dc9dcc799824 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
 		/* steal dst entry from skb_in, don't drop refcnt */
 		skb_dstref_steal(skb_in);
 		skb_dstref_restore(skb_in, orefdst);
+
+		/*
+		 * At this point, fl4_dec.daddr should NOT be local (we
+		 * checked fl4_dec.saddr above). However, a race condition
+		 * may occur if the address is added to the interface
+		 * concurrently. In that case, ip_route_input() returns a
+		 * LOCAL route with dst.output=ip_rt_bug, which must not
+		 * be used for output.
+		 */
+		if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
+			net_warn_ratelimited("%s: detected local route for %pI4 "
+					     "during ICMP error handling (src %pI4), "
+					     "possible address race\n",
+					     __func__, &fl4_dec.daddr, &fl4_dec.saddr);
+			dst_release(&rt2->dst);
+			err = -EINVAL;
+		}
 	}
 
 	if (err)
-- 
2.43.0
Re: [PATCH net-next v1] icmp: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by Paolo Abeni 3 days, 18 hours ago
On 1/28/26 10:05 AM, Jiayuan Chen wrote:
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 19c9c838967f..dc9dcc799824 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
>  		/* steal dst entry from skb_in, don't drop refcnt */
>  		skb_dstref_steal(skb_in);
>  		skb_dstref_restore(skb_in, orefdst);
> +
> +		/*
> +		 * At this point, fl4_dec.daddr should NOT be local (we
> +		 * checked fl4_dec.saddr above). However, a race condition
> +		 * may occur if the address is added to the interface
> +		 * concurrently. In that case, ip_route_input() returns a
> +		 * LOCAL route with dst.output=ip_rt_bug, which must not
> +		 * be used for output.
> +		 */
> +		if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
> +			net_warn_ratelimited("%s: detected local route for %pI4 "
> +					     "during ICMP error handling (src %pI4), "
> +					     "possible address race\n",
> +					     __func__, &fl4_dec.daddr, &fl4_dec.saddr);

The fix looks correct to me, but this patch should target the 'net' tree
and the above warning message is a bit off: the text string should not
be broken to fit the 80 chars limit - it need to be greepable - it's
probably better to not include the function name.

/P
Re: [PATCH net-next v1] icmp: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by David Ahern 3 days, 13 hours ago
On 2/3/26 3:41 AM, Paolo Abeni wrote:
> On 1/28/26 10:05 AM, Jiayuan Chen wrote:
>> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
>> index 19c9c838967f..dc9dcc799824 100644
>> --- a/net/ipv4/icmp.c
>> +++ b/net/ipv4/icmp.c
>> @@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
>>  		/* steal dst entry from skb_in, don't drop refcnt */
>>  		skb_dstref_steal(skb_in);
>>  		skb_dstref_restore(skb_in, orefdst);
>> +
>> +		/*
>> +		 * At this point, fl4_dec.daddr should NOT be local (we
>> +		 * checked fl4_dec.saddr above). However, a race condition
>> +		 * may occur if the address is added to the interface
>> +		 * concurrently. In that case, ip_route_input() returns a
>> +		 * LOCAL route with dst.output=ip_rt_bug, which must not
>> +		 * be used for output.
>> +		 */
>> +		if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
>> +			net_warn_ratelimited("%s: detected local route for %pI4 "
>> +					     "during ICMP error handling (src %pI4), "
>> +					     "possible address race\n",
>> +					     __func__, &fl4_dec.daddr, &fl4_dec.saddr);
> 
> The fix looks correct to me, but this patch should target the 'net' tree
> and the above warning message is a bit off: the text string should not
> be broken to fit the 80 chars limit - it need to be greepable - it's
> probably better to not include the function name.
> 
> /P
> 

Does the message even provide value? There is nothing a user can do
about it.
Re: [PATCH net-next v1] icmp: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by Jiayuan Chen 3 days, 3 hours ago
February 4, 2026 at 24:24, "David Ahern" <dsahern@kernel.org mailto:dsahern@kernel.org?to=%22David%20Ahern%22%20%3Cdsahern%40kernel.org%3E > wrote:


> 
> On 2/3/26 3:41 AM, Paolo Abeni wrote:
> 
> > 
> > On 1/28/26 10:05 AM, Jiayuan Chen wrote:
> > 
> > > 
> > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> > >  index 19c9c838967f..dc9dcc799824 100644
> > >  --- a/net/ipv4/icmp.c
> > >  +++ b/net/ipv4/icmp.c
> > >  @@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
> > >  /* steal dst entry from skb_in, don't drop refcnt */
> > >  skb_dstref_steal(skb_in);
> > >  skb_dstref_restore(skb_in, orefdst);
> > >  +
> > >  + /*
> > >  + * At this point, fl4_dec.daddr should NOT be local (we
> > >  + * checked fl4_dec.saddr above). However, a race condition
> > >  + * may occur if the address is added to the interface
> > >  + * concurrently. In that case, ip_route_input() returns a
> > >  + * LOCAL route with dst.output=ip_rt_bug, which must not
> > >  + * be used for output.
> > >  + */
> > >  + if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
> > >  + net_warn_ratelimited("%s: detected local route for %pI4 "
> > >  + "during ICMP error handling (src %pI4), "
> > >  + "possible address race\n",
> > >  + __func__, &fl4_dec.daddr, &fl4_dec.saddr);
> > > 
> >  
> >  The fix looks correct to me, but this patch should target the 'net' tree
> >  and the above warning message is a bit off: the text string should not
> >  be broken to fit the 80 chars limit - it need to be greepable - it's
> >  probably better to not include the function name.
> >  
> >  /P
> > 
> Does the message even provide value? There is nothing a user can do
> about it.
>

Hi Paolo, David,

Thanks for the review.

Paolo, I'll fix it in newer version.

Regarding David's question about whether the message provides value:
I think it could still be useful for debugging. If a user's configuration
change causes ICMP packets to be silently dropped, there's currently no counter
or indication of what happened. Adding a dedicated counter for this rare race
condition seems overkill, so a rate-limited warning at least gives some visibility
into the failure.

That said, I'm open to dropping the message entirely if you both think silent handling
is preferred. Let me know and I'll send a new version either way.

Thanks,
Jiayuan
Re: [PATCH net-next v1] icmp: fix ip_rt_bug race in icmp_route_lookup reverse path
Posted by David Ahern 2 days, 13 hours ago
On 2/3/26 7:27 PM, Jiayuan Chen wrote:
> 
> Thanks for the review.
> 
> Paolo, I'll fix it in newer version.
> 
> Regarding David's question about whether the message provides value:
> I think it could still be useful for debugging. If a user's configuration
> change causes ICMP packets to be silently dropped, there's currently no counter
> or indication of what happened. Adding a dedicated counter for this rare race
> condition seems overkill, so a rate-limited warning at least gives some visibility
> into the failure.
> 
> That said, I'm open to dropping the message entirely if you both think silent handling
> is preferred. Let me know and I'll send a new version either way.
> 

no strong opinions.