net/ipv4/icmp.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
From: Jiayuan Chen <jiayuan.chen@shopee.com>
icmp_route_lookup() performs multiple route lookups to find a suitable
route for sending ICMP error messages, with special handling for XFRM
(IPsec) policies.
The lookup sequence is:
1. First, lookup output route for ICMP reply (dst = original src)
2. Pass through xfrm_lookup() for policy check
3. If blocked (-EPERM) or dst is not local, enter "reverse path"
4. In reverse path, call xfrm_decode_session_reverse() to get fl4_dec
which reverses the original packet's flow (saddr<->daddr swapped)
5. If fl4_dec.saddr is local (we are the original destination), use
__ip_route_output_key() for output route lookup
6. If fl4_dec.saddr is NOT local (we are a forwarding node), use
ip_route_input() to simulate the reverse packet's input path
7. Finally, pass rt2 through xfrm_lookup() with XFRM_LOOKUP_ICMP flag
The bug occurs in step 6: ip_route_input() is called with fl4_dec.daddr
(original packet's source) as destination. If this address becomes local
between the initial check and ip_route_input() call (e.g., due to
concurrent "ip addr add"), ip_route_input() returns a LOCAL route with
dst.output set to ip_rt_bug.
This route is then used for ICMP output, causing dst_output() to call
ip_rt_bug(), triggering a WARN_ON:
------------[ cut here ]------------
WARNING: net/ipv4/route.c:1275 at ip_rt_bug+0x21/0x30, CPU#1
Call Trace:
<TASK>
ip_push_pending_frames+0x202/0x240
icmp_push_reply+0x30d/0x430
__icmp_send+0x1149/0x24f0
ip_options_compile+0xa2/0xd0
ip_rcv_finish_core+0x829/0x1950
ip_rcv+0x2d7/0x420
__netif_receive_skb_one_core+0x185/0x1f0
netif_receive_skb+0x90/0x450
tun_get_user+0x3413/0x3fb0
tun_chr_write_iter+0xe4/0x220
...
Fix this by checking rt2->rt_type after ip_route_input(). If it's
RTN_LOCAL, the route cannot be used for output, so treat it as an error.
The reproducer requires kernel modification to widen the race window,
making it unsuitable as a selftest. It is available at:
https://gist.github.com/mrpre/eae853b72ac6a750f5d45d64ddac1e81
Reported-by: syzbot+e738404dcd14b620923c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000b1060905eada8881@google.com/T/
Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/ipv4/icmp.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 19c9c838967f..dc9dcc799824 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
/* steal dst entry from skb_in, don't drop refcnt */
skb_dstref_steal(skb_in);
skb_dstref_restore(skb_in, orefdst);
+
+ /*
+ * At this point, fl4_dec.daddr should NOT be local (we
+ * checked fl4_dec.saddr above). However, a race condition
+ * may occur if the address is added to the interface
+ * concurrently. In that case, ip_route_input() returns a
+ * LOCAL route with dst.output=ip_rt_bug, which must not
+ * be used for output.
+ */
+ if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
+ net_warn_ratelimited("%s: detected local route for %pI4 "
+ "during ICMP error handling (src %pI4), "
+ "possible address race\n",
+ __func__, &fl4_dec.daddr, &fl4_dec.saddr);
+ dst_release(&rt2->dst);
+ err = -EINVAL;
+ }
}
if (err)
--
2.43.0
On 1/28/26 10:05 AM, Jiayuan Chen wrote:
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 19c9c838967f..dc9dcc799824 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
> /* steal dst entry from skb_in, don't drop refcnt */
> skb_dstref_steal(skb_in);
> skb_dstref_restore(skb_in, orefdst);
> +
> + /*
> + * At this point, fl4_dec.daddr should NOT be local (we
> + * checked fl4_dec.saddr above). However, a race condition
> + * may occur if the address is added to the interface
> + * concurrently. In that case, ip_route_input() returns a
> + * LOCAL route with dst.output=ip_rt_bug, which must not
> + * be used for output.
> + */
> + if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
> + net_warn_ratelimited("%s: detected local route for %pI4 "
> + "during ICMP error handling (src %pI4), "
> + "possible address race\n",
> + __func__, &fl4_dec.daddr, &fl4_dec.saddr);
The fix looks correct to me, but this patch should target the 'net' tree
and the above warning message is a bit off: the text string should not
be broken to fit the 80 chars limit - it need to be greepable - it's
probably better to not include the function name.
/P
On 2/3/26 3:41 AM, Paolo Abeni wrote:
> On 1/28/26 10:05 AM, Jiayuan Chen wrote:
>> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
>> index 19c9c838967f..dc9dcc799824 100644
>> --- a/net/ipv4/icmp.c
>> +++ b/net/ipv4/icmp.c
>> @@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
>> /* steal dst entry from skb_in, don't drop refcnt */
>> skb_dstref_steal(skb_in);
>> skb_dstref_restore(skb_in, orefdst);
>> +
>> + /*
>> + * At this point, fl4_dec.daddr should NOT be local (we
>> + * checked fl4_dec.saddr above). However, a race condition
>> + * may occur if the address is added to the interface
>> + * concurrently. In that case, ip_route_input() returns a
>> + * LOCAL route with dst.output=ip_rt_bug, which must not
>> + * be used for output.
>> + */
>> + if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
>> + net_warn_ratelimited("%s: detected local route for %pI4 "
>> + "during ICMP error handling (src %pI4), "
>> + "possible address race\n",
>> + __func__, &fl4_dec.daddr, &fl4_dec.saddr);
>
> The fix looks correct to me, but this patch should target the 'net' tree
> and the above warning message is a bit off: the text string should not
> be broken to fit the 80 chars limit - it need to be greepable - it's
> probably better to not include the function name.
>
> /P
>
Does the message even provide value? There is nothing a user can do
about it.
February 4, 2026 at 24:24, "David Ahern" <dsahern@kernel.org mailto:dsahern@kernel.org?to=%22David%20Ahern%22%20%3Cdsahern%40kernel.org%3E > wrote:
>
> On 2/3/26 3:41 AM, Paolo Abeni wrote:
>
> >
> > On 1/28/26 10:05 AM, Jiayuan Chen wrote:
> >
> > >
> > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> > > index 19c9c838967f..dc9dcc799824 100644
> > > --- a/net/ipv4/icmp.c
> > > +++ b/net/ipv4/icmp.c
> > > @@ -559,6 +559,23 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
> > > /* steal dst entry from skb_in, don't drop refcnt */
> > > skb_dstref_steal(skb_in);
> > > skb_dstref_restore(skb_in, orefdst);
> > > +
> > > + /*
> > > + * At this point, fl4_dec.daddr should NOT be local (we
> > > + * checked fl4_dec.saddr above). However, a race condition
> > > + * may occur if the address is added to the interface
> > > + * concurrently. In that case, ip_route_input() returns a
> > > + * LOCAL route with dst.output=ip_rt_bug, which must not
> > > + * be used for output.
> > > + */
> > > + if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
> > > + net_warn_ratelimited("%s: detected local route for %pI4 "
> > > + "during ICMP error handling (src %pI4), "
> > > + "possible address race\n",
> > > + __func__, &fl4_dec.daddr, &fl4_dec.saddr);
> > >
> >
> > The fix looks correct to me, but this patch should target the 'net' tree
> > and the above warning message is a bit off: the text string should not
> > be broken to fit the 80 chars limit - it need to be greepable - it's
> > probably better to not include the function name.
> >
> > /P
> >
> Does the message even provide value? There is nothing a user can do
> about it.
>
Hi Paolo, David,
Thanks for the review.
Paolo, I'll fix it in newer version.
Regarding David's question about whether the message provides value:
I think it could still be useful for debugging. If a user's configuration
change causes ICMP packets to be silently dropped, there's currently no counter
or indication of what happened. Adding a dedicated counter for this rare race
condition seems overkill, so a rate-limited warning at least gives some visibility
into the failure.
That said, I'm open to dropping the message entirely if you both think silent handling
is preferred. Let me know and I'll send a new version either way.
Thanks,
Jiayuan
On 2/3/26 7:27 PM, Jiayuan Chen wrote: > > Thanks for the review. > > Paolo, I'll fix it in newer version. > > Regarding David's question about whether the message provides value: > I think it could still be useful for debugging. If a user's configuration > change causes ICMP packets to be silently dropped, there's currently no counter > or indication of what happened. Adding a dedicated counter for this rare race > condition seems overkill, so a rate-limited warning at least gives some visibility > into the failure. > > That said, I'm open to dropping the message entirely if you both think silent handling > is preferred. Let me know and I'll send a new version either way. > no strong opinions.
© 2016 - 2026 Red Hat, Inc.