At some point after b5d2d75e079a ("net/ipv6: Do not allow device only
routes via the multipath API"), the IPv6 stack was updated such that
device-only multipath routes can be installed and work correctly, but
still weren't allowed in the code.
This change removes the has_gateway check from rtm_to_fib6_multipath_config()
and the fib_nh_gw_family check from rt6_qualify_for_ecmp(), allowing
device-only multipath routes to be installed again.
Signed-off-by: azey <me@azey.net>
---
I tested this on a VM with two wireguard interfaces, and it seems to
work as expected. It also causes fe80::/64 and ff00::/8 to be installed as
multipath routes if there are multiple interfaces, but from my (somewhat
limited) testing that doesn't cause any issues.
I'm also not completely sure whether there are any other places in the
code that assume multipath nexthops must have a gateway addr, but I
didn't immediately find any.
PS: This is my very first contribution to the kernel (and indeed first time
sending a patch via mail), so sorry in advance if I messed anything up.
---
include/net/ip6_route.h | 3 +--
net/ipv6/route.c | 6 ------
2 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 7c5512baa4b2..07e131f9fcf5 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -73,8 +73,7 @@ static inline bool rt6_need_strict(const struct in6_addr *daddr)
 static inline bool rt6_qualify_for_ecmp(const struct fib6_info *f6i)
 {
 	/* the RTF_ADDRCONF flag filters out RA's */
-	return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh &&
-		f6i->fib6_nh->fib_nh_gw_family;
+	return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh;
 }
 
 void ip6_route_input(struct sk_buff *skb);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index aee6a10b112a..40763b90e22c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5138,12 +5138,6 @@ static int rtm_to_fib6_multipath_config(struct fib6_config *cfg,
 			}
 		}
 
-		if (newroute && (cfg->fc_nh_id || !has_gateway)) {
-			NL_SET_ERR_MSG(extack,
-				       "Device only routes can not be added for IPv6 using the multipath API.");
-			return -EINVAL;
-		}
-
 		rtnh = rtnh_next(rtnh, &remaining);
 	} while (rtnh_ok(rtnh, remaining));
 
--
2.51.0
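
To make the effect of the header change concrete, here is a small userspace sketch of the rt6_qualify_for_ecmp() predicate before and after the patch. It is not kernel code: the struct names (fib6_info_model, fib6_nh_model) are hypothetical stand-ins that keep only the fields the predicate reads, and the RTF_ADDRCONF value is assumed to match the uapi header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match RTF_ADDRCONF in the uapi headers. */
#define RTF_ADDRCONF 0x00040000u

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fib6_nh_model {
	unsigned char fib_nh_gw_family;	/* 0 means no gateway (device-only) */
};

struct fib6_info_model {
	unsigned int fib6_flags;
	const void *nh;			/* non-NULL when a nexthop object is attached */
	struct fib6_nh_model fib6_nh;
};

/* Predicate as it reads before the patch: a gateway is required. */
static bool qualify_for_ecmp_before(const struct fib6_info_model *f6i)
{
	return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh &&
	       f6i->fib6_nh.fib_nh_gw_family;
}

/* Predicate as it reads after the patch: the gateway test is gone. */
static bool qualify_for_ecmp_after(const struct fib6_info_model *f6i)
{
	return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh;
}
```

In this model a device-only route (fib_nh_gw_family == 0, no RTF_ADDRCONF, no nexthop object) is rejected by the first predicate and accepted by the second, while RA-learned routes are rejected by both.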
Hi azey,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
[also build test WARNING on net/main klassert-ipsec/master linus/master v6.18-rc6 next-20251117]
[cannot apply to horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/azey/net-ipv6-allow-device-only-routes-via-the-multipath-API/20251117-023331
base: net-next/main
patch link: https://lore.kernel.org/r/a6vmtv3ylu224fnj5awi6xrgnjoib5r2jm3kny672hemsk5ifi%40ychcxqnmy5us
patch subject: [PATCH] net/ipv6: allow device-only routes via the multipath API
config: i386-randconfig-141-20251117 (https://download.01.org/0day-ci/archive/20251118/202511180742.7iC868V8-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251118/202511180742.7iC868V8-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511180742.7iC868V8-lkp@intel.com/
All warnings (new ones prefixed by >>):
   net/ipv6/route.c: In function 'rtm_to_fib6_multipath_config':
>> net/ipv6/route.c:5122:22: warning: variable 'has_gateway' set but not used [-Wunused-but-set-variable]
    5122 |                 bool has_gateway = cfg->fc_flags & RTF_GATEWAY;
         |                      ^~~~~~~~~~~
vim +/has_gateway +5122 net/ipv6/route.c
86872cb57925c4 Thomas Graf 2006-08-22 5105
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5106 static int rtm_to_fib6_multipath_config(struct fib6_config *cfg,
bd11ff421d36ab Kuniyuki Iwashima 2025-04-17 5107 struct netlink_ext_ack *extack,
bd11ff421d36ab Kuniyuki Iwashima 2025-04-17 5108 bool newroute)
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5109 {
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5110 struct rtnexthop *rtnh;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5111 int remaining;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5112
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5113 remaining = cfg->fc_mp_len;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5114 rtnh = (struct rtnexthop *)cfg->fc_mp;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5115
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5116 if (!rtnh_ok(rtnh, remaining)) {
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5117 NL_SET_ERR_MSG(extack, "Invalid nexthop configuration - no valid nexthops");
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5118 return -EINVAL;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5119 }
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5120
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5121 do {
e6f497955fb6a0 Kuniyuki Iwashima 2025-04-17 @5122 bool has_gateway = cfg->fc_flags & RTF_GATEWAY;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5123 int attrlen = rtnh_attrlen(rtnh);
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5124
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5125 if (attrlen > 0) {
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5126 struct nlattr *nla, *attrs;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5127
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5128 attrs = rtnh_attrs(rtnh);
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5129 nla = nla_find(attrs, attrlen, RTA_GATEWAY);
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5130 if (nla) {
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5131 if (nla_len(nla) < sizeof(cfg->fc_gateway)) {
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5132 NL_SET_ERR_MSG(extack,
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5133 "Invalid IPv6 address in RTA_GATEWAY");
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5134 return -EINVAL;
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5135 }
e6f497955fb6a0 Kuniyuki Iwashima 2025-04-17 5136
e6f497955fb6a0 Kuniyuki Iwashima 2025-04-17 5137 has_gateway = true;
e6f497955fb6a0 Kuniyuki Iwashima 2025-04-17 5138 }
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5139 }
e6f497955fb6a0 Kuniyuki Iwashima 2025-04-17 5140
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5141 rtnh = rtnh_next(rtnh, &remaining);
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5142 } while (rtnh_ok(rtnh, remaining));
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5143
f0a56c17e64bb5 Kuniyuki Iwashima 2025-05-15 5144 return lwtunnel_valid_encap_type_attr(cfg->fc_mp, cfg->fc_mp_len, extack);
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5145 }
4cb4861d8c3b3b Kuniyuki Iwashima 2025-04-17 5146
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 11/16/25 11:31 AM, azey wrote:
> At some point after b5d2d75e079a ("net/ipv6: Do not allow device only
> routes via the multipath API"), the IPv6 stack was updated such that
> device-only multipath routes can be installed and work correctly, but
> still weren't allowed in the code.
>
> This change removes the has_gateway check from rtm_to_fib6_multipath_config()
> and the fib_nh_gw_family check from rt6_qualify_for_ecmp(), allowing
> device-only multipath routes to be installed again.
>
My recollection is that device-only legs of an ECMP route are only valid
with the separate nexthop code. Added Nicolas (author of the original
IPv4 multipath code) to keep me honest.
On 17/11/2025 at 02:57, David Ahern wrote:
> On 11/16/25 11:31 AM, azey wrote:
>> At some point after b5d2d75e079a ("net/ipv6: Do not allow device only
>> routes via the multipath API"), the IPv6 stack was updated such that
>> device-only multipath routes can be installed and work correctly, but
>> still weren't allowed in the code.
>>
>> This change removes the has_gateway check from rtm_to_fib6_multipath_config()
>> and the fib_nh_gw_family check from rt6_qualify_for_ecmp(), allowing
>> device-only multipath routes to be installed again.
>>
>
> My recollection is that device-only legs of an ECMP route are only valid
> with the separate nexthop code. Added Nicolas (author of the original
> IPv4 multipath code) to keep me honest.
If I remember well, it was to avoid merging connected routes to ECMP routes.
For example, fe80:: but also if two interfaces have an address in the same
prefix. With the current code, the last route will always be used. With this
patch, packets will be distributed across the two interfaces, right?
If yes, it may cause regression on some setups.
Regards,
Nicolas
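
The difference described above can be pictured with a toy model. This is only a sketch under stated assumptions: pick_last_route() and pick_ecmp_route() are hypothetical helpers, not kernel functions, and the modulo is a stand-in for the kernel's actual multipath selection, which is more involved.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of several routes that share one prefix and metric.
 * Both helpers assume nroutes >= 1 and return an index into the
 * list of routes, ordered by insertion time.
 */

/* Behavior described for the current code: the route added last wins. */
static size_t pick_last_route(size_t nroutes)
{
	return nroutes - 1;
}

/*
 * Stand-in for ECMP: the flow hash decides which sibling is used.
 * A plain modulo is a simplification chosen for illustration.
 */
static size_t pick_ecmp_route(uint32_t flow_hash, size_t nroutes)
{
	return flow_hash % nroutes;
}
```

With two connected routes, pick_last_route(2) always yields index 1, whereas pick_ecmp_route() yields 0 or 1 depending on the flow, which is the change in behavior a setup could notice.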
On 2025-11-18 10:05:55, +0100 Nicolas Dichtel wrote:
> If I remember well, it was to avoid merging connected routes to ECMP routes.
> For example, fe80:: but also if two interfaces have an address in the same
> prefix. With the current code, the last route will always be used. With this
> patch, packets will be distributed across the two interfaces, right?
> If yes, it may cause regression on some setups.

Thanks! Yes, with this patch routes with the same destination and metric automatically
become multipath. From my testing, for link-locals this shouldn't make a difference
as the interface must always be specified with % anyway.

For non-LL addresses, this could indeed cause a regression in obscure setups. In my
opinion though, I feel that it is very unlikely anyone who has two routes with the
same prefix and metric (which AFAIK, isn't really a supported configuration without
ECMP anyway) relies on this quirk. The most plausible setup relying on this I can
think of would be a server with two interfaces on the same L2 segment, and a
firewall somewhere that only allows the source address of one interface through.

IMO, setups like that are more of a misconfiguration than a "practical use case"
that'd make this a real regression, but I'd completely understand if it'd be enough
to block this.
On 11/18/25 4:00 AM, azey wrote:
> On 2025-11-18 10:05:55, +0100 Nicolas Dichtel wrote:
>> If I remember well, it was to avoid merging connected routes to ECMP routes.
>> For example, fe80:: but also if two interfaces have an address in the same
>> prefix. With the current code, the last route will always be used. With this
>> patch, packets will be distributed across the two interfaces, right?
>> If yes, it may cause regression on some setups.
>
> Thanks! Yes, with this patch routes with the same destination and metric automatically
> become multipath. From my testing, for link-locals this shouldn't make a difference
> as the interface must always be specified with % anyway.
>
> For non-LL addresses, this could indeed cause a regression in obscure setups. In my
> opinion though, I feel that it is very unlikely anyone who has two routes with the
> same prefix and metric (which AFAIK, isn't really a supported configuration without
> ECMP anyway) relies on this quirk. The most plausible setup relying on this I can
> think of would be a server with two interfaces on the same L2 segment, and a
> firewall somewhere that only allows the source address of one interface through.
>
> IMO, setups like that are more of a misconfiguration than a "practical use case"
> that'd make this a real regression, but I'd completely understand if it'd be enough
> to block this.

There is really no reason to take a risk of a regression. If someone
wants ecmp with device only nexthops, then use the new nexthop infra to
do it.
On 2025-11-18 17:04:38 +0100, David Ahern <dsahern@kernel.org> wrote:
> There is really no reason to take a risk of a regression. If someone
> wants ecmp with device only nexthops, then use the new nexthop infra to
> do it.

My initial reason was that device-only ECMP via `ip route` works with IPv4
but not IPv6, so I thought it'd make sense to unify functionality - but if
this is final I won't argue any further.

Thanks again for the reviews, and sorry for potentially wasting your time.
On 11/18/25 9:47 AM, azey wrote:
> On 2025-11-18 17:04:38 +0100, David Ahern <dsahern@kernel.org> wrote:
>> There is really no reason to take a risk of a regression. If someone
>> wants ecmp with device only nexthops, then use the new nexthop infra to
>> do it.
>
> My initial reason was that device-only ECMP via `ip route` works with IPv4
> but not IPv6, so I thought it'd make sense to unify functionality - but if
> this is final I won't argue any further.
>

There was a push many years ago to align v4 and v6 as much as possible.
Certain areas - like ipv6 multipath - proved to be too difficult and ended
up causing regressions.
On 18/11/2025 at 17:04, David Ahern wrote:
> On 11/18/25 4:00 AM, azey wrote:
>> On 2025-11-18 10:05:55, +0100 Nicolas Dichtel wrote:
>>> If I remember well, it was to avoid merging connected routes to ECMP routes.
>>> For example, fe80:: but also if two interfaces have an address in the same
>>> prefix. With the current code, the last route will always be used. With this
>>> patch, packets will be distributed across the two interfaces, right?
>>> If yes, it may cause regression on some setups.
>>
>> Thanks! Yes, with this patch routes with the same destination and metric automatically
>> become multipath. From my testing, for link-locals this shouldn't make a difference
>> as the interface must always be specified with % anyway.
>>
>> For non-LL addresses, this could indeed cause a regression in obscure setups. In my

Having an address in the same prefix on two interfaces is not an "obscure setups".

>> opinion though, I feel that it is very unlikely anyone who has two routes with the
>> same prefix and metric (which AFAIK, isn't really a supported configuration without
>> ECMP anyway) relies on this quirk. The most plausible setup relying on this I can
>> think of would be a server with two interfaces on the same L2 segment, and a
>> firewall somewhere that only allows the source address of one interface through.
>>
>> IMO, setups like that are more of a misconfiguration than a "practical use case"
>> that'd make this a real regression, but I'd completely understand if it'd be enough
>> to block this.
>
> There is really no reason to take a risk of a regression. If someone
> wants ecmp with device only nexthops, then use the new nexthop infra to
> do it.

+1
On 2025-11-18 17:41:14 +0100, Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Having an address in the same prefix on two interfaces is not an "obscure setups".

Sorry, just a clarification on this since I didn't get the email in time before
sending my reply to David: I meant specifically the case where someone relies on
the last route always being selected in this scenario, setups that don't rely on
that shouldn't be affected.