net/ipv6/ip6_output.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
When the routing decision involves ECMP, the initial hash is calculated
with saddr being :: and the decision which interface to select from an
ECMP group is based on that hash. If the route lookup has to be
repeated, e.g. because a route was updated, the hash is calculated with
saddr set to the previously selected address. This can cause the
selected interface to change, breaking ongoing connections.
To ensure the initial interface selection is based on an actual saddr
the route lookup is repeated after the source address selection if a
hash was calculated.
Signed-off-by: Maximilian Moehl <maximilian@moehl.eu>
Link: https://lore.kernel.org/all/aOYLRyIlc7XU7-7n@shredder/
---
I've created a write-up of what I've done, including the steps taken
to test the patch: https://moehl.eu/blog/linux-ipv6-ecmp-instability.html
net/ipv6/ip6_output.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8e2a6b28cea7..465fce51d017 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
*dst = NULL;
}
+ /* If ECMP was involved the initial hash was calculted
+ * with saddr=:: which can result in instability
+ * when it is later re-calculated with the selected
+ * saddr. Lookup the route again with the chosen
+ * saddr to get a stable result.
+ */
+ if (fl6->mp_hash) {
+ fl6->mp_hash = 0;
+ dst_release(*dst);
+ *dst = NULL;
+ }
+
if (fl6->flowi6_oif)
flags |= RT6_LOOKUP_F_IFACE;
}
base-commit: dc9e9d61e301c087bcd990dbf2fa18ad3e2e1429
--
2.53.0
On 3/29/26 11:12 AM, Maximilian Moehl wrote:
> When the routing decision involves ECMP, the initial hash is calculated
> with saddr being :: and the decision which interface to select from an
> ECMP group is based on that hash. If the route lookup has to be
> repeated, e.g. because a route was updated, the hash is calculated with
> saddr set to the previously selected address. This can cause the
> selected interface to change, breaking ongoing connections.
>
> To ensure the initial interface selection is based on an actual saddr
> the route lookup is repeated after the source address selection if a
> hash was calculated.
>
> Signed-off-by: Maximilian Moehl <maximilian@moehl.eu>
> Link: https://lore.kernel.org/all/aOYLRyIlc7XU7-7n@shredder/
> ---
> I've created a write-up of what I've done, including the steps taken
> to test the patch: https://moehl.eu/blog/linux-ipv6-ecmp-instability.html
>
> net/ipv6/ip6_output.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 8e2a6b28cea7..465fce51d017 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
> *dst = NULL;
> }
>
> + /* If ECMP was involved the initial hash was calculted
> + * with saddr=:: which can result in instability
> + * when it is later re-calculated with the selected
> + * saddr. Lookup the route again with the chosen
> + * saddr to get a stable result.
> + */
> + if (fl6->mp_hash) {
> + fl6->mp_hash = 0;
> + dst_release(*dst);
> + *dst = NULL;
> + }
> +
> if (fl6->flowi6_oif)
> flags |= RT6_LOOKUP_F_IFACE;
> }
This apparently breaks ipv6 fib tests (fib_tests.sh):
# IPv6 multipath load balance test
# TEST: IPv6 multipath loadbalance [FAIL]
see
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
on how to reproduce the tests.
Also this would deserve additional testcases.
Without diving much inside the code I have the feeling this change is
plugged into the wrong place: multipath selection logic should be
encapsulated by fib6_select_path().
/P
On Mon Mar 30, 2026 at 9:56 AM CEST, Paolo Abeni wrote:
> On 3/29/26 11:12 AM, Maximilian Moehl wrote:
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 8e2a6b28cea7..465fce51d017 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
>> *dst = NULL;
>> }
>>
>> + /* If ECMP was involved the initial hash was calculted
>> + * with saddr=:: which can result in instability
>> + * when it is later re-calculated with the selected
>> + * saddr. Lookup the route again with the chosen
>> + * saddr to get a stable result.
>> + */
>> + if (fl6->mp_hash) {
>> + fl6->mp_hash = 0;
>> + dst_release(*dst);
>> + *dst = NULL;
>> + }
>> +
>> if (fl6->flowi6_oif)
>> flags |= RT6_LOOKUP_F_IFACE;
>> }
>
> This apparently breaks ipv6 fib tests (fib_tests.sh):
>
> # IPv6 multipath load balance test
> # TEST: IPv6 multipath loadbalance [FAIL]
>
> see
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
> on how to reproduce the tests.
>
> Also this would deserve additional testcases.
Thank you for the pointer, I will look into the tests.
> Without diving much inside the code I have the feeling this change is
> plugged into the wrong place: multipath selection logic should be
> encapsulated by fib6_select_path().
To be able to move this logic into fib6_select_path(), the call to
ip6_route_get_saddr() would probably have to move as well as we need
to set the source address after the first lookup. This would then also
set the source address as a side-effect.
I can give this a try, just want to confirm that it's the correct way
before doing so or if I'm missing something.
--
Max
© 2016 - 2026 Red Hat, Inc.