include/net/route.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Branch annotation traces from approximately 200 IPv6-enabled hosts
revealed that the 'likely' branch in ip_neigh_for_gw() was consistently
mispredicted. Given the increasing prevalence of IPv6 in modern networks,
this commit adjusts the function to favor the IPv6 path.
Swap the order of the conditional statements and move the 'likely'
annotation to the IPv6 case. This change aims to improve performance in
IPv6-dominant environments by reducing branch mispredictions.
This optimization aligns with the trend of IPv6 becoming the default IP
version in many deployments, and should benefit modern network
configurations.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/net/route.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 1789f1e6640b..b90b7b1effb8 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
struct net_device *dev = rt->dst.dev;
struct neighbour *neigh;
- if (likely(rt->rt_gw_family == AF_INET)) {
- neigh = ip_neigh_gw4(dev, rt->rt_gw4);
- } else if (rt->rt_gw_family == AF_INET6) {
+ if (likely(rt->rt_gw_family == AF_INET6)) {
neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
*is_v6gw = true;
+ } else if (rt->rt_gw_family == AF_INET) {
+ neigh = ip_neigh_gw4(dev, rt->rt_gw4);
} else {
neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
}
--
2.43.5
On 10/4/24 10:27 AM, Breno Leitao wrote: > Branch annotation traces from approximately 200 IPv6-enabled hosts > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently > mispredicted. Given the increasing prevalence of IPv6 in modern networks, > this commit adjusts the function to favor the IPv6 path. > > Swap the order of the conditional statements and move the 'likely' > annotation to the IPv6 case. This change aims to improve performance in > IPv6-dominant environments by reducing branch mispredictions. > > This optimization aligns with the trend of IPv6 becoming the default IP > version in many deployments, and should benefit modern network > configurations. > > Signed-off-by: Breno Leitao <leitao@debian.org> > --- > include/net/route.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/include/net/route.h b/include/net/route.h > index 1789f1e6640b..b90b7b1effb8 100644 > --- a/include/net/route.h > +++ b/include/net/route.h > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt, > struct net_device *dev = rt->dst.dev; > struct neighbour *neigh; > > - if (likely(rt->rt_gw_family == AF_INET)) { > - neigh = ip_neigh_gw4(dev, rt->rt_gw4); > - } else if (rt->rt_gw_family == AF_INET6) { > + if (likely(rt->rt_gw_family == AF_INET6)) { > neigh = ip_neigh_gw6(dev, &rt->rt_gw6); > *is_v6gw = true; > + } else if (rt->rt_gw_family == AF_INET) { > + neigh = ip_neigh_gw4(dev, rt->rt_gw4); > } else { > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr); > } This is an IPv4 function allowing support for IPv6 addresses as a nexthop. It is appropriate for IPv4 family checks to be first.
Hello David, On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote: > On 10/4/24 10:27 AM, Breno Leitao wrote: > > Branch annotation traces from approximately 200 IPv6-enabled hosts > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently > > mispredicted. Given the increasing prevalence of IPv6 in modern networks, > > this commit adjusts the function to favor the IPv6 path. > > > > Swap the order of the conditional statements and move the 'likely' > > annotation to the IPv6 case. This change aims to improve performance in > > IPv6-dominant environments by reducing branch mispredictions. > > > > This optimization aligns with the trend of IPv6 becoming the default IP > > version in many deployments, and should benefit modern network > > configurations. > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > --- > > include/net/route.h | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/include/net/route.h b/include/net/route.h > > index 1789f1e6640b..b90b7b1effb8 100644 > > --- a/include/net/route.h > > +++ b/include/net/route.h > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt, > > struct net_device *dev = rt->dst.dev; > > struct neighbour *neigh; > > > > - if (likely(rt->rt_gw_family == AF_INET)) { > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > - } else if (rt->rt_gw_family == AF_INET6) { > > + if (likely(rt->rt_gw_family == AF_INET6)) { > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6); > > *is_v6gw = true; > > + } else if (rt->rt_gw_family == AF_INET) { > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > } else { > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr); > > } > > This is an IPv4 function allowing support for IPv6 addresses as a > nexthop. It is appropriate for IPv4 family checks to be first. Right. In which case is this called on IPv6 only systems? On my IPv6-only 200 systems, the annotated branch predictor is showing it is mispredicted 100% of the time. Thanks for the review --breno
On 10/4/24 19:37, Breno Leitao wrote: > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote: >> On 10/4/24 10:27 AM, Breno Leitao wrote: >>> Branch annotation traces from approximately 200 IPv6-enabled hosts >>> revealed that the 'likely' branch in ip_neigh_for_gw() was consistently >>> mispredicted. Given the increasing prevalence of IPv6 in modern networks, >>> this commit adjusts the function to favor the IPv6 path. >>> >>> Swap the order of the conditional statements and move the 'likely' >>> annotation to the IPv6 case. This change aims to improve performance in >>> IPv6-dominant environments by reducing branch mispredictions. >>> >>> This optimization aligns with the trend of IPv6 becoming the default IP >>> version in many deployments, and should benefit modern network >>> configurations. >>> >>> Signed-off-by: Breno Leitao <leitao@debian.org> >>> --- >>> include/net/route.h | 6 +++--- >>> 1 file changed, 3 insertions(+), 3 deletions(-) >>> >>> diff --git a/include/net/route.h b/include/net/route.h >>> index 1789f1e6640b..b90b7b1effb8 100644 >>> --- a/include/net/route.h >>> +++ b/include/net/route.h >>> @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt, >>> struct net_device *dev = rt->dst.dev; >>> struct neighbour *neigh; >>> >>> - if (likely(rt->rt_gw_family == AF_INET)) { >>> - neigh = ip_neigh_gw4(dev, rt->rt_gw4); >>> - } else if (rt->rt_gw_family == AF_INET6) { >>> + if (likely(rt->rt_gw_family == AF_INET6)) { >>> neigh = ip_neigh_gw6(dev, &rt->rt_gw6); >>> *is_v6gw = true; >>> + } else if (rt->rt_gw_family == AF_INET) { >>> + neigh = ip_neigh_gw4(dev, rt->rt_gw4); >>> } else { >>> neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr); >>> } >> >> This is an IPv4 function allowing support for IPv6 addresses as a >> nexthop. It is appropriate for IPv4 family checks to be first. > > Right. In which case is this called on IPv6 only systems? > > On my IPv6-only 200 systems, the annotated branch predictor is showing > it is mispredicted 100% of the time. perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag; perf script should give you an hint. Cheers, Paolo
Hello Paolo, On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote: > On 10/4/24 19:37, Breno Leitao wrote: > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote: > > > On 10/4/24 10:27 AM, Breno Leitao wrote: > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks, > > > > this commit adjusts the function to favor the IPv6 path. > > > > > > > > Swap the order of the conditional statements and move the 'likely' > > > > annotation to the IPv6 case. This change aims to improve performance in > > > > IPv6-dominant environments by reducing branch mispredictions. > > > > > > > > This optimization aligns with the trend of IPv6 becoming the default IP > > > > version in many deployments, and should benefit modern network > > > > configurations. > > > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > --- > > > > include/net/route.h | 6 +++--- > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/include/net/route.h b/include/net/route.h > > > > index 1789f1e6640b..b90b7b1effb8 100644 > > > > --- a/include/net/route.h > > > > +++ b/include/net/route.h > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt, > > > > struct net_device *dev = rt->dst.dev; > > > > struct neighbour *neigh; > > > > - if (likely(rt->rt_gw_family == AF_INET)) { > > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > > > - } else if (rt->rt_gw_family == AF_INET6) { > > > > + if (likely(rt->rt_gw_family == AF_INET6)) { > > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6); > > > > *is_v6gw = true; > > > > + } else if (rt->rt_gw_family == AF_INET) { > > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > > > } else { > > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr); > > > > } > > > > > > This is an IPv4 function allowing support for IPv6 addresses as a > > > nexthop. It is appropriate for IPv4 family checks to be first. > > > > Right. In which case is this called on IPv6 only systems? > > > > On my IPv6-only 200 systems, the annotated branch predictor is showing > > it is mispredicted 100% of the time. > > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag; > perf script > > should give you an hint. Thanks. That proved to be very useful. As I said above, all the hosts I have a webserver running, I see this that likely mispredicted. Same for this server: # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw correct incorrect % Function File Line 0 17127 100 ip_neigh_for_gw route.h 393 It is mostly coming from ip_finish_output2() and tcp_v4. Important to say that these machine has no IPv4 configured, except 127.0.0.1 (localhost). Output of `perf script`: curl 3284017 [020] 342043.646674: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3284017 [020] 342043.646720: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3286356 [026] 342055.690384: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3288713 [032] 342103.631991: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3288713 [032] 342103.632039: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3289126 [021] 342115.725482: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3291018 [030] 342163.627633: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3291018 [030] 342163.627673: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3291256 [031] 342175.683527: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3293421 [025] 342223.618198: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3293421 [025] 342223.618239: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3293659 [034] 342235.695019: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3295399 [012] 342283.632642: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3295399 [012] 342283.632691: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3295746 [001] 342295.712436: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3298603 [020] 342343.608814: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3298603 [020] 342343.608858: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3299252 [032] 342355.693816: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3303255 [033] 342403.616685: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3303255 [033] 342403.616729: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3304989 [011] 342415.740580: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3312952 [035] 342463.633808: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3312952 [035] 342463.633859: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3314546 [032] 342475.766762: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6) curl 3321983 [006] 342523.654221: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) curl 3321983 [006] 342523.654262: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_unicast_reply+0x3fc tcp_v4_send_reset+0x668 tcp_v4_rcv+0xcdf ip_protocol_deliver_rcu+0x10e ip_local_deliver_finish+0x97 ip_local_deliver+0x43 ip_rcv+0x35 process_backlog+0x1b8 __napi_poll+0x30 net_rx_action+0x180 __kprobes_text_end+0xf2 __local_bh_enable_ip+0xeb __dev_queue_xmit+0xc18 ip_finish_output2+0x63a ip_output+0x73 __ip_queue_xmit+0x504 __tcp_transmit_skb+0xcfe tcp_connect+0xa1d tcp_v4_connect+0x463 __inet_stream_connect+0x5b inet_stream_connect+0x36 __sys_connect+0x8d __x64_sys_connect+0x16 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_connect+0x4b (/usr/lib64/libc.so.6) 0 [unknown] ([unknown]) isc-net-0000 3323932 [025] 342535.718587: probe:ip_neigh_for_gw: () ip_finish_output2+0x150 ip_output+0x73 ip_send_skb+0x15 udp_send_skb+0xd2 udp_sendmsg+0xaa7 __x64_sys_sendmsg+0x338 do_syscall_64+0xc2 entry_SYSCALL_64_after_hwframe+0x4b __libc_sendmsg+0x4d (/usr/lib64/libc.so.6)
On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@debian.org> wrote: > > Hello Paolo, > > On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote: > > On 10/4/24 19:37, Breno Leitao wrote: > > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote: > > > > On 10/4/24 10:27 AM, Breno Leitao wrote: > > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts > > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently > > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks, > > > > > this commit adjusts the function to favor the IPv6 path. > > > > > > > > > > Swap the order of the conditional statements and move the 'likely' > > > > > annotation to the IPv6 case. This change aims to improve performance in > > > > > IPv6-dominant environments by reducing branch mispredictions. > > > > > > > > > > This optimization aligns with the trend of IPv6 becoming the default IP > > > > > version in many deployments, and should benefit modern network > > > > > configurations. > > > > > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > > --- > > > > > include/net/route.h | 6 +++--- > > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/include/net/route.h b/include/net/route.h > > > > > index 1789f1e6640b..b90b7b1effb8 100644 > > > > > --- a/include/net/route.h > > > > > +++ b/include/net/route.h > > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt, > > > > > struct net_device *dev = rt->dst.dev; > > > > > struct neighbour *neigh; > > > > > - if (likely(rt->rt_gw_family == AF_INET)) { > > > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > > > > - } else if (rt->rt_gw_family == AF_INET6) { > > > > > + if (likely(rt->rt_gw_family == AF_INET6)) { > > > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6); > > > > > *is_v6gw = true; > > > > > + } else if (rt->rt_gw_family == AF_INET) { > > > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > > > > } else { > > > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr); > > > > > } > > > > > > > > This is an IPv4 function allowing support for IPv6 addresses as a > > > > nexthop. It is appropriate for IPv4 family checks to be first. > > > > > > Right. In which case is this called on IPv6 only systems? > > > > > > On my IPv6-only 200 systems, the annotated branch predictor is showing > > > it is mispredicted 100% of the time. > > > > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag; > > perf script > > > > should give you an hint. > > Thanks. That proved to be very useful. > > As I said above, all the hosts I have a webserver running, I see this > that likely mispredicted. Same for this server: > > # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw > correct incorrect % Function File Line > 0 17127 100 ip_neigh_for_gw route.h 393 > > It is mostly coming from ip_finish_output2() and tcp_v4. Important to > say that these machine has no IPv4 configured, except 127.0.0.1 > (localhost). Now run the experiment on a typical server using IPv4 ? I would advise removing the likely() if it really bothers you. (I doubt this has any impact) But assuming everything is IPv6 is too soon. There are more obvious changes like : diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff *skb, struct net *net) /* When the interface is in promisc. mode, drop all the crap * that it receives, do not try to analyse it. */ - if (skb->pkt_type == PACKET_OTHERHOST) { + if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) { dev_core_stats_rx_otherhost_dropped_inc(skb->dev); drop_reason = SKB_DROP_REASON_OTHERHOST; goto drop; diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff *skb, struct net_device *dev, u32 pkt_len; struct inet6_dev *idev; - if (skb->pkt_type == PACKET_OTHERHOST) { + if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) { dev_core_stats_rx_otherhost_dropped_inc(skb->dev); kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST); return NULL;
Hello Eric, On Tue, Oct 08, 2024 at 04:15:37PM +0200, Eric Dumazet wrote: > On Tue, Oct 8, 2024 at 4:07 PM Breno Leitao <leitao@debian.org> wrote: > > > > Hello Paolo, > > > > On Tue, Oct 08, 2024 at 12:51:05PM +0200, Paolo Abeni wrote: > > > On 10/4/24 19:37, Breno Leitao wrote: > > > > On Fri, Oct 04, 2024 at 11:01:29AM -0600, David Ahern wrote: > > > > > On 10/4/24 10:27 AM, Breno Leitao wrote: > > > > > > Branch annotation traces from approximately 200 IPv6-enabled hosts > > > > > > revealed that the 'likely' branch in ip_neigh_for_gw() was consistently > > > > > > mispredicted. Given the increasing prevalence of IPv6 in modern networks, > > > > > > this commit adjusts the function to favor the IPv6 path. > > > > > > > > > > > > Swap the order of the conditional statements and move the 'likely' > > > > > > annotation to the IPv6 case. This change aims to improve performance in > > > > > > IPv6-dominant environments by reducing branch mispredictions. > > > > > > > > > > > > This optimization aligns with the trend of IPv6 becoming the default IP > > > > > > version in many deployments, and should benefit modern network > > > > > > configurations. > > > > > > > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > > > > --- > > > > > > include/net/route.h | 6 +++--- > > > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/include/net/route.h b/include/net/route.h > > > > > > index 1789f1e6640b..b90b7b1effb8 100644 > > > > > > --- a/include/net/route.h > > > > > > +++ b/include/net/route.h > > > > > > @@ -389,11 +389,11 @@ static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt, > > > > > > struct net_device *dev = rt->dst.dev; > > > > > > struct neighbour *neigh; > > > > > > - if (likely(rt->rt_gw_family == AF_INET)) { > > > > > > - neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > > > > > - } else if (rt->rt_gw_family == AF_INET6) { > > > > > > + if (likely(rt->rt_gw_family == AF_INET6)) { > > > > > > neigh = ip_neigh_gw6(dev, &rt->rt_gw6); > > > > > > *is_v6gw = true; > > > > > > + } else if (rt->rt_gw_family == AF_INET) { > > > > > > + neigh = ip_neigh_gw4(dev, rt->rt_gw4); > > > > > > } else { > > > > > > neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr); > > > > > > } > > > > > > > > > > This is an IPv4 function allowing support for IPv6 addresses as a > > > > > nexthop. It is appropriate for IPv4 family checks to be first. > > > > > > > > Right. In which case is this called on IPv6 only systems? > > > > > > > > On my IPv6-only 200 systems, the annotated branch predictor is showing > > > > it is mispredicted 100% of the time. > > > > > > perf probe -a ip_neigh_for_gw; perf record -e probe:ip_neigh_for_gw -ag; > > > perf script > > > > > > should give you an hint. > > > > Thanks. That proved to be very useful. > > > > As I said above, all the hosts I have a webserver running, I see this > > that likely mispredicted. Same for this server: > > > > # cat /sys/kernel/tracing/trace_stat/branch_annotated | grep ip_neigh_for_gw > > correct incorrect % Function File Line > > 0 17127 100 ip_neigh_for_gw route.h 393 > > > > It is mostly coming from ip_finish_output2() and tcp_v4. Important to > > say that these machine has no IPv4 configured, except 127.0.0.1 > > (localhost). > > Now run the experiment on a typical server using IPv4 ? > > I would advise removing the likely() if it really bothers you. > (I doubt this has any impact) Thanks. I am mostly concerned about likely/unlikely() that are wrong 100% the time when running production workloads in modern hardware. I just got a few hundreds host to able to run annotated branches enabled, and I am looking on how it can help us to understand our code flow better. Regarding performance impact, I agree with you that performance is minimal (at least on x86). On other architectures, such as powerpc things can be more evident, given that the branch hint is encoded in the instruction itself, so, the hardware knows where to predict. That said, I think the best approach is just to remove the likely() from that code path. > But assuming everything is IPv6 is too soon. > > There are more obvious changes like : > > diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c > index b6e7d4921309741193a8c096aeb278255ec56794..445f4fe712603e8c14b1006ad4cbaac278bae4ea > 100644 > --- a/net/ipv4/ip_input.c > +++ b/net/ipv4/ip_input.c > @@ -462,7 +462,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff > *skb, struct net *net) > /* When the interface is in promisc. mode, drop all the crap > * that it receives, do not try to analyse it. > */ > - if (skb->pkt_type == PACKET_OTHERHOST) { > + if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) { > dev_core_stats_rx_otherhost_dropped_inc(skb->dev); > drop_reason = SKB_DROP_REASON_OTHERHOST; > goto drop; > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c > index 70c0e16c0ae6837d1c64d0036829c8b61799578b..3d0797afa499fa880eb5452a0dea8a23505b3e60 > 100644 > --- a/net/ipv6/ip6_input.c > +++ b/net/ipv6/ip6_input.c > @@ -153,7 +153,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff > *skb, struct net_device *dev, > u32 pkt_len; > struct inet6_dev *idev; > > - if (skb->pkt_type == PACKET_OTHERHOST) { > + if (unlikely(skb->pkt_type == PACKET_OTHERHOST)) { > dev_core_stats_rx_otherhost_dropped_inc(skb->dev); > kfree_skb_reason(skb, SKB_DROP_REASON_OTHERHOST); > return NULL; Agree, that would be obvious changes also.
© 2016 - 2024 Red Hat, Inc.